Automatic Differentiation to Simultaneously Identify Nonlinear Dynamics and Extract Noise Probability Distributions from Data

Abstract

The sparse identification of nonlinear dynamics (SINDy) is a regression framework for the discovery of parsimonious dynamic models and governing equations from time-series data. As with all system identification methods, noisy measurements compromise the accuracy and robustness of the model discovery procedure. In this work we develop a variant of the SINDy algorithm that integrates automatic differentiation and the recent time-stepping constraints motivated by Rudy et al. [66] for simultaneously (i) denoising the data, (ii) learning and parametrizing the noise probability distribution, and (iii) identifying the underlying parsimonious dynamical system responsible for generating the time-series data. Thus, within an integrated optimization framework, noise can be separated from signal, resulting in an architecture that is approximately twice as robust to noise as state-of-the-art methods, handling as much as 40% noise on a given time-series signal while explicitly parametrizing the noise probability distribution. We demonstrate this approach on several numerical examples, from Lotka-Volterra models to the spatio-temporal Lorenz 96 model. Further, we show the method can identify a diversity of probability distributions, including Gaussian, uniform, Gamma, and Rayleigh distributions.


1 Introduction

The data-driven discovery of governing equations is an emerging field within the machine learning and artificial intelligence communities. From neural networks (NN) to traditional model regression techniques, a diversity of methods are emerging that transform time-series data (or spatio-temporal data) into representations of governing equations of motion [10, 52, 39, 71, 36, 13, 50, 82, 34, 2, 5, 85, 81, 47, 80, 55, 43, 58, 14, 61, 59, 60, 26, 84, 6, 72, 20, 69, 11, 65, 66, 86]. The interpretability and generalizability of these discovered equations of motion are critical for understanding, designing, and controlling complex systems. As such, the sparse identification of nonlinear dynamics (SINDy) [11] framework provides a compelling regression framework since discovered models are interpretable and parsimonious by design. As with all system identification algorithms, noisy measurements compromise the accuracy and robustness of the model discovery procedure. Moreover, many optimization frameworks rely explicitly on the assumption of Gaussian noise, which is rarely true in the real world. Recently, Rudy et al. [66] developed a novel optimization framework for separating signal and noise from noisy time-series data by identifying a deep NN model for the signal from numerical time-stepping constraints, such as Runge-Kutta schemes. In this work, we build on this framework and leverage automatic differentiation [3] in the optimization procedure to simultaneously denoise data and identify sparse nonlinear models via SINDy. This new architecture yields significant improvements in model discovery, including superior separation of the signal from noise while simultaneously characterizing the noise distribution.

SINDy has emerged as a flexible and promising architecture for model discovery due to its inherently parsimonious representation of dynamics. The SINDy framework relies on sparse regression on a library of candidate model terms to select the fewest terms required to describe the observed dynamics [11]. Specifically, SINDy is formulated as an over-determined linear system of equations $\dot{\mathbf{X}} = \Theta(\mathbf{X})\Xi$, where sparsity of the solution $\Xi$ is promoted by the $\ell_0$-norm. Thus sparsity is a proxy for parsimony, interpretability, and generalizability. Measurement noise, however, is always present, and it corrupts the ability of the SINDy regression framework, and indeed of any other model discovery paradigm, to accurately extract governing models.

There are many variants of sparse regression, all of which typically attempt to approximate the solution of an NP-hard, $\ell_0$-norm penalized regression. Sparsity-promoting methods like the LASSO [74, 76] use the $\ell_1$-norm as a proxy for sparsity since tractable computations can be performed. The iterative least-squares thresholding technique of the SINDy algorithm promotes sparsity through a sequential procedure. Recently, Zhang and Schaeffer [87] have provided several rigorous theoretical guarantees on the convergence of the SINDy algorithm. Specifically, they proved that the algorithm approximates local minimizers of an unconstrained $\ell_0$-penalized least-squares problem, which allows them to provide sufficient conditions for general convergence, the rate of convergence, and conditions for one-step recovery. Using a relaxed formulation, Champion et al. [16] show how the SINDy regression framework can accommodate additional structure, robustness to outliers, and nonlinear parameter estimation using the sparse relaxed regularized regression (SR3) formulation [90]. SINDy results in interpretable models, and it has been widely applied in many scientific disciplines [73, 41, 19, 40, 28, 42, 24, 51, 21, 30, 75, 37, 23, 70, 53, 4]. Moreover, it has been extended to incorporate control [12, 32], rational or implicit dynamics [46, 31, 88], partial differential equations [65, 48], parametric model dependencies [63], discrepancy models [30, 21], multiscale physics [15], stochastic dynamics [7], and constrained physics [40], among many other innovations [41, 69, 44, 77, 67, 68, 83, 7, 45, 25, 16, 27, 62, 49]. The PySINDy Python package implements many of these variants [22].

Despite its flexibility, modularity, and extensibility, SINDy and its variants typically rely on approximating the time derivative of the measured time-series data. Computing derivatives of noisy measurement data is known to be a challenging problem, with many algorithmic innovations and mathematical architectures developed to produce accurate derivative approximations [78]. These methods include finite differences, spectral methods [35], spline smoothing, filtering procedures, polynomial fitting, low-rank projection, and total variation, to highlight some of the diverse techniques employed for this critical task of scientific computing. The task is made even more difficult by the noise statistics: Gaussian noise is often easier to learn and characterize than noise distributions that have non-zero means or are not symmetric. Ultimately, there is a need for methods that are robust to noisy measurements and diverse probability distributions.

Recent innovations in automatic differentiation have enabled the solution of an optimization problem directly related to the computation of the required derivatives [66]. Since its inception, automatic differentiation has been widely used in the machine learning community to enable complicated optimization problems without manually computing Jacobians [1, 3, 57, 79, 17, 64, 66, 9, 56, 38]. More recently, this approach has been used with NNs to separate a signal from noise and model the signal when a model is unknown [66], and to improve Kalman smoothing when the governing equations are known [64]. The success of these algorithms suggests that they could be leveraged for noise–signal separation in the SINDy framework. In this work, we extend this simultaneous denoising and discovery approach to SINDy. Specifically, automatic differentiation enables differentiation with respect to the functions in the SINDy library, thus circumventing a direct differentiation of the noisy time-series data. The modified SINDy algorithm is more robust to noise and further allows for an explicit characterization (discovery) of the underlying probability distribution of the noise, something that current state-of-the-art methods cannot do and a unique feature of our method.

In Sec. 2, we introduce the modified SINDy algorithm. In Sec. 3, we compare modified SINDy with the NN-based noise–signal separation approach proposed by Rudy et al. [66]. In Sec. 4, we apply modified SINDy to various numerical examples; we also show how modified SINDy can be used to extract the noise distribution and how it can be used in the discrepancy modeling framework. In Sec. 5, we present our conclusions and possible future improvements.

2 Methods

In what follows, we introduce the basic mathematical architecture behind the SINDy algorithm, demonstrating explicitly its sensitivity to noisy measurements. This guides our introduction of the modified SINDy for simultaneously learning the system model and denoising the signal.

2.1 Sparse Identification of Nonlinear Dynamics

The SINDy algorithm [11] provides a principled, data-driven discovery method for nonlinear dynamics of the form

$\dot{\mathbf{x}}(t) = \mathbf{f}(\mathbf{x}(t)), \qquad (1)$

where $\mathbf{x}(t) \in \mathbb{R}^{1 \times n}$ is the vector of system states, represented as a row vector. SINDy posits a set of candidate functions $\theta_i(\mathbf{x})$ that could characterize the right-hand side of the governing equations. The candidate model terms form the library $\Theta(\mathbf{X}) = [\theta_1(\mathbf{X}), \theta_2(\mathbf{X}), \ldots, \theta_p(\mathbf{X})]$ of potential right-hand-side terms, where the data matrix $\mathbf{X} \in \mathbb{R}^{m \times n}$ is formed by $m$ row-vector measurements of the state. This then allows for the formulation of a regression problem to select only the few candidate terms necessary to describe the dynamics:

$\Xi = \arg\min_{\Xi'} \big\| \dot{\mathbf{X}} - \Theta(\mathbf{X})\Xi' \big\|_2^2 + \lambda \|\Xi'\|_0, \qquad (2)$

where the matrix $\Xi = [\xi_1, \xi_2, \ldots, \xi_n]$ is comprised of the sparse column vectors $\xi_i$ that select candidate model terms. The amount of sparsity promotion is controlled by the parameter $\lambda$, which determines the penalization by the $\ell_0$-norm. The $\theta_i(\mathbf{x})$ can be any candidate functions that may describe the system dynamics, such as trigonometric functions ($\sin(\mathbf{x})$, $\cos(\mathbf{x})$) or polynomial functions ($\mathbf{x}^2$, $\mathbf{x}^3$), for example. By solving Eq. (2), we can identify a model of the system dynamics

$\dot{\mathbf{x}} \approx \Theta(\mathbf{x})\Xi. \qquad (3)$

Many different optimization techniques can be used to obtain the sparse coefficients $\Xi$, such as sequentially thresholded least squares (STLSQ) [11, 87], LASSO [76], sparse relaxed regularized regression (SR3) [90, 16], stepwise sparse regression (SSR) [8], and Bayesian approaches [88, 54].
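To make the regression step concrete, the following is a minimal NumPy sketch of sequentially thresholded least squares; it is our own illustration rather than the PySINDy implementation, and all names (stlsq, Theta, dXdt, lam) are placeholder choices.

```python
import numpy as np

def stlsq(Theta, dXdt, lam=0.1, max_iter=10):
    """Sequentially thresholded least squares (STLSQ), a minimal sketch.

    Theta : (m, p) library matrix evaluated on the data
    dXdt  : (m, n) time derivatives of the n states
    lam   : thresholding parameter controlling sparsity
    """
    # Initial dense least-squares fit of dX/dt = Theta(X) @ Xi.
    Xi = np.linalg.lstsq(Theta, dXdt, rcond=None)[0]
    for _ in range(max_iter):
        small = np.abs(Xi) < lam            # terms to threshold out
        Xi[small] = 0.0
        for k in range(dXdt.shape[1]):      # re-fit each state on survivors
            big = ~small[:, k]
            if big.any():
                Xi[big, k] = np.linalg.lstsq(Theta[:, big], dXdt[:, k],
                                             rcond=None)[0]
    return Xi
```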

In practice, noise-free measurements of $\mathbf{x}$ are not available, and only the full-state noisy measurement

$\mathbf{y}(t) = \mathbf{x}(t) + \mathbf{n}(t) \qquad (4)$

is provided to SINDy from sensors, where $\mathbf{y}(t)$ is the noisy measurement and $\mathbf{n}(t)$ is the noise added to the true state. Thus, Eq. (2) then becomes

$\Xi = \arg\min_{\Xi'} \big\| (\dot{\mathbf{X}} + \dot{\mathbf{N}}) - \Theta(\mathbf{X} + \mathbf{N})\Xi' \big\|_2^2 + \lambda \|\Xi'\|_0, \qquad (5)$

where $\mathbf{Y} = \mathbf{X} + \mathbf{N}$ is the noisy measurement matrix formed by $m$ row-vector measurements of size $1 \times n$ and $\mathbf{N}$ is the noise matrix, also formed by $m$ row vectors of size $1 \times n$. From Eq. (5), note that the solution $\Xi$ is no longer the same as in Eq. (2) due to the presence of noise. Moreover, the noise will be magnified when approximating the derivatives (for finite differences, by a factor on the order of $1/\Delta t$) [65], and it will nonlinearly corrupt the library matrix $\Theta(\mathbf{X} + \mathbf{N})$. Extensive research has been done to improve the robustness of the SINDy framework. The integral formulation [67] and weak formulation [62, 49, 48] improve the regression robustness by avoiding taking derivatives of noisy data. Other approaches, such as subsampling [89], increase the noise robustness of the SINDy framework by performing the regression on subsampled measurements that contain less noise. Corrupt data can also be handled with methods from robust statistics [77, 16]. In the next section, we introduce an alternative approach that simultaneously learns the noise while using the denoised data to perform model identification.
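The derivative amplification can be seen in a small toy experiment of our own (a sine signal with 1% Gaussian noise is an assumption, not an example from the paper): central finite differences scale the measurement noise roughly by $1/\Delta t$.

```python
import numpy as np

dt = 0.01
t = np.arange(0, 10, dt)
x = np.sin(t)                                  # clean signal (toy example)
y = x + 0.01 * np.random.randn(t.size)         # 1% additive Gaussian noise

dx_true = np.cos(t[1:-1])
dx_fd = (y[2:] - y[:-2]) / (2 * dt)            # central finite difference

print("signal noise std   :", np.std(y - x))
print("derivative error std:", np.std(dx_fd - dx_true))  # ~noise_std/dt scale
```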

2.2 Simultaneously Denoising and Learning System Model

Figure 1: This figure illustrates the modified SINDy algorithm. The goal is to learn the system model $\Xi$ and the noise $\mathbf{N}$. The estimated noise $\hat{\mathbf{N}}$ is subtracted from the measurement $\mathbf{Y}$ to obtain the clean data. To achieve this, the estimated noise is set as an optimization parameter and the cost function is minimized. This optimization is performed several times, and small values of $\Xi$ are enforced to be zero for the remainder of the optimization process.

To improve the noise robustness of the SINDy regression, we treat the estimated noise $\hat{\mathbf{N}}$ as an optimization variable and minimize the difference between the estimated derivative and the system's vector field such that

$e_{\mathcal{D}} = \big\| \dot{\hat{\mathbf{X}}} - \Theta(\hat{\mathbf{X}})\Xi \big\|_2^2, \qquad \hat{\mathbf{X}} = \mathbf{Y} - \hat{\mathbf{N}}, \qquad (6)$

where $\hat{\mathbf{X}}$ is formed by the estimated true states $\hat{\mathbf{x}}_j$, and $e_{\mathcal{D}}$ is the derivative approximation error. Note that $\hat{\mathbf{X}} = \mathbf{Y} - \hat{\mathbf{N}}$. When $\hat{\mathbf{N}} = \mathbf{N}$, the effect of noise can be eliminated, and the accuracy of SINDy will be significantly improved. However, there exist many trivial solutions for minimizing Eq. (6) with two uncorrelated optimization parameters $\hat{\mathbf{N}}$ and $\Xi$. Thus, an additional constraint is needed to regularize Eq. (6).

The additional constraint proposed here uses the estimated vector field of the system model, similar to the one proposed by Rudy et al. [66]. Equation (3) gives the estimate of the true vector field $\Theta(\mathbf{x})\Xi$. Integrating over a segment of time from $t_j$ to $t_{j+1}$ gives the integrated vector field, or flow map,

$\mathbf{x}(t_{j+1}) = F(\mathbf{x}(t_j)) = \mathbf{x}(t_j) + \int_{t_j}^{t_{j+1}} \Theta(\mathbf{x}(\tau))\Xi \, d\tau. \qquad (7)$

This can be generalized to integrate the system either forward or backward $q$ time steps. This gives

$\mathbf{x}(t_{j \pm q}) = F^{\pm q}(\mathbf{x}(t_j)) = \mathbf{x}(t_j) + \int_{t_j}^{t_{j \pm q}} \Theta(\mathbf{x}(\tau))\Xi \, d\tau. \qquad (8)$

To obtain the flow map $F^{\pm q}$ in Eq. (8), a numerical simulation scheme such as Runge–Kutta can be used. In what follows, we employ a 4th-order Runge–Kutta method to simulate the dynamics forward/backward in time $q$ steps. Similar to Eq. (8), when the noisy measurement data $\mathbf{Y}$ is given, the estimated state satisfies
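A minimal sketch of this forward/backward flow-map computation, assuming a generic right-hand-side function f; the names rk4_step and flow_map are ours:

```python
import numpy as np

def rk4_step(f, x, dt):
    """One 4th-order Runge-Kutta step; dt < 0 integrates backward in time."""
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    return x + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

def flow_map(f, x, dt, q):
    """Compose q RK4 steps to approximate the flow map F^{+q} (or F^{-q} for q < 0) of Eq. (8)."""
    for _ in range(abs(q)):
        x = rk4_step(f, x, dt if q > 0 else -dt)
    return x
```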

$\hat{\mathbf{x}}_{j \pm q} = F^{\pm q}(\hat{\mathbf{x}}_j) \qquad (9)$

when $\hat{\mathbf{N}} = \mathbf{N}$ and the exact value of $\Xi$ is known. Thus, by minimizing

$\sum_{i=1}^{q} \omega_i \big\| \hat{\mathbf{x}}_{j \pm i} - F^{\pm i}(\hat{\mathbf{x}}_j) \big\|_2^2, \qquad (10)$

the optimization parameters $\hat{\mathbf{N}}$ and $\Xi$ are coupled, resulting in an additional structural constraint on the model. The weight $\omega_i$ is used to account for the numerical error and is set to $\omega_i = \gamma^i$, where $\gamma \in (0, 1]$ is a constant (a fixed value of $\gamma$ is used throughout this paper). The use of $\omega_i$ means that simulation error too far ahead in the future, or too far backward in the past, is penalized less due to the error of the numerical simulation scheme. The error incurred by simulating the vector field forward/backward over the entire trajectory can be written as

$e_{\mathcal{S}} = \sum_{j} \sum_{i=1}^{q} \omega_i \Big( \big\| \hat{\mathbf{x}}_{j+i} - F^{i}(\hat{\mathbf{x}}_j) \big\|_2^2 + \big\| \hat{\mathbf{x}}_{j-i} - F^{-i}(\hat{\mathbf{x}}_j) \big\|_2^2 \Big). \qquad (11)$

Using subscripts to represent the time step, the final cost function is then

$\mathcal{L}(\hat{\mathbf{N}}, \Xi) = e_{\mathcal{D}} + e_{\mathcal{S}}, \qquad (12)$

which is the summation of the derivative approximation error $e_{\mathcal{D}}$ and the simulation error $e_{\mathcal{S}}$. The optimization problem to simultaneously denoise and learn the system model can then be written as

$\hat{\mathbf{N}}, \Xi = \arg\min_{\hat{\mathbf{N}}', \Xi'} \mathcal{L}(\hat{\mathbf{N}}', \Xi'). \qquad (13)$

The global optimal solution of Eq. (13) needs to satisfy $\partial \mathcal{L}/\partial \hat{\mathbf{N}} = 0$ and $\partial \mathcal{L}/\partial \Xi = 0$. To solve Eq. (13), it is necessary to calculate the gradients $\partial \mathcal{L}/\partial \hat{\mathbf{N}}$ and $\partial \mathcal{L}/\partial \Xi$, which is a difficult task to perform analytically or computationally. However, recent automatic differentiation packages such as TensorFlow [1] and Julia Flux [29] make it possible to directly extract the gradients of $\mathcal{L}$ with respect to $\hat{\mathbf{N}}$ and $\Xi$. This allows us to solve the optimization problem in Eq. (13) easily using a gradient descent method such as Adam [33]. Throughout this paper, we use TensorFlow 2.0 and the Adam optimizer to solve Eq. (13). Moreover, to enforce the sparsity of the identified model, a thresholding approach [11] is used and Eq. (13) is solved repeatedly in a sequence of optimization loops. Each loop uses the previous loop's optimization result as the initial guess of the new loop. The value of $\hat{\mathbf{N}}$ is also used to calculate the estimated state $\hat{\mathbf{X}}$, which is used to calculate the new estimated value of the selection parameter $\Xi$. Furthermore, if elements in $\Xi$ are smaller than a threshold $\lambda$ at the end of an optimization loop, those elements are constrained to zero for the remainder of the optimization process. Figure 1 illustrates this process, and Appendix A shows the detailed algorithm for simultaneous denoising and sparse model identification. Some guidance on the selection of the hyper-parameters (the threshold $\lambda$, the prediction step $q$, and the number of optimization loops) is given in Appendices B, C, and D.
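The whole optimization loop can be sketched compactly in TensorFlow 2. The following is a minimal illustration of the technique, not the authors' released code: the library theta_fn is hard-coded for a 3-state system, and the file name, prediction step q, discount gamma, threshold, learning rate, and iteration counts are all placeholder choices.

```python
import numpy as np
import tensorflow as tf

def theta_fn(X):
    # Example library: polynomials up to second order for a 3-state system.
    x, y, z = X[:, 0:1], X[:, 1:2], X[:, 2:3]
    return tf.concat([x, y, z, x*x, x*y, x*z, y*y, y*z, z*z], axis=1)

def rk4(X, Xi, dt):
    # One Runge-Kutta step of the identified model dx/dt = Theta(x) Xi.
    f = lambda X: tf.matmul(theta_fn(X), Xi)
    k1 = f(X); k2 = f(X + 0.5*dt*k1); k3 = f(X + 0.5*dt*k2); k4 = f(X + dt*k3)
    return X + (dt/6.0)*(k1 + 2*k2 + 2*k3 + k4)

def loss_fn(Y, N_hat, Xi, mask, dt, q=3, gamma=0.9):
    X_hat = Y - N_hat                       # denoised state estimate
    m = X_hat.shape[0]
    # Derivative residual e_D (central differences on the denoised states).
    dX = (X_hat[2:] - X_hat[:-2]) / (2.0*dt)
    e_d = tf.reduce_mean((dX - tf.matmul(theta_fn(X_hat[1:-1]), Xi*mask))**2)
    # Simulation residual e_S: q forward and backward steps, discounted.
    e_s, Xf, Xb = 0.0, X_hat[:m-q], X_hat[q:]
    for i in range(1, q + 1):
        Xf = rk4(Xf, Xi*mask, dt)
        Xb = rk4(Xb, Xi*mask, -dt)
        e_s += gamma**i * (tf.reduce_mean((Xf - X_hat[i:m-q+i])**2) +
                           tf.reduce_mean((Xb - X_hat[q-i:m-i])**2))
    return e_d + e_s

# Joint optimization with sequential thresholding (illustrative settings).
dt = 0.01
Y = tf.constant(np.load("noisy_lorenz.npy"), tf.float32)  # (m, 3) placeholder data
N_hat = tf.Variable(tf.zeros_like(Y))
Xi = tf.Variable(tf.zeros([9, 3]))
mask = tf.ones([9, 3])
opt = tf.keras.optimizers.Adam(learning_rate=1e-2)
for loop in range(10):
    for _ in range(2000):
        with tf.GradientTape() as tape:
            L = loss_fn(Y, N_hat, Xi, mask, dt)
        grads = tape.gradient(L, [N_hat, Xi])
        opt.apply_gradients(zip(grads, [N_hat, Xi]))
    mask = tf.cast(tf.abs(Xi*mask) >= 0.1, tf.float32)  # threshold lambda = 0.1
```

Because automatic differentiation flows through both the library evaluation and the RK4 steps, the gradients with respect to $\hat{\mathbf{N}}$ and $\Xi$ come for free from the tape, which is exactly what circumvents direct differentiation of the noisy data.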

3 Performance Comparison with Neural Network Denoising Approach

The advocated optimization framework of modified SINDy is compared with the NN denoising approach of Rudy et al. [66]. Additionally, robustness to noise and to the amount of training data is considered.

3.1 Performance Criteria

For ease of comparison, we use the same performance criteria developed by Rudy et al. [66]. Specifically, these are the vector field error $E_{\mathbf{f}}$, the noise identification error $E_{\mathbf{N}}$, and the prediction error $E_{\mathbf{x}}$. The vector field error is

$E_{\mathbf{f}} = \frac{\sum_j \|\mathbf{f}(\mathbf{x}_j) - \hat{\mathbf{f}}(\mathbf{x}_j)\|_2^2}{\sum_j \|\mathbf{f}(\mathbf{x}_j)\|_2^2}, \qquad (14)$

which calculates the relative squared error between the true vector field $\mathbf{f}$ and the identified vector field $\hat{\mathbf{f}}$. The noise identification error is

$E_{\mathbf{N}} = \frac{1}{m} \sum_{j=1}^{m} \|\mathbf{n}_j - \hat{\mathbf{n}}_j\|_2^2, \qquad (15)$

which is the mean difference between the true noise $\mathbf{n}_j$ and the identified noise $\hat{\mathbf{n}}_j$. The prediction error is

$E_{\mathbf{x}} = \frac{1}{m} \sum_{j=1}^{m} \|\mathbf{x}_j - \hat{\mathbf{x}}_j^{\text{sim}}\|_2^2, \qquad (16)$

and it calculates the difference between the forward-simulated trajectory $\hat{\mathbf{x}}^{\text{sim}}$ and the true trajectory. For the comparison of modified SINDy with the recently published Weak-SINDy [49], shown in Appendix F, two more performance criteria are used. The first one is the normalized parameter error

$E_{\Xi} = \frac{\|\hat{\Xi} - \Xi\|_2}{\|\Xi\|_2}, \qquad (17)$

which reflects how much the identified parameters $\hat{\Xi}$ deviate from the true parameters $\Xi$. The other one is the success rate, which describes the percentage of trials in which the model's correct structure is identified.
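These criteria translate directly into code. Below is a short NumPy sketch of Eqs. (14)-(17) as we read them; the exact normalizations in the paper may differ slightly from these forms.

```python
import numpy as np

def vector_field_error(f_true, f_hat):          # Eq. (14)
    return np.sum((f_true - f_hat)**2) / np.sum(f_true**2)

def noise_error(N_true, N_hat):                 # Eq. (15)
    return np.mean((N_true - N_hat)**2)

def prediction_error(X_true, X_sim):            # Eq. (16)
    return np.mean((X_true - X_sim)**2)

def parameter_error(Xi_true, Xi_hat):           # Eq. (17)
    return np.linalg.norm(Xi_hat - Xi_true) / np.linalg.norm(Xi_true)
```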

3.2 Robustness to Noise

The Lorenz attractor is used as an example to test the noise robustness of the approach. The chaotic Lorenz model is

$\dot{x} = \sigma(y - x), \quad \dot{y} = x(\rho - z) - y, \quad \dot{z} = xy - \beta z, \qquad (18)$

with the standard parameter values $\sigma = 10$, $\rho = 28$, and $\beta = 8/3$. The Lorenz attractor is simulated from a fixed initial condition with a fixed time step and duration. The prediction step $q$ is fixed for both approaches compared. Unless otherwise noted, the Adam optimizer is used to solve the optimization problem, with the maximum iteration count fixed separately for modified SINDy and for the NN approach [66]. Different magnitudes of Gaussian noise are added to generate the noisy training data. The noise level is defined as

$\sigma_{\text{noise}} = \frac{\|\mathbf{N}\|_F}{\|\mathbf{X}\|_F} \times 100\%. \qquad (19)$
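A sketch of generating noisy training data at a prescribed noise level, under our reading of Eq. (19) as the ratio of noise to signal magnitude expressed as a percentage (the seed and per-state scaling are our choices):

```python
import numpy as np

def add_noise(X, percent, seed=0):
    """Add zero-mean Gaussian noise scaled to a percentage of the signal RMS."""
    rng = np.random.default_rng(seed)
    scale = (percent / 100.0) * np.std(X, axis=0)   # per-state signal RMS
    N = rng.normal(0.0, 1.0, size=X.shape) * scale
    return X + N, N
```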
Figure 2: Top: Noise identification error of modified SINDy (labeled as SINDy) and the NN denoising approach by Rudy et al. [66]. The black circle represents the median of ten runs while the violin shape represents the distribution of error. The modified SINDy approach shows better noise identification error. Bottom: Comparison between the average noise applied to the Lorenz system and the noise identified by the two approaches. As shown on the left, neither approach can produce the exact zero-noise result when no noise is applied, which happens because there is a tiny difference between the learned dynamics and the true dynamics.

For each noise level, different sets of noisy data are generated and used as training data for both approaches. The NN approach [66] uses the same setup as in [66], with the same number of hidden layers and neurons per layer, and with the regularization parameters chosen as in that work. Unless otherwise noted, we use the same setup for all the NNs in this paper. For modified SINDy, the library is constructed with terms up to second order (not including the constant term), and the value of the sparsity parameter $\lambda$ varies based on the noise added. A Tikhonov regularization approach is used to pre-smooth the noisy data as in [66], although we have found that pre-smoothing does not affect the results appreciably when using zero-mean noise.

Fig. 2 shows the noise identification error of the NN approach [66] and the modified SINDy approach. The vector field error and short-term prediction error can be seen in Fig. 3. For all noise levels, modified SINDy correctly identified the Lorenz model. To calculate the prediction error, the identified model is simulated forward in time for both modified SINDy and the NN denoising approach [66]. Fig. 3 suggests that the model identified by modified SINDy has better performance when simulated forward in time. Appendix E shows a noise robustness comparison between modified SINDy and the original SINDy [11]; in general, modified SINDy is substantially more robust than the original SINDy [11]. A comparison between modified SINDy and the recently developed Weak-SINDy approach [49] is presented in Appendix F.

Figure 3: Top: The vector field error and prediction error of modified SINDy (labeled as SINDy) and the NN denoising approach by Rudy et al. [66] are shown. The black dot is the median of the runs, and the violin shape represents the distribution of the error. Bottom: The simulated trajectory is shown for a fixed initial condition.

3.3 Robustness to Data Length

Figure 4: (a), (b): As the training data length increases, the vector field error and noise identification error of modified SINDy (labeled as SINDy) and the NN denoising approach by Rudy et al. [66] decrease. (c) Modified SINDy can use a short window of data to identify the system model accurately. There is a tiny drop in the success rate at one training data length due to the choice of a fixed thresholding parameter; by using a larger thresholding parameter, the success rate can be recovered. Bottom: A comparison of the averaged true noise and the noise identified by the two approaches is shown.

We also compare the performance of the NN denoising approach by Rudy et al. [66] and modified SINDy under different amounts of data with a fixed noise level. The minimum amount of data needed by modified SINDy to correctly identify the system model is shown using the Lorenz attractor as an example. To perform the numerical experiment, the same initial point is used to generate noise-free data of different temporal lengths. The time step is fixed and a fixed percentage of Gaussian noise is added to generate the noisy training data. The success rate of modified SINDy is calculated to indicate the minimum amount of data needed to identify the correct system model. The prediction error is not shown since the simulation of the identified model in the low-data limit is not stable. Adam is used to optimize the problem with the same prediction step $q$ for both approaches. A fixed thresholding parameter $\lambda$ is used for modified SINDy and the library is constructed with up to second-order terms (without the constant term). Fig. 4 suggests that when the correct parameters and library are used for modified SINDy, it will outperform the NN denoising approach by Rudy et al. given the same amount of data.

4 Results

In this section, we demonstrate the ability of modified SINDy to separate signal and noise while learning the system model. The Van der Pol oscillator is used as the example test case to show that modified SINDy can identify the correct distribution of the Gaussian noise added to the system. Additionally, we highlight several other examples tested with modified SINDy and summarize the performance. Furthermore, as a more advanced example, we show that modified SINDy can separate non-Gaussian, non-zero mean, and non-symmetric noise distributions from the dynamics. Finally, we show how modified SINDy can be integrated into the discrepancy modeling approach [30].

4.1 Van der Pol Oscillator

Figure 5: The distribution of the noise learned by modified SINDy is shown.

The Van der Pol oscillator is used as our test case to demonstrate the ability of modified SINDy to denoise and learn the system dynamics simultaneously. The Van der Pol oscillator is given by

$\dot{x} = y, \quad \dot{y} = \mu(1 - x^2)y - x, \qquad (20)$

where $\mu$ is the nonlinear damping/gain parameter, with a fixed value used for demonstration purposes. The system is simulated from a fixed initial condition with a fixed time step and duration. The Adam optimizer with a fixed learning rate is used for all noise levels. The prediction step $q$ and sparsity parameter $\lambda$ of modified SINDy are held fixed, and the library of candidate functions is constructed with polynomial terms up to third order (without the constant term). Three different levels of noise are applied and the distribution of the identified noise is shown in Fig. 5. Figure 5 shows that modified SINDy correctly identified the distribution of the true noise.
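A minimal sketch of generating the Van der Pol training data of Eq. (20) with SciPy; the value of mu, the initial condition, the time step, and the duration are placeholders rather than the paper's settings.

```python
import numpy as np
from scipy.integrate import solve_ivp

mu = 2.0                                   # placeholder damping/gain parameter

def vdp(t, s):
    # Van der Pol right-hand side of Eq. (20).
    x, y = s
    return [y, mu * (1 - x**2) * y - x]

dt, T = 0.01, 10.0                         # placeholder time step and duration
t_eval = np.arange(0, T, dt)
sol = solve_ivp(vdp, [0, T], [0.5, 0.5], t_eval=t_eval, rtol=1e-10, atol=1e-10)
X = sol.y.T                                # (m, 2) clean trajectory (ground truth)
```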

4.2 Rössler Attractor

Figure 6: Modified SINDy is used to denoise the measurement of the Rössler attractor while learning its model. The identified model shows high accuracy when simulated forward.

The second example is the Rössler attractor, which is governed by

$\dot{x} = -y - z, \quad \dot{y} = x + ay, \quad \dot{z} = b + z(x - c), \qquad (21)$

where $a$, $b$, and $c$ are fixed model parameters. The system is simulated from a fixed initial condition with a fixed time step and duration. The Adam optimizer with a fixed learning rate is used for all noise levels. The prediction step $q$ and sparsity parameter $\lambda$ of modified SINDy are held fixed, and the library of candidate functions is constructed with polynomial terms up to second order (with the constant term). Three different levels of noise are applied and the denoised signal is shown in Fig. 6. Figure 6 also shows the simulated trajectories of the identified models, each started from a fixed initial condition.

4.3 Lorenz 96 Model

Figure 7: Modified SINDy is used to denoise the measurement of the Lorenz 96 system while learning its model.

As our last example, we use modified SINDy to identify the Lorenz 96 model, whose equation is given by

$\dot{x}_i = (x_{i+1} - x_{i-2})x_{i-1} - x_i + F \qquad (22)$

for $i = 1, \ldots, n$. We assume periodic boundary conditions, $x_{-1} = x_{n-1}$, $x_0 = x_n$, and $x_{n+1} = x_1$, and set the forcing term $F$ to a constant large enough to generate chaotic behavior. The number $n$ is fixed so that the model has a moderate number of states. The system is simulated from a fixed initial condition with a fixed time step and duration. The Adam optimizer with a fixed learning rate is used for all noise levels. The prediction step $q$ and sparsity parameter $\lambda$ of modified SINDy are held fixed (with a different $\lambda$ at the highest noise level). The library of candidate functions is constructed with polynomial terms up to third order (with the constant term included). Three different levels of noise are applied and the denoised signal is shown in Fig. 7 (for ease of visualization, only the first three states are shown). Figure 7 also shows the simulated trajectories of the identified models, each started from a fixed initial condition.
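A minimal sketch of the Lorenz 96 right-hand side of Eq. (22) with periodic boundary conditions; the dimension n, forcing F, initial condition, and simulation settings are placeholders rather than the paper's values.

```python
import numpy as np
from scipy.integrate import solve_ivp

n, F = 6, 8.0                               # placeholder dimension and forcing

def lorenz96(t, x):
    # (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F, with indices taken modulo n.
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

t_eval = np.arange(0, 10, 0.01)
x0 = F + np.r_[0.01, np.zeros(n - 1)]       # equilibrium plus a small perturbation
sol = solve_ivp(lorenz96, [0, 10], x0, t_eval=t_eval, rtol=1e-10, atol=1e-10)
X = sol.y.T                                 # (m, n) clean trajectory
```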

Figure 8: This plot shows the denoising ability of modified SINDy on different examples. Regardless of the noise percentage, modified SINDy correctly identified all the system models.

In Fig. 8, the effectiveness of modified SINDy is demonstrated on a number of canonical dynamical systems models. For all examples, Gaussian noise with zero mean is added to generate the noisy training data, and the Adam optimizer is used to perform the optimization. The models and other parameters used for each example are summarized in Appendix G. Modified SINDy correctly identified all the system models and noise distributions regardless of the noise magnitude used.

4.4 Identification of Noise Distributions

The modified SINDy algorithm has the ability to handle different kinds of noise distributions. Three different noise distributions are used to demonstrate this: Gaussian, uniform, and Gamma. To generate the Gamma noise, its shape and scale parameters are fixed and the generated noise is scaled to the desired level. The noise-free data of the Van der Pol oscillator is generated the same way as in Sec. 4.1, with the prediction step $q$ and the sparsity parameter $\lambda$ held fixed. Figure 9 shows the distributions identified by modified SINDy, and it shows that learning a non-zero mean noise distribution is more difficult than learning a zero-mean one. For better learning results on a non-zero mean noise distribution, one can try the iterative learning approach shown in Appendix H. Once the noise is separated from the signal, an additional step can be taken to identify the distribution of the noise from a set of candidate distributions. This can be achieved by the fitter package in Python [18]. Appendix I shows more details of this process.
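As a sketch of this last step, the fitter package can rank candidate families against the identified noise; the Gamma stand-in data and the candidate list below are illustrative, mirroring the candidates used in Appendix I.

```python
import numpy as np
from fitter import Fitter

# Stand-in for the noise identified by modified SINDy (illustrative only).
N_hat = np.random.default_rng(0).gamma(2.0, 1.0, 5000)

f = Fitter(N_hat, distributions=["norm", "uniform", "gamma",
                                 "dweibull", "rayleigh", "cauchy", "beta"])
f.fit()
print(f.summary())        # ranks candidates by sum of squared errors
print(f.get_best())       # best-fitting distribution and its parameters
```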

Figure 9: The ability of modified SINDy to handle different kinds of noise distributions is illustrated in this figure, using the Van der Pol oscillator as an example. Noise is generated and added to the clean signal. As this figure shows, modified SINDy can identify different types of noise distributions correctly.

4.5 Discrepancy Modeling

Modified SINDy can be easily integrated with the discrepancy modeling framework of SINDy [30]. This is of great practical value since it is often the case that part of the dynamics is known. Suppose the known (theoretical) right-hand-side dynamics in Eq. (1) is $\mathbf{f}_0(\mathbf{x})$. Discrepancy modeling assumes that the known model is not capable of modeling the data due to missing physics terms on the right-hand side. Thus there is a mismatch between the derivative $\dot{\mathbf{x}}$ and the known dynamics $\mathbf{f}_0(\mathbf{x})$. The discrepancy modeling approach tries to identify the missing dynamics $\mathbf{g}(\mathbf{x})$ such that

$\dot{\mathbf{x}} = \mathbf{f}_0(\mathbf{x}) + \mathbf{g}(\mathbf{x}). \qquad (23)$

To illustrate this process, consider a system whose model is given as

(24)

Eq. (24) is simulated from a fixed initial condition with a fixed time step and duration to generate noise-free data. Training data is produced by adding Gaussian noise to create the noisy measurement. Assume that the noisy measurement of Eq. (24) is given, and further assume that only an approximate version of the dynamics is known, given by

(25)

The difference between Eq. (24) and Eq. (25) is the discrepancy model that modified SINDy identifies. Note that this prior information of the dynamics, $\mathbf{f}_0(\mathbf{x})$, can be constrained to exist in the modified SINDy library during the optimization process, and its parameters can be used as an initial guess of the true parameters. Thus, the only thing we have to learn is the missing dynamics. Figure 10 illustrates this process. In this example, the prediction step, sparsity parameter, and Adam learning rate are held fixed. Fig. 10 suggests that modified SINDy can be used to learn the discrepancy model when part of the dynamics is already known.
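One way to realize this constraint, sketched below under our own names and shapes: the known coefficients seed the initial guess of Xi, and the thresholding mask is prevented from deactivating the known terms so that only the discrepancy terms compete for selection.

```python
import numpy as np
import tensorflow as tf

p, n, lam = 9, 3, 0.1                           # placeholder library size, states, threshold
Xi_known = np.zeros((p, n), dtype=np.float32)   # fill in the known-model coefficients
Xi = tf.Variable(Xi_known)                      # initial guess = known model
known = tf.constant((Xi_known != 0).astype(np.float32))

def update_mask(Xi, mask):
    # Threshold small coefficients, but never deactivate the known terms.
    return tf.maximum(tf.cast(tf.abs(Xi * mask) >= lam, tf.float32), known)
```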

Figure 10: Modified SINDy can be easily integrated with the discrepancy learning method. The known structure can be utilized while we aim to learn the discrepancy and denoise the measurement.

5 Conclusion and Future Work

In this work, we introduce a new learning algorithm that leverages automatic differentiation and sparse regression for simultaneously (i) denoising time-series data, (ii) learning and parametrizing the noise probability distribution, and (iii) identifying the underlying parsimonious dynamical system responsible for generating the time-series data. The method provides a critically enabling modification to the SINDy algorithm for improving robustness to noise with less training data in comparison with the previously developed NN denoising approach by Rudy et al. [66]. Multiple numerical examples are shown to demonstrate the effectiveness of the modified SINDy method for signal and noise separation as well as model identification. It is also shown that the modified SINDy can be integrated with a discrepancy modeling framework whereby prior information of the dynamical model can be used to help identify the missing dynamics. Importantly, we have shown that modified SINDy can be used to learn various types of noise distributions, including Gaussian, uniform, and non-zero mean noise distributions, such as a Gamma distribution. Overall, the modified SINDy is a robust method with practical potential for handling highly noisy data sets and/or when partial model information is known.

The modified SINDy is modular, allowing for many easily integrated improvements. An important direction for development includes the incorporation of control inputs, since many systems of practical interest are actuated, such as the pendulum-on-a-cart system [12, 31]. Extending modified SINDy to consider the impact of control will significantly expand its application domain. Improvements in computational speed are also desirable. In comparison with the sequential least-squares thresholding of the standard SINDy algorithm, the Adam optimizer is slow. There is the potential to use the standard SINDy sparse regression algorithms to warm-start the Adam optimization routine. The modified SINDy could also be integrated with SINDy-PI to identify rational or implicit dynamics, which is quite difficult since the simulation error shown in Eq. (11) cannot be calculated easily when the dynamics take a rational form. This is a case where the use of the NN denoising approach [66] by Rudy et al. is ideal.

Finally, it is important to improve the robustness of the modified SINDy algorithm when a large number of library terms is used. Currently, the modified SINDy cannot handle large libraries robustly due to the non-convexity of the optimization problem. When the library is too large, the problem becomes unstable unless the optimizer's learning rate is decreased. One potential solution is to simulate the dynamics with a variable-time-step numerical simulation scheme instead of the fixed-step scheme used in this paper. Although there are still many improvements to be made, we believe the introduction of modified SINDy will help guide the use of automatic differentiation tools to improve the SINDy framework.

Acknowledgments

SLB acknowledges support from the Army Research Office (ARO W911NF-19-1-0045) and the Air Force Office of Scientific Research (AFOSR FA9550-18-1-0200). JNK acknowledges support from the Air Force Office of Scientific Research (AFOSR FA9550-17-1-0329). We also acknowledge valuable discussions with Samuel H. Rudy, Jared Callaham, Henning Lange, Daniel A. Messenger, and Benjamin Herrmann.

Appendix A Algorithm for Simultaneously Denoising and Learning System Model

Algorithm 1: Modified SINDy
Input: noisy measurement $\mathbf{Y}$
Output: estimated noise $\hat{\mathbf{N}}$, sparse coefficients $\Xi$
/* Initialize the value of $\hat{\mathbf{N}}$ */
if soft start then
    $\hat{\mathbf{N}} \leftarrow \mathbf{Y} - \text{smooth}(\mathbf{Y})$   // if the soft start is true, the estimated value of the noise is obtained by pre-smoothing the noisy signal
else
    $\hat{\mathbf{N}} \leftarrow \mathbf{0}$   // else, the estimated value of the noise is initialized as a zero matrix
/* Initialize the value of $\Xi$ */
$\hat{\mathbf{X}} \leftarrow \mathbf{Y} - \hat{\mathbf{N}}$. Calculate the derivative estimate $\dot{\hat{\mathbf{X}}}$ using $\hat{\mathbf{X}}$. $\Xi \leftarrow$ least-squares fit of $\dot{\hat{\mathbf{X}}}$ onto $\Theta(\hat{\mathbf{X}})$.
/* Simultaneously denoise and learn the system model */
while the maximum number of optimization loops is not reached do
    Optimize the cost function $\mathcal{L}(\hat{\mathbf{N}}, \Xi)$ shown in Eq. (13)   // constrain the elements of $\Xi$ whose absolute value is smaller than $\lambda$ to zero during the rest of the optimization
    $\hat{\mathbf{X}} \leftarrow \mathbf{Y} - \hat{\mathbf{N}}$   // get new estimate of the true state
    Calculate $\dot{\hat{\mathbf{X}}}$ using $\hat{\mathbf{X}}$   // get new estimate of the true derivative
    Re-fit $\Xi$ on the terms of $\Theta(\hat{\mathbf{X}})$ that are not constrained to zero   // regress the dynamics on the active terms
end while

Appendix B Effect of Thresholding Parameter

The thresholding parameter $\lambda$ is the most important parameter to tune in modified SINDy, since it determines the sparsity of the model structure. Its effect can be seen in Fig. 11. In Fig. 11, the Lorenz system is simulated as in Sec. 3.2, Gaussian noise is added, and the Adam optimizer with a fixed learning rate is used to denoise the signal. The prediction step $q$ is held fixed and different values of $\lambda$ are used. For each $\lambda$, the numerical experiment is performed multiple times to calculate the median and distribution of the error shown in Fig. 11. Fig. 11 suggests that the value of $\lambda$ must be properly tuned. If the value of $\lambda$ is too small, the sparsity constraint will not be strong enough to enforce discovery of the correct model; moreover, $\hat{\mathbf{N}}$ and $\Xi$ will easily get stuck in a local minimum. If the value of $\lambda$ is too large, correct terms can be wrongly eliminated and the resulting model structure will be wrong. If the model structure is wrong, there will be a large difference between the identified noise $\hat{\mathbf{N}}$ and the true noise $\mathbf{N}$. To avoid sweeping different values of $\lambda$, our proposed method can be easily modified to use the stepwise sparse regression (SSR) approach [7]; however, the use of the SSR approach and its performance are beyond the scope of this paper.

Figure 11: This figure shows how the choice of the sparsity parameter $\lambda$ affects modified SINDy's performance. If $\lambda$ is too small, modified SINDy will not converge to the correct model in a short amount of time and will get stuck in a local minimum. On the contrary, if the sparsity parameter is too large, the identified model will miss terms necessary to build the correct model. Thus, the value of $\lambda$ needs to be tuned properly to determine an accurate model.

Appendix C Effect of Prediction Step

Fig. 12 shows the effect of the prediction step $q$ on the performance of the NN denoising approach by Rudy et al. [66] and the modified SINDy approach. The chaotic Lorenz system is used for the comparison and simulated as in Sec. 3.2, with a fixed noise level used to generate the noisy data. Each prediction step is run multiple times to calculate the median of the error. The Adam optimizer, with a fixed learning rate, is used to perform the optimization, and $\lambda$ is held fixed. Fig. 12 suggests that the performance of modified SINDy is not hugely affected by the prediction step $q$. However, for the NN denoising approach shown in [66], there exists a value of $q$ that achieves optimal performance. Fig. 12 also suggests that the computational time of both approaches increases linearly as the value of $q$ increases. Thus, $q$ can be chosen as a small value to save computational time when using modified SINDy without sacrificing too much performance.

Figure 12: Left: The noise identification error, prediction error, and vector field error of the NN denoising approach by Rudy et al. [66] and modified SINDy (labeled as SINDy) are shown as the value of $q$ changes. In (a) to (d), the black circle shows the median of the error across runs while the violin shape represents the distribution of the calculated result. Middle: The average of the true noise and the identified noise of modified SINDy and the NN approach is shown for four different prediction steps. Right: The comparison of the simulated and true trajectories of the models identified by modified SINDy and the NN approach is shown, with the model simulated forward in time from a fixed initial condition. It can be seen that modified SINDy has better performance in this case. All the computation is performed on an RTX 2080 GPU, with 32 GB of RAM and an AMD Ryzen 7 2700X processor.

Appendix D Effect of Optimization Iteration

The number of optimization loops determines how many times the thresholding optimization is performed. Fig. 13 shows its effect on the noise identification error and vector field error using the Lorenz attractor as an example. The system is simulated as in Sec. 3.2, and the Adam optimizer, with a fixed learning rate, is used to optimize the problem. Fig. 13 suggests the performance of modified SINDy gradually converges as the number of loops increases.

Figure 13: Top Left: As the number of optimization loops increases, the noise identification error asymptotically decreases and converges to the optimal value. The black circle indicates the median of the runs while the violin shape represents the error distribution. Top Right: As the number of loops increases, the vector field error gradually converges to a certain value. Bottom: The average of the identified noise and true noise is shown for four different choices of the number of optimization loops. As the number of loops increases, the difference between the true noise and identified noise is minimized.
Figure 14: The maximum level of noise SINDy and modified SINDy can handle while generating the correct model structure is shown. A Tikhonov regularization approach is used to pre-smooth the noisy data. It can be seen that modified SINDy is substantially more robust than the original SINDy [11].

Appendix E Noise Robustness Comparison with SINDy

This section shows the noise robustness comparison of SINDy [11] and modified SINDy using the Van der Pol oscillator, Lorenz attractor, and Rössler attractor. Fig. 14 shows the maximum noise percentage each algorithm can handle while generating the correct model structure. For each noise level, different noisy data sets are generated and provided to both approaches. If the tested algorithm fails to identify the correct model structure for any noisy data set at a given noise level, we assume it is not robust to noise at this level. For SINDy, the derivative is computed using finite differences, and we show the effect of pre-smoothing the noisy data on its performance. Note that no smoothing is applied for modified SINDy. The clean data for the Lorenz attractor, Van der Pol oscillator, and Rössler attractor is generated the same way as shown in Sec. 3.2, Sec. 4.1, and Sec. 4.2. For SINDy, the sparsity parameter is swept over a hundred uniformly distributed values up to the minimum of the true parameters' absolute values. For modified SINDy, fixed values of the prediction step $q$ and learning rate are used for all examples shown in Fig. 14; Table 1 shows the other parameters used. Note that it is possible to make modified SINDy work at a higher noise level by tuning its parameters; however, sweeping various parameters is quite computationally heavy for modified SINDy. Thus, the maximum noise level modified SINDy can tolerate in Fig. 14 is an approximate lower bound.

Model | Noise Percentage | Library Order | Random Seed | Parameter Error | Max Adam Iteration
Lorenz | — | — | — | — | —
Rössler | — | — | — | — | —
Van der Pol | — | — | — | — | —
Lorenz 96 | — | — | — | — | —

Table 1: Parameters used for modified SINDy in Fig. 14 under the maximum noise it can tolerate. The constant term is included when building the library for the Rössler attractor and Lorenz 96 model but not for the other examples. The parameter error is calculated using Eq. (17).

Appendix F Noise Robustness Comparison with Weak-SINDy

This section shows the noise robustness comparison of Weak-SINDy [49] and modified SINDy using the Lorenz attractor as an example. The Lorenz attractor is simulated as in Sec. 3.2. For both approaches, the library is constructed using up to second-order terms (without the constant term). Different percentages of noise are added to the clean data to generate the noisy training data, and the parameter error and success rate are computed for both approaches. For modified SINDy, the Adam optimizer with a fixed learning rate is used to perform the optimization. A fixed sparsity parameter $\lambda$ is chosen in most cases; if modified SINDy cannot produce the correct result, a different $\lambda$ is used instead. At the higher noise levels, we pre-smooth the data using the approach mentioned in Sec. 3.2, and no pre-smoothing is done at the lower noise levels. For Weak-SINDy, the number of test functions, their polynomial order, the width-at-half-max parameter, and the support size are adjusted with the noise level. Many different sparsity parameters, evenly ranging over a fixed interval, are used, each generating a different candidate model for Weak-SINDy. The final model used to calculate the parameter error for Weak-SINDy is the model that has the correct structure (with only the correct terms selected from the library). If Weak-SINDy fails to produce the model with the correct active terms, the model that predicts the test data best is used to calculate the prediction error, where the test data is generated from a separate initial condition. The final comparison of the best model generated by Weak-SINDy and modified SINDy can be seen in Fig. 15.

Figure 15: This figure shows the noise robustness comparison of modified SINDy and Weak-SINDy using the Lorenz attractor as an example. The effect of noise on the parameter error and the rate of successful identification is compared for both approaches. As the figure shows, Weak-SINDy and modified SINDy have almost the same accuracy at lower noise levels; however, at higher noise levels, modified SINDy has a slightly higher identification success rate.
Models | Initial Condition | Library Order | Learning Rate
Van der Pol | — | — | —
Duffing | — | — | —
Cubic | — | — | —
Lotka-Volterra | — | — | —
Lorenz | — | — | —

Table 2: Parameters used in Fig. 8.

Appendix G Parameters Used in Fig. 8

In this section, the models used to simulate the systems in Fig. 8 are listed. The model used for simulating the Duffing oscillator is

$\dot{x} = y, \quad \dot{y} = -\delta y - \alpha x - \beta x^3, \qquad (26)$

with fixed values of $\delta$, $\alpha$, and $\beta$. The model used for simulating the cubic oscillator is

$\dot{x} = a x^3 + b y^3, \quad \dot{y} = c x^3 + d y^3, \qquad (27)$

with fixed values of $a$, $b$, $c$, and $d$. The model used for simulating the Lotka-Volterra system is

$\dot{x} = \alpha x - xy, \quad \dot{y} = xy - \gamma y, \qquad (28)$

with fixed values of $\alpha$ and $\gamma$. Other parameters used for training modified SINDy are summarized in Table 2. For all examples, the prediction step $q$ and the number of optimization loops are held fixed.

Appendix H Tips on Learning Non-Zero Mean Noise

Figure 16: The iterative learning process used to tackle non-zero mean noise distributions is shown. By using this iterative process, modified SINDy can handle noise distributions with non-zero means.

As Sec. 4.4 suggests, learning a non-zero mean noise distribution is much harder than learning a zero-mean one. To achieve better performance on non-zero mean noise distributions, we propose an iterative learning approach, summarized as follows: 1. Apply modified SINDy to the noisy data to briefly learn the distribution of the noise. 2. Subtract the mean of the learned noise from the noisy measurement, and use the new data to perform the learning. 3. Repeat step 2 until the result converges and the correct model is found. The Van der Pol oscillator is used to illustrate this approach, and the clean data is generated the same way as in Sec. 4.1, with Gamma noise added to create the noisy data and the prediction step and sparsity parameters held fixed. Fig. 16 demonstrates this approach. However, there is no guarantee that this approach will work when the bias of the noise is too large; learning non-zero mean noise is quite hard and careful tuning is needed. We find that using the soft start approach also helps with denoising non-zero mean noise.

Appendix I Identifying Noise Distribution Type

Table 3: The identified noise distribution versus the true distribution. For the Gamma distribution, the two parameters represent the shape and the scale.

True Distribution | State | True Parameters | Identified Distribution | Identified Parameters
Gaussian | — | — | Gaussian | —
Uniform | — | — | Uniform | —
Gamma | — | — | Gamma | —
Dweibull | — | — | Dweibull | —
Rayleigh | — | — | Rayleigh | —

When the noise is identified, it might be interesting to learn what type of distribution the noise follows. To illustrate this, the Van der Pol oscillator shown in Eq. (20) is simulated from a fixed initial condition with a fixed time step and duration. Next, noise is added to the simulation data to generate the noisy data, which is provided to modified SINDy to learn the dynamics and identify the noise added to the signal. The prediction step and sparsity parameter are held fixed (with a different $\lambda$ for the Gamma noise), the Adam optimizer with a fixed learning rate is used, and the library order is fixed. For all cases, modified SINDy correctly identified the model. As Table 3 shows, five different noise distributions are used to generate the noisy data. After the noise is identified, the distribution of the noise is fitted against seven candidate noise distributions: the normal, uniform, Gamma, Dweibull, Rayleigh, Cauchy, and Beta distributions. Next, the sum of squared errors between the empirical distribution of the identified noise and each fitted distribution is calculated, and the distribution that produces the lowest error is selected as the identified noise distribution. Notice that when not enough data is provided, it is entirely possible that another kind of distribution is misidentified as the true underlying distribution of the noise. The study of how many data points are needed to identify the correct distribution is beyond the scope of this paper. The final results are summarized in Table 3.
 

Appendix J Caveats of the Approach

This section provides some tips on using modified SINDy.

  • 1. Properly design the library: Building the correct library for the regression is the most important part of this algorithm. If the library does not contain the terms included in the actual dynamics, the algorithm will fail to produce the correct noise and system model. Thus, whenever possible, one should use any prior information about the dynamics to build the library. In general, the library needs to be large enough to include all the terms that show up in the dynamics, and at the same time small enough to ensure robustness. Do not expect modified SINDy to work on a library with hundreds or thousands of terms; it will break if the library is too large. For example, on the Lorenz example at high noise levels, there is a maximum library order that modified SINDy can handle. This happens because the higher-order terms in the library tend to destabilize the forward and backward simulation, producing NaN costs and making the optimizer fail. To mitigate this, one can try to decrease the learning rate of the optimizer, pre-smooth the data, get a better initial estimate of $\hat{\mathbf{N}}$, reduce the library size, or set the optimization parameters' type to float64. Moreover, whether the constant term should be included is case-specific. If the actual dynamics do not have a constant term and the measurement noise is non-zero mean or has significant outliers, including the constant basis in the library will make modified SINDy get stuck in a local minimum more easily. It is advised that the user try the library both with and without the constant basis.

  • 2. Initial guess of $\hat{\mathbf{N}}$ and $\Xi$: Having a good initial guess of the estimated noise $\hat{\mathbf{N}}$ and estimated selection parameter $\Xi$ can improve the conditioning of the optimization problem, allowing us to tackle harder problems with more library terms. If possible, the initial value of $\hat{\mathbf{N}}$ can be obtained by pre-smoothing the noisy signal, which provides a good start for the optimization problem and is also useful for estimating $\Xi$. If no other information is given, the initial guess of $\hat{\mathbf{N}}$ can be set to zero.

Footnotes

1. Corresponding author (kadierk@uw.edu); Code available at github.com/dynamicslab/modified-SINDy.

References

    1. M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving and M. Isard (2016) TensorFlow: a system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pp. 265–283.
    2. H. Akaike (1969) Fitting autoregressive models for prediction. Annals of the Institute of Statistical Mathematics 21 (1), pp. 243–247.
    3. A. G. Baydin, B. A. Pearlmutter, A. A. Radul and J. M. Siskind (2017) Automatic differentiation in machine learning: a survey. The Journal of Machine Learning Research 18 (1), pp. 5595–5637.
    4. S. Beetham and J. Capecelatro (2020) Formulating turbulence closures using sparse regression with embedded form invariance. arXiv preprint arXiv:2003.12884.
    5. S. A. Billings (2013) Nonlinear System Identification: NARMAX Methods in the Time, Frequency, and Spatio-Temporal Domains. John Wiley & Sons.
    6. J. Bongard and H. Lipson (2007) Automated reverse engineering of nonlinear dynamical systems. Proceedings of the National Academy of Sciences 104 (24), pp. 9943–9948.
    7. L. Boninsegna, F. Nüske and C. Clementi (2018) Sparse learning of stochastic dynamical equations. The Journal of Chemical Physics 148 (24), pp. 241723.
    8. L. Boninsegna, F. Nüske and C. Clementi (2018) Sparse learning of stochastic dynamical equations. The Journal of Chemical Physics 148 (24), pp. 241723.
    9. G. Both, S. Choudhury, P. Sens and R. Kusters (2019) DeepMoD: deep learning for model discovery in noisy data. arXiv preprint arXiv:1904.09406.
    10. S. L. Brunton and J. N. Kutz (2019) Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control. Cambridge University Press.
    11. S. L. Brunton, J. L. Proctor and J. N. Kutz (2016) Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences 113 (15), pp. 3932–3937.
    12. S. L. Brunton, J. L. Proctor and J. N. Kutz (2016) Sparse identification of nonlinear dynamics with control (SINDYc). IFAC-PapersOnLine 49 (18), pp. 710–715.
    13. M. Budišić, R. Mohr and I. Mezić (2012) Applied Koopmanism. Chaos: An Interdisciplinary Journal of Nonlinear Science 22 (4), pp. 047510.
    14. K. Champion, B. Lusch, J. N. Kutz and S. L. Brunton (2019) Data-driven discovery of coordinates and governing equations. Proceedings of the National Academy of Sciences 116 (45), pp. 22445–22451.
    15. K. P. Champion, S. L. Brunton and J. N. Kutz (2019) Discovery of nonlinear multiscale systems: sampling strategies and embeddings. SIAM Journal on Applied Dynamical Systems 18 (1), pp. 312–333.
    16. K. Champion, A. Zheng, S. L. Brunton and J. N. Kutz (2019) A unified sparse optimization framework to learn parsimonious physics-informed models from data. arXiv preprint arXiv:1906.10612.
    17. R. T. Chen, Y. Rubanova, J. Bettencourt and D. K. Duvenaud (2018) Neural ordinary differential equations. In Advances in Neural Information Processing Systems, pp. 6571–6583.
    18. Cokelaer/fitter: v1.2.3, synchronised on PyPI.
    19. M. Dam, M. Brøns, J. Juul Rasmussen, V. Naulin and J. S. Hesthaven (2017) Sparse identification of a predator-prey system from simulation data of a convection model. Physics of Plasmas 24 (2), pp. 022310.
    20. B. C. Daniels and I. Nemenman (2015) Automated adaptive inference of phenomenological dynamical models. Nature Communications 6.
    21. B. de Silva, D. M. Higdon, S. L. Brunton and J. N. Kutz (2019) Discovery of physics from data: universal laws and discrepancy models. arXiv preprint arXiv:1906.07906.
    22. B. M. de Silva, K. Champion, M. Quade, J. Loiseau, J. N. Kutz and S. L. Brunton (2020) PySINDy: a Python package for the sparse identification of nonlinear dynamics from data. arXiv preprint arXiv:2004.08424.
    23. N. Deng, B. R. Noack, M. Morzyński and L. R. Pastur (2020) Low-order model for successive bifurcations of the fluidic pinball. Journal of Fluid Mechanics 884.
    24. Y. El Sayed M, R. Semaan and R. Radespiel (2018) Sparse modeling of the lift gains of a high-lift configuration with periodic Coanda blowing. In 2018 AIAA Aerospace Sciences Meeting, pp. 1054.
    25. P. Gelß, S. Klus, J. Eisert and C. Schütte (2019) Multidimensional approximation of nonlinear dynamical systems. Journal of Computational and Nonlinear Dynamics 14 (6).
    26. D. Giannakis and A. J. Majda (2012) Nonlinear Laplacian spectral analysis for time series with intermittency and low-frequency variability. Proceedings of the National Academy of Sciences 109 (7), pp. 2222–2227.
    27. A. Goeßmann, M. Götte, I. Roth, R. Sweke, G. Kutyniok and J. Eisert (2020) Tensor network approaches for learning non-linear dynamical laws. arXiv preprint arXiv:2002.12388.
    28. M. Hoffmann, C. Fröhner and F. Noé (2019) Reactive SINDy: discovering governing reactions from concentration data. Journal of Chemical Physics 150 (025101).
    29. M. Innes (2018) Flux: elegant machine learning with Julia. Journal of Open Source Software.
    30. K. Kaheman, E. Kaiser, B. Strom, J. N. Kutz and S. L. Brunton (2019) Learning discrepancy models from experimental data. Conference on Decision and Control.
    31. K. Kaheman, J. N. Kutz and S. L. Brunton (2020) SINDy-PI: a robust algorithm for parallel implicit sparse identification of nonlinear dynamics. arXiv preprint arXiv:2004.02322.
    32. E. Kaiser, J. N. Kutz and S. L. Brunton (2018) Sparse identification of nonlinear dynamics for model predictive control in the low-data limit. Proceedings of the Royal Society of London A 474 (2219).
    33. D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
    34. S. Klus, F. Nüske, P. Koltai, H. Wu, I. Kevrekidis, C. Schütte and F. Noé (2018) Data-driven model reduction and transfer operator approximation. Journal of Nonlinear Science.
    35. J. N. Kutz (2013) Data-Driven Modeling & Scientific Computation: Methods for Complex Systems & Big Data. Oxford University Press.
    36. J. N. Kutz, S. L. Brunton, B. W. Brunton and J. L. Proctor (2016) Dynamic Mode Decomposition: Data-Driven Modeling of Complex Systems. SIAM.
    37. Z. Lai and S. Nagarajaiah (2019) Sparse structural system identification method for nonlinear dynamic systems with hysteresis/inelastic behavior. Mechanical Systems and Signal Processing 117, pp. 813–842.
    38. H. Lange, S. L. Brunton and N. Kutz (2020) From Fourier to Koopman: spectral methods for long-term time series prediction. arXiv preprint arXiv:2004.00574.
    39. L. Ljung (2010) Perspectives on system identification. Annual Reviews in Control 34 (1), pp. 1–12.
    40. J.-C. Loiseau, B. R. Noack and S. L. Brunton (2018) Sparse reduced-order modelling: sensor-based dynamics to full-state estimation. Journal of Fluid Mechanics 844, pp. 459–490.
    41. J. Loiseau and S. L. Brunton (2018) Constrained sparse Galerkin regression. Journal of Fluid Mechanics 838, pp. 42–67.
    42. J. Loiseau (2020) Data-driven modeling of the chaotic thermal convection in an annular thermosyphon. Theoretical and Computational Fluid Dynamics, pp. 1–27.
    43. L. Lu, X. Meng, Z. Mao and G. E. Karniadakis (2019) DeepXDE: a deep learning library for solving differential equations. arXiv preprint arXiv:1907.04502.
    44. N. M. Mangan, J. N. Kutz, S. L. Brunton and J. L. Proctor (2017) Model selection for dynamical systems via sparse regression and information criteria. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 473 (2204), pp. 20170009.
    45. N. M. Mangan, T. Askham, S. L. Brunton, J. N. Kutz and J. L. Proctor (2019) Model selection for hybrid dynamical systems via sparse regression. Proceedings of the Royal Society A 475 (2223), pp. 20180534.
    46. N. M. Mangan, S. L. Brunton, J. L. Proctor and J. N. Kutz (2016) Inferring biological networks by sparse identification of nonlinear dynamics. IEEE Transactions on Molecular, Biological and Multi-Scale Communications 2 (1), pp. 52–63.
    47. A. Mardt, L. Pasquali, H. Wu and F. Noé (2018) VAMPnets: deep learning of molecular kinetics. Nature Communications 9 (5).
    48. D. A. Messenger and D. M. Bortz (2020) Weak SINDy for partial differential equations. arXiv preprint arXiv:2007.02848.
    49. D. A. Messenger and D. M. Bortz (2020) Weak SINDy: Galerkin-based data-driven model selection. arXiv preprint arXiv:2005.04339.
    50. I. Mezic (2013) Analysis of fluid flows via spectral properties of the Koopman operator. Annual Review of Fluid Mechanics 45, pp. 357–378.
    51. A. Narasingam and J. S. Kwon (2018) Data-driven identification of interpretable reduced-order models using sparse regression. Computers & Chemical Engineering 119, pp. 101–111.
    52. O. Nelles (2013) Nonlinear System Identification: From Classical Approaches to Neural Networks and Fuzzy Models. Springer.
    53. S. Pan, N. Arnold-Medabalimi and K. Duraisamy (2020) Sparsity-promoting algorithms for the discovery of informative Koopman invariant subspaces. arXiv preprint arXiv:2002.10637.
    54. W. Pan, Y. Yuan, J. Gonçalves and G. Stan (2016) A sparse Bayesian approach to the identification of nonlinear state-space systems. IEEE Transactions on Automatic Control 61 (1), pp. 182–187.
    55. J. Pathak, B. Hunt, M. Girvan, Z. Lu and E. Ott (2018) Model-free prediction of large spatiotemporally chaotic systems from data: a reservoir computing approach. Physical Review Letters 120 (2), pp. 024102.
    56. C. Rackauckas, Y. Ma, J. Martensen, C. Warner, K. Zubov, R. Supekar, D. Skinner and A. Ramadhan (2020) Universal differential equations for scientific machine learning. arXiv preprint arXiv:2001.04385.
    57. C. Rackauckas and Q. Nie (2017) DifferentialEquations.jl – a performant and feature-rich ecosystem for solving differential equations in Julia. Journal of Open Research Software 5 (1).
    58. M. Raissi, P. Perdikaris and G. Karniadakis (2019) Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378, pp. 686–707.
    59. M. Raissi and G. E. Karniadakis (2017) Machine learning of linear differential equations using Gaussian processes. arXiv preprint arXiv:1701.02440.
    60. M. Raissi and G. E. Karniadakis (2018) Hidden physics models: machine learning of nonlinear partial differential equations. Journal of Computational Physics 357, pp. 125–141.
    61. M. Raissi, A. Yazdani and G. E. Karniadakis (2020) Hidden fluid mechanics: learning velocity and pressure fields from flow visualizations. Science 367 (6481), pp. 1026–1030.
    62. P. A. Reinbold, D. R. Gurevich and R. O. Grigoriev (2020) Using noisy or incomplete data to discover models of spatiotemporal dynamics. Physical Review E 101 (1), pp. 010203.
    63. S. Rudy, A. Alla, S. L. Brunton and J. N. Kutz (2019) Data-driven identification of parametric partial differential equations. SIAM Journal on Applied Dynamical Systems 18 (2), pp. 643–660.
    64. S. H. Rudy, S. L. Brunton and J. N. Kutz (2019) Smoothing and parameter estimation by soft-adherence to governing equations. Journal of Computational Physics 398, pp. 108860. Cited by: §1.
    65. S. H. Rudy, S. L. Brunton, J. L. Proctor and J. N. Kutz (2017) Data-driven discovery of partial differential equations. Science Advances 3 (4), pp. e1602614. Cited by: §1, §1, §2.1.
    66. S. H. Rudy, J. N. Kutz and S. L. Brunton (2019) Deep learning of dynamics and signal-noise decomposition with time-stepping constraints. Journal of Computational Physics 396, pp. 483–506. Cited by: Figure 12, Appendix C, Automatic Differentiation to Simultaneously Identify Nonlinear Dynamics and Extract Noise Probability Distributions from Data, §1, §1, §1, §2.2, Figure 2, Figure 3, Figure 4, §3.1, §3.2, §3.2, §3.2, §3.3, §3, §5, §5.
    67. H. Schaeffer and S. G. McCalla (2017) Sparse model selection via integral terms. Physical Review E 96 (2), pp. 023302. Cited by: §1, §2.1.
    68. H. Schaeffer, G. Tran and R. Ward (2018) Extracting sparse high-dimensional dynamics from limited data. SIAM Journal on Applied Mathematics 78 (6), pp. 3279–3295. Cited by: §1.
    69. H. Schaeffer (2017) Learning partial differential equations via data discovery and sparse optimization. In Proc. R. Soc. A, Vol. 473, pp. 20160446. Cited by: §1, §1.
    70. M. Schmelzer, R. P. Dwight and P. Cinnella (2020) Discovery of algebraic Reynolds-stress models using sparse symbolic regression. Flow, Turbulence and Combustion 104 (2), pp. 579–603. Cited by: §1.
    71. P. J. Schmid (2010-08) Dynamic mode decomposition of numerical and experimental data. Journal of Fluid Mechanics 656, pp. 5–28 (English). External Links: ISSN 0022-1120 Cited by: §1.
    72. M. Schmidt and H. Lipson (2009) Distilling free-form natural laws from experimental data. Science 324 (5923), pp. 81–85. Cited by: §1.
    73. M. Sorokina, S. Sygletos and S. Turitsyn (2016) Sparse identification for nonlinear optical communication systems: SINO method. Optics express 24 (26), pp. 30433–30443. Cited by: §1.
    74. W. Su, M. Bogdan and E. Candes (2017) False discoveries occur early on the Lasso path. The Annals of statistics 45 (5), pp. 2133–2150. Cited by: §1.
    75. S. Thaler, L. Paehler and N. A. Adams (2019) Sparse identification of truncation errors. Journal of Computational Physics 397, pp. 108851. Cited by: §1.
    76. R. Tibshirani (1996) Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society: Series B (Methodological) 58 (1), pp. 267–288. Cited by: §1, §2.1.
    77. G. Tran and R. Ward (2017) Exact recovery of chaotic systems from highly corrupted data. Multiscale Modeling & Simulation 15 (3), pp. 1108–1129. Cited by: §1, §2.1.
    78. F. van Breugel, J. N. Kutz and B. Brunton (2020) Numerical differentiation of noisy data: a unifying multi-objective optimization framework. arXiv preprint arXiv:2009.01911. Cited by: §1.
    79. B. Van Merriënboer, O. Breuleux, A. Bergeron and P. Lamblin (2018) Automatic differentiation in ML: where we are and where we should be going. In Advances in neural information processing systems, pp. 8757–8767. Cited by: §1.
    80. P. R. Vlachas, W. Byeon, Z. Y. Wan, T. P. Sapsis and P. Koumoutsakos (2018) Data-driven forecasting of high-dimensional chaotic systems with long short-term memory networks. Proc. R. Soc. A 474 (2213), pp. 20170844. Cited by: §1.
    81. C. Wehmeyer and F. Noé (2018) Time-lagged autoencoders: deep learning of slow collective variables for molecular kinetics. The Journal of Chemical Physics 148 (241703), pp. 1–9. Cited by: §1.
    82. M. O. Williams, I. G. Kevrekidis and C. W. Rowley (2015) A data-driven approximation of the Koopman operator: extending dynamic mode decomposition. Journal of Nonlinear Science 6, pp. 1307–1346. Cited by: §1.
    83. K. Wu and D. Xiu (2019) Numerical aspects for approximating governing equations using data. Journal of Computational Physics 384, pp. 200–221. Cited by: §1.
    84. O. Yair, R. Talmon, R. R. Coifman and I. G. Kevrekidis (2017) Reconstruction of normal forms by learning informed observation geometries from data. Proceedings of the National Academy of Sciences, pp. 201620045. Cited by: §1.
    85. L. Yang, D. Zhang and G. E. Karniadakis (2018) Physics-informed generative adversarial networks for stochastic differential equations. arXiv preprint arXiv:1811.02033. Cited by: §1.
    86. C. Yao and E. M. Bollt (2007) Modeling and nonlinear parameter estimation with Kronecker product representation for coupled oscillators and spatiotemporal systems. Physica D: Nonlinear Phenomena 227 (1), pp. 78–99. Cited by: §1.
    87. L. Zhang and H. Schaeffer (2019) On the convergence of the SINDy algorithm. Multiscale Modeling & Simulation 17 (3), pp. 948–972. Cited by: §1, §2.1.
    88. S. Zhang and G. Lin (2018) Robust data-driven discovery of governing physical laws with error bars. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 474 (2217), pp. 20180305. External Links: Document Cited by: §1, §2.1.
    89. S. Zhang and G. Lin (2019) Robust data-driven discovery of governing physical laws using a new subsampling-based sparse Bayesian method to tackle four challenges (large noise, outliers, data integration, and extrapolation). arXiv preprint arXiv:1907.07788. Cited by: §2.1.
    90. P. Zheng, T. Askham, S. L. Brunton, J. N. Kutz and A. Y. Aravkin (2018) A unified framework for sparse relaxed regularized regression: SR3. IEEE Access 7, pp. 1404–1423. Cited by: §1, §2.1.