Parameter estimation, nonlinearity and Occam's razor
Abstract
Nonlinear systems are capable of displaying complex behavior even if this is the result of a small number of interacting time scales. A widely studied case is when complex dynamics emerges out of a nonlinear system being forced by a simple harmonic function. In order to identify if a recorded time series is the result of a nonlinear system responding to a simpler forcing, we develop a discrete nonlinear transformation for time series based on synchronization techniques. This allows a parameter estimation procedure which simultaneously searches for a good fit of the recorded data, and small complexity of a fluctuating driving parameter. We illustrate this procedure using data from respiratory patterns during birdsong production.
Nonlinear systems have the remarkable property that complex behavior can emerge out of seemingly simple rules. This invites the possibility that much of the complexity in a recorded time series can be explained by simple nonlinear mechanisms. Here we are interested in the case that the dynamics corresponds to a simple nonlinear system forced by an external signal. In order to test for this possibility, we define a procedure which aims at approximating the dataset with minimal complexity of the forcing signal. We test this approach in a model for the firing rate of populations of neurons with both model generated data and physiological recordings of the air sac pressure of a singing bird. We find model parameters for which the solutions of the forced system are a good approximation to the data and at the same time, the forcing signal exhibits low dynamical complexity. Thus, we show that there are parameters in the model for which representations of the dataset are encoded into simpler signals.
I Introduction
Parameter estimation in smooth dynamical systems from experimental time series is central to many disciplines across the natural sciences. In many situations, the system under study can be modeled by a set of differential equations which governs the evolution of the state variables. Usually, such dynamical models will contain unknown parameters which one expects to estimate from the data. While much progress has been made in the case of linear dynamics, less is known about the more general case in which the dynamics involves nonlinear terms. Parameter estimation in nonlinear models is strongly case dependent partly because the model solutions can show a qualitative dependence on parameter values (1). Also, analytic solutions for a general smooth nonlinear system are typically unknown, posing serious difficulties when trying to associate estimated parameters with probabilities or likelihoods (2).
It is widely believed that the temporal evolution of macroscopic phenomena obeys deterministic rules. Sometimes these rules are derived axiomatically as it is the case for classical and quantum mechanics, while in other areas of knowledge the proposed relationships attempt to capture phenomenological observations. In many situations, macroscopic phenomena is thought to emerge out of the collective action of many microscopic units, whose dynamical rules can be derived from abinitio principles. However, establishing correspondence between these two scales is one of the oldest and hardest challenges in physics. This difficulty becomes explicit when studying systems that operate far from equilibrium, as is the case for example, in biological systems. Independently of how these rules are derived, the problem of approximating data with solutions of differential equations seems to be an unavoidable step when comparing theories against real world data. This is especially difficult if the dynamical equations contain nonlinear terms of their state variables, even if the system is low dimensional. This problem has been addressed by many authors with particular emphasis in the case that the underlying dynamics is chaotic (3); (4).
In many situations the causal relationships between the state variables of a system can be modeled by a set of differential equations or vector fields. Usually, the equations contain parameters which are assumed either to be stationary or to have a much slower temporal evolution than the state variables. Here, we consider such parameterized families of smooth vector fields. Let be a smooth vector field with and . We then have a continuously pindexed set of solutions for the associated initial value problem
(1) 
Given a time series we can compare solutions against data using a metric . Let be a model for , we then investigate the following questions:

Is the proposed model able to reproduce/explain the data to desired accuracy?

Which are the parameters that yield the best approximation?
Admittedly, there are more questions to be addressed such as robustness and likelihood of the approximation, but these can be regarded as ulterior inquiries once the answers to the first two points are found. One could attempt an answer by computing the error between the data and the model solutions as a function of the initial conditions and the parameters. The cost function can be defined as
(2) 
The functional form of is in general unknown. Assuming that is a discrete sampling of the data with elements, we consider a discretized cost function
(3) 
Here are the values of the solutions in a discrete time domain: this function can be computed numerically to desired accuracy. Minimization of this function yields sets of initial conditions and parameters for which the solutions of the model better resemble the experimental time series, as measured by . However, it is known that this approach has severe limitations: there are no known global optimization methods for arbitrary nonlinear functions. Good solutions can be obtained efficiently by heuristic solvers such as genetic algorithms, simulated annealing, or the amoeba algorithm (6); (5); (7): however, finding the best solution seems to be an intractable problem for many nonlinear systems.
A major contributing factor to this intractability is that the landscape of can be extremely complicated. For the sake of illustration, let us assume that is the output of a computer running a simulation of a known low dimensional chaotic system and also that we have access to the program being run. Since the dynamical rules are known along with the initial conditions and the parameters that generate the data, we can check that the global minimum of , thus enabling us to conclude that we have the right model for the data, as expected. One validation criteria which we could posit would read as follow: there are points for which the model solutions resemble the data optimally, since lower errors cannot be attained. Although this criteria seems sufficiently general and does not require knowledge of solutions, in a more realistic setting, the value of the global minimum is unknown and all that can be addressed numerically is local information of the landscape. This information is completely useless even in this numerical gedankenexperiment.
Consider now what happens to the landscape of as more data is introduced. Imagine we are trying to estimate the initial conditions which were used to start the simulation: due to chaos, nearby solutions will diverge exponentially yielding high values in C. Eventually as is increased, the values of become settled at some average value resembling the size of the attractor in phase space. This happens for every point in the domain of that is not the global minimum. Therefore, the landscape near the global minimum becomes extremely sharp as more data is introduced posing a severe limitation to address our agenda.
This point is illustrated in Figure 1 for the Lorenz system (8). We generated data by numerical integration of the Lorenz equations for values of the parameters such that there is a chaotic attractor (see figure caption). For this we picked initial conditions and parameters and simulated the system for steps. The resulting solutions are used as data for the cost function of the system (3): we compute mismatch of model output and data with the least squares metric , and assume that each corresponds to a time step in an RungeKutta O(4) integration routine with . Figure 1 shows how the landscape of changes as is increased. When the number of data points is low, the global minimum is easily identified. As more data is introduced, the landscape near the global minimum becomes extremely sharp until it can no longer be resolved in the plots. The choice to evaluate as a function of initial conditions was made for simplicity but the same properties should hold for parameter space. Because the attractor persists for small variations of the parameters, a similar argument can be made with the same outcome. For values of the parameters close to , the shape of the attractor changes but the mixing properties of the vector field remain invariant: if initial conditions are kept fixed and a neighborhood of is explored, solutions will land on attractors that are slightly different and this will have the same effect as if the same attractor is being approached from slightly different initial conditions. Overall, this example suggests that if we are to test the hypothesis that data corresponds to a chaotic system, a departure from the cost function is in order.
The keen observer has probably realized that if the number of data points is low, the minima looks round and is easy to find, suggesting that tracking the parameters over short segments of time could prove useful. This is the idea behind coupling the model to the data and it has been explored by many authors: in particular, it has been show that under certain conditions, when the model is coupled to the data the cost function is smoothed or ‘regularized’, and parameters can be estimated (9); (10); (11). Unfortunately, the regularized cost function can be extremely flat and therefore, finding its minima becomes hard to do numerically. Recently, Abarbanel et al. proposed a method to estimate parameters in chaotic models called DPE (12); (13) which addresses this issue. They regularize the cost function by coupling the model with the data while penalizing coupling by adding a term proportional to the coupling strength in a new balanced cost function: this leads to a function that can be minimized and yields the right answer. We follow a closely related approach, but aim for a different question: can data be approximated by smooth unfoldings of low dimensional systems?
Synchronizationbased parameter estimation techniques rely on coupling the data to the model equations and are at the heart of many estimation procedures (14). In this work we introduce a synchronizationbased transformation for time series which aims at exploring the possibility that data can be approximated by a driven nonlinear system. For this we explore a family of synchronization schemes by allowing model parameters to change in time so that the resulting solutions would better resemble a given dataset. In this way a map is induced between transient solutions of the model and the parametric fluctuations which are required to generate them. We assume that in general, better approximations come at the cost of introducing more complex fluctuations. However, it could be the case that there are parameters for which the fluctuations take a particularly simple form at the same time that they yield solutions which are good approximations to the dataset.
In what follows, the transformation is motivated by considering an idealized continuous case while rigorous definitions are postponed to the next section. In order to allow for better approximations we consider an extension of the cost function which includes arbitrary parametric fluctuations in . Let be a path in parameter space with . For each we can define the functional as
(4) 
Let be the minimum value of and the ‘location’ of the minimum, . We define the nonlinear transformation of our dataset as
(5) 
Ideally, this transformation takes the experimental time series as input and returns the fluctuations in the space of parameters that are necessary for system (1) solutions to optimally match the dataset as measured by . The goal here is to define a map between a particular transient solution of the model and the parametric fluctuations which generate it. Since parameter fluctuations greatly augment the degrees of freedom of it is expected in general that . For a sufficiently complex fluctuation we will be able to approximate very well regardless of the structure imposed by our assumptions. We note however, that for different values of the parameters of , there might be other simpler fluctuations which might approximate similarly well. Our objective is to select the parameters for which the optimally matching fluctuations take the simplest form while the approximation also meets a goodness criteria. Let be a measure of the complexity of the required fluctuations such that when . Our hypothesis is that for most parameter values there is a tradeoff between and in such a way that obtaining good solutions comes at the cost of increasing the complexity of the corresponding fluctuations. Therefore, we seek values of the parameters for which our hypothesis does not hold: namely, better approximations are obtained by simpler parametric fluctuations. If the fluctuations that are required to approximate a given dataset are as complex as the dataset itself, we would argue that the model is acting as a complicated mapping. However, if the approximations are generated by simple time dependent parametric fluctuations, any additional complexity that the data might have which is captured by the model solutions, would be due to specific nonlinearities in . If it is true that there is a negative correlation between the error of the approximations and the complexity of the corresponding fluctuations, departures from this general tendency could be quantified and exploited to detect parameter values of interest.
In the next section we propose a discrete nonlinear transformation for time series inspired by the continuous case. Later in this article this definition is applied to construct criteria for model selection in a hypothetical scenario: rate coding in driven neural populations. It is shown numerically for a test case that , for a particular choice of . This in turn is exploited to estimate the parameters and the driving signal used to generate the data. The same construction is finally applied to experimental data.
This work is organized as follows: section 2 contains the definitions and describes a numerical implementation of the transformation. In section 3 we analyze a case study: a simple model for the rate of fire in neural tissue forced by an external signal. We test the usefulness of the transformation with numerical solutions generated by the model. It is shown numerically that we can identify the parameters used to generate the data without making assumptions on the functional form of the forcing signal. Finally, the problem of motor patterns in birdsong is discussed in section 4. We show that there are regions in parameter space for which the synchronization manifold of the model and the respiratory gestures used during song production exhibits subharmonic behavior.
Ii Definitions and numerical implementation
In this section we discuss a numerical procedure inspired in the previous discussion. We assume that data is sampled at discrete time intervals and that satisfies the hypothesis of a RungeKutta O(4) method (15). Let be a time series with samples and be a numerical solution of
(6)  
Our focus is to find such that matches as measured by . For simplicity, we assume that each observation corresponds to a time step of the numerical approximation . In order to define the transformation we specify a domain for the fluctuations. Let be the maximum allowed departure from so that at each integration step can take any value in the range . For each , the cost function of the system (3) can be augmented so that it includes parameter fluctuations at each integration step,
(7) 
In this way the functional optimization (4) is approximated by a high dimensional nonlinear optimization problem without constraints: while in general this optimization cannot be performed for large , there are efficient approaches for particular choices of and .
We define the transformation as
(8) 
and introduce the notation for the corresponding model output
(9) 
where the sequence is a numerical approximation of the non autonomous system (6) given by a RungeKutta (O4) method (15). Here and are fixed and our aim is to solve for . The minima of this function corresponds to the fluctuations that yield the observable time series which better approximates the data. In order to tackle the optimization problem the fluctuations are restricted to take a discrete set of possible values. Let be the number of equally spaced values that can take at each integration step. The number of possible solutions is then , thus, attempts to optimize this function are bound to fail for large .
The strategy we then follow is to calculate in small running windows of size . Since in general we cannot minimize nonlinear functions, we perform this task by brute force search on a discrete set of equally spaced values in . In any given window, this yields the model solution and the fluctuations . We define a new for the subsequent window using the first step of the approximation obtained at step . Finally we define the transformation ,
(10)  
By iterating this map two time series are obtained: the solution of system (6) that aims to approximate the data and the fluctuations which are required to generate it. Thus, for each point in we have defined a three parameter transformation of the dataset. This is a special case of a more general definition described in the appendix.
In the next section this transformation is applied to a case study. We devise a scenario in which complex output is attained by simple parametric oscillations of a low dimensional dynamical system. It is found that this procedure yields a good approximation for . This is supported by numerical results in the next section but it is not expected to be the general case. However, we do expect that if we have the right model, it will synchronize with the data and good approximations will be obtained. In general we expect that by allowing parametric fluctuations, better approximations can be achieved, even if the model is wrong. In that case, we also expect that the required fluctuations are complex. The purpose of this procedure is to systematically investigate the possibility that there are parameters in the model for which this transformation takes a simple form while still approximating the data.
Iii Case study: Rate coding in populations of neurons.
In this section we study a phenomenological model for neural tissue under different perturbation regimes. The assumption is that the firing rate of a population of neurons decays exponentially in time in the absence of stimuli and that the population response to input becomes saturated after a threshold value. Here we study a simple architecture consisting of two interacting populations of inhibitory and excitatory neurons. This is also known as the additive model and it was first proposed as a model for populations of localized neurons by Wilson and Cowan (16),
(11)  
An excitatory population is coupled to an inhibitory one through the connectivities , both populations receive a constant input given by and is the timescale of the system. The assumption of saturation is implemented by the sigmoid function . This system presents a rich bifurcation diagram in the plane for open sets of values of the connectivities (17). Since the system is two dimensional, we can’t expect its solutions to be more complicated than a limit cycle. However, if we also assume that the system is being driven by an external signal a wild diversity of behaviors are unleashed.
In order to test the transformation introduced in the last section, a dataset was generated by driving the model with simple fluctuations in parameter . In this scenario, is interpreted as an external signal that is being injected into the excitatory population and processed by the network. We are interested in testing our hypothesis for the case in which complex output is attained by simple parametric fluctuations: in this spirit, we explore a family of fluctuations parametrized by ,
(12)  
We choose parameters so that the system undergoes bifurcations as evolves: and . System (12) is then numerically integrated using a RungeKutta O(4) routine with temporal resolution . For this choice of parameters the global attractor of the autonomous system () is an attractive limit cycle: a stable nonlinear oscillatory solution. It is know that these solutions can synchronize with an external periodic forcing (18). Due to the nonlinear nature of these solutions, this entrainment occurs in very different ways: for some regions of space the period of the solution matches the forcing period (1:1), but there are regions for which this ratio can take in principle any (p:q) value. These regions are generically known as Arnold tongues due to their Vshape in space and we refer to them collectively as the Arnold tongues diagram of the system (19). The morphology of the solutions greatly differs from one tongue to another so this offers an appealing strategy for coding different morphologies into simple control signals. While the quantitative features of this organizing map are model dependent, the qualitative geometrical mechanisms by which these solutions are generated can be characterized by topological methods(29); (30). A schematic representation of the model is shown in Figure 2 (a). Figure 2 (b) shows the dataset to be analyzed in this section. This solution was found by computing numerically the Arnold tongues diagram of the system in space and searching for responses with a period larger than 10 times the period of the forcing.
Next we ask if we can recover the test parameters by assuming the unperturbed system (11) and generic input,
(13)  
The role of the third equation is to force the input to population to have bounded derivative. This hypothesis is explicitly included in the model equations via the auxiliary variable and parameter . Note that if , solutions of system (13) are identical to those of system(11). We explore coupling of this augmented model with data by computing when parameter is allowed to fluctuate. The effect of the additional equation can be roughly described as follows: for high values of we’ll have that , while for lower values of we’ll obtain smoother oscillations in . Here we note that while the resulting parametric fluctuations represent a candidate solution to eqn. (8) when assuming system (13), the corresponding solution also represents an approximation to eqn. (8) when considering the unperturbed system (11). Since both alternatives yield the same error value, we can choose to consider data as being approximated by system (11) when forced by or as being approximated by system (13) when forced by . This choice will become important when quantifying the complexity of the resulting transformation due to the choice of and also when considering experimental data.
Let us summarize the scheme adopted for the rest of the article: parameter fluctuations are allowed in system (13) by coupling model and data through . The transformation was done as depicted in the previous section: at each time step, the fluctuations take any of equally spaced values in . Window size for local fits is . This process yields an approximating time series along with the fluctuations that generate it. As discussed before, we can consider that provided we state that system (11) is being assumed. Finally, the metric by which model approximations and data are compared is where each corresponds to a time step in the integration routine.
Our aim is to characterize the parametric dependence of . For this we perform the transformation in a sample domain consisting of equally spaced points in parameter space along coordinate . The rest of the parameters were kept fixed at as well as the initial conditions . The point was purposely excluded from the sampled domain as a test for robustness of the results. Figure 3 panel (a) shows the approximations to dataset along with the transformation (b) we obtained for values in a neighborhood of . This figure summarizes the main result of this article: the quality of the approximations is very similar for most values of , however, for the fluctuations take a particularly simple form, which resembles very much the harmonic forcing used to generate the test data. Thus, we have produced a parameterized family of transient solutions which intersect the test solution. We find that within this parametrization there are bold variations on the dynamical complexity of the resulting transformation. This observation was found to be robust for many choices in the transformation parameters .
In order to quantify the complexity of the fluctuations we calculated their permutation entropy (20). This quantity measures the information of the order type distribution of consecutive values of a time series. Since the number of possible order types is , the order of permutation entropy has to be low enough as to assure that the order type distribution is reasonably sampled. In this case, since the number of data points is the permutation entropy was calculated to order . The quality of the approximation can be captured by the error as defined before (7),
(14) 
Figure 4 (b) shows the values of error and entropy of the fluctuations as a function of . While the error decreases monotonically, the entropy curve exhibits a dent. This means that, for the tested domain, as error decreases entropy increases except for the dented region, where this simple relationship seems to break. To make this statement more precise, Figure 4 (a) shows the errorentropy distribution obtained by evaluation of the transformation in the sample domain. We quantify the aforementioned tradeoff by fitting a line () to the cloud and coloring the points according to their normal distance to the fit. In this way departures from the negative correlation are addressed quantitatively. For each point in the normal intersection with the fitted line occurs in , . The distance to the fit then reads
(15)  
where a function was introduced to differentiate between favorable departures (below the line) from unfavorable ones. Departures above the fit correspond to solutions that should fail the identification criteria. This is either because the approximation is not good enough or more importantly, because the fluctuations required to generate them exhibit high dynamical complexity. By seeking minima of it can be assured that within the family of approximations induced by , the most parsimonious one in the sense of , is chosen. This enables a lex parsimoniae criteria within the restricted family along with a proof of existence of possibly interesting mechanisms.
Figure 4 (a) shows the plot of over the test domain (green curve). This quantity compares the quality of the approximations with the complexity of the required fluctuations and it is statistically defined through and . In this case, the parameters used to generate the data can be estimated by searching for minima of . These are the parameters for which a favorable balance is obtained between goodness of the approximation and the complexity of the extra assumptions contained in . This allows us to define a parsimony criteria to estimate the parameters of system (11) which in this case leads to the correct answer . This suggests that this mapping can be useful to create objective functions to train dynamical models toward simpler representations of a given dataset.
In order to make the result somewhat more general we could argue as follows: we can assume that in general there is a relationship between the argument and the minimum value of function (7) of the form . Then we can state that any sequence which violates the scaling is of interest and leave the matter of how these sequences are generated free of assumptions. Our argument then proceeds as follows: at any point in the sample domain , we have attempted an approximation to the problem of computing by trying different parameters in . By further assuming there is such that if then , we may regard as a good approximation to . This extends the parsimony rule by comparing all possible paths within numerical resolution. Plausibility of this last assumption is supported a posteriori by the numerical results in this case study. However, in the experimental case, this assumption is more daring since location of the global minimum of , or alternatively , is always unknown.
While this function can help choose parameters within a sample domain, its evaluation requires knowledge of the errorentropy distribution: this makes it hard to implement an optimization routine over a search space. In a typical experimental setting, we could be interested in every solution which satisfies some goodness criteria. Amongst the many good approximations, interest is shifted towards those which require the simplest fluctuations. This can be implemented in a function:
(16) 
where is a bound for the permutation entropy (for ), is the Heaviside function and is a threshold which represents a passing criteria for the approximations. Both arguments of the Heaviside functions cannot be true simultaneously. If the error is above the threshold, the function returns the error plus the entropy bound, otherwise, it returns the entropy value. Once the approximations drop below the threshold, the only way to obtain lower values in is to find solutions that satisfy the threshold and require simpler fluctuations. This function corresponds to the red plot in Figure 4. In this case, the threshold is the mean value of the error over the sample domain .
One would expect that if there are solutions such that the approximation is good and at the same time the fluctuations are simple, these will correspond to local minima of both functions for many choices of the sampling space and threshold. While the first one provides a tradeoff criteria to justify model parameters, the second one can be used as an objective function to train dynamic neural networks. If there is a negative correlation between error and entropy, a set of solutions that are simple and approximate data well would look like horns departing from the cloud that can be ‘shaved’ by choosing different values of in .
Finally, it should be noted that the reason that both the model parameters and the forcing signal can be ‘guessed’ at the same time is because of the fact that the fluctuations used to generate data were simple. Consider the case in which data is generated with complex fluctuations (for example, Fig. 3 with ). It would not be true anymore that transformations corresponding to nearby values of will be more complex than the fluctuations used to generate the data. While the error of the approximations will still be low for many choices of , the entropy curve will still exhibit a dent around and it would be wrongly concluded that data was generated with a harmonic forcing with . Another situation in which our approach fails is that in which data is generated by a system driven by noise, since in this case the real fluctuations would not be simple. It could also happen that there are no dents in the entropy curve. In this case it could be argued that since there are no parameters for which the fluctuations take a simple form, the structure of the model is spurious and that it should be discarded. In the next section we apply this ideas to a well studied example in motor control from the field of birdsong.
Iv Case study: Complex motor patterns in birdsong.
During song production, canaries rely on a repertoire of motor gestures which are ultimately responsible of driving their vocal organ. These gestures roughly correspond to the muscular activity by which the tension of the vocal folds is controlled and the respiratory activity which governs the air flow through the syrinx (21). Here, experimental data consists of recordings of the air sac pressure of a canary while singing and it is shown in Figure 5(b) top row (green curves). We show that there are parameters for system(11) such that the data can be approximated by model solutions if a simple forcing is also assumed.
In this example we chose a segment that contains a transition between two apparently different gestures: data consists of two approximately periodic signals with similar frequencies and marked morphological differences. Note that if no external forcing is assumed there is little hope that the 2 dimensional system (11) will approximate the data. One can speculate that data can be approximated by the forced system (12) if the amplitude and frequency of the forcing is allowed to change when the gesture changes. We can include the possibility that the system is being generically forced by assuming system (13). This assumption greatly increases the goodness of the resulting approximations yielding remarkably good fits.
The objective is to check if there are parameters for which the model approximates data with simple fluctuations. A solution is considered ‘good’ if it satisfies that the error is less than a threshold . Once this condition is achieved, we seek to minimize the complexity of the required fluctuations, thus ‘increasing’ the parsimony of our assumptions. This is done by optimizing function (16) on a search space over parameter and initial conditions space. The reason we choose to minimize eqn. (16) instead of eqn. (15) is that evaluation of eqn. (15) requires knowledge of the errorentropy distribution. Once a solution is found it can be checked if it is also a local minima of eqn. (15).
The optimization is performed by a genetic algorithm over a search space as follows: , , , and . In order to train the algorithm, we used a ‘low resolution’ transformation by setting . This allowed us to test for many solutions in a computationally efficient way. At this point we should point out that the landscape of will change dramatically on the choice of the parameters for . However, the suspicion remains that broad features of the landscape of are invariant to this parameter after some value, and it turned out to be a successful strategy for training the algorithm. The final scheme we adopted to define was . The maximum amplitude for the fluctuations was also adjusted to . Optimization was performed on a commercially available desk computer: the algorithm was run for 1000 generations starting from 100 random seeds and yielded parameters and initial conditions .
As before, we want to show that the parameters so obtained can be found as the minima of a function that implements a parsimony rule. For this we explore a neighborhood of the solution we found by optimizing taking equally spaced samples in the domain . Although the parameters were found by setting for computational reasons, in order to construct a higher resolution was used, so that both examples share the same synchronization scheme. Results of this section are summarized in Figure 5. The error of the approximation and the entropy of the fluctuations were calculated at each point in the sample domain. The errorentropy distribution is shown in Fig. 5(a) along with the estimated linear correlation. The corresponding plots of error, entropy and distance to the fit are shown in Figure 5(b). Local minima of were highlighted with filled circles and filled stars. Figure 5(c,d) shows the model approximation and the fluctuations at each of these locations along with the comparison with data. Green curves correspond to data, blue curves correspond to model output and the red curves are the parametric fluctuations. In order to interpret the results, a Fourier analysis was performed on each time series in (c): this analysis was done independently for each half (dashed line) of the temporal domain so that changes in the signals can be better visualized. Note that in both solutions, the forcing signal fundamental frequency is located to the right of the fundamental frequency of the model output. This suggests that the model might be responding subharmonically to the forcing signal. While both of the starred solutions seem to be interesting, solutions corresponding to the circled local minima fail for different reasons. These are plotted in Figure 5(d) for illustration: the red and yellow ones fail because the approximation is worse than in other points, even though that the fluctuations are simple. The purple one fails because of the excessive complexity of the required fluctuations. Despite that this solution is simpler than its neighbors and also approximates the data, it is more complex than the starred solution which does an equally good job at approximating the data.
It should be clear that the autonomous system (11) will not be able to reproduce data very efficiently: simple inspection of the data indicates that the pressure patterns come in at least two flavors, with and without the wiggle. Since the most similar solution that can be expected from (11) is a limit cycle, we would only be able to adjust the frequency of the cycle in order to minimize the error and both gestures would be approximated by the same morphology. If we insist on explaining the wiggles, better approximations are needed and therefore model (11) should be discarded. One however, could have other reasons to believe that the model is correct and impose the model on the data: this was done in the last section by greatly increasing the model degrees of freedom while still retaining the basic underlying structure.
In this example, the air sac pressure of a singing bird is approximated by a simple nonlinear system which is being driven by an external signal. We defined an objective function which relies on a mapping between approximations and fluctuations. This allowed us to find model parameters such that the required fluctuations exhibited minimal complexity as measured by . Once this solution was identified, we constructed a parsimony criteria by approximating the scaling relationship between error and entropy. This provided a way to leverage these two quantities and seek for an optimal solution. This balance is measured by function (15) and careful inspection shows that there is not a single ‘sweet’ spot, but many local minima. This means that there are several ways the data can be approximated by simple fluctuations.
Two mechanisms were found which are interesting and seem to differ from each other. In the first case, the data is approximated at first, as a period (1:1) response to a low frequency forcing. Then, as the wiggle becomes apparent, the model solution resembles a period (1:2) solution. The observation that canary respiratory gestures can be approximated as subharmonic solutions of a driven nonlinear substrate was proposed in (22) and later quantified in closely related models (23); (24). There, harmonic input signals were explicitly assumed and parameters were allowed to change in a stepwise manner: it was shown that this and similar models posses subharmonic solutions which yield quantitatively good fits to the data, but the criteria for the choice of the parameters was qualitative. Here, we have made a more generic assumption: that the model is being forced by a specific parameter dependent family of fluctuations. Strikingly, the subharmonic transition was identified as being one of the simplest mechanisms for approximating the data, which is the reason why it corresponds to a local minima of . Moreover, by comparing the balance of error and entropy for neighboring solutions, the choice of the model parameters is justified quantitatively.
The second mechanism is qualitatively different. At first, the model response is periodic with a period which is similar to period of the forcing. During this epoch, the forcing is also approximately periodic but it is not harmonic. When the pressure pattern changes the model solution is also very similar to a subharmonic solution but the way the bifurcation occurs is very different. A possible interpretation of this mechanism could read as follows: the forcing signal can be thought to be due to an incoming signal of frequency plus a copy of the same signal lagged by a phase . This suggests that the transition between regimes could be controlled by small changes in the lag.
This analysis allows us to conclude the following: there are parameters of system (11) for which the solutions of the model explain detailed features of the data at the cost of assuming specific realizations of the forcing signal. Given a search space, we found parameters for which the forcing signal takes the simplest possible form as measured by , within a family of signals which facilitate synchronization with data. By seeking a balance between error and parsimony, two alternative mechanisms were identified. In both cases, the second regime is approximated by a subharmonic solution, but they differ in the way the transition occurs. The possibility that the control signal is changing its frequency was explored quantitatively while careful exploration of the second mechanism remains to be done. Both possibilities are appealing because, within this frameset, they are maximally parsimonious and also they offer a testable prediction: modification of the delicate timing between the substrate response and the forcing signal should lead to very different output patterns (25).
V Concluding remarks
Estimation of model parameters from experimental time series is a central problem in many disciplines. Our limited ability to perform this task in the case the model is nonlinear can be traced back to the fact that there are no known general procedures for global nonlinear optimization. Furthermore, in an experimental setting we are likely to be uncertain about the functional form of the underlying dynamical rules. Moreover, in the case of physiological data, it is likely that the system under observation is taking input from external sources so it cannot be considered in isolation. Therefore, investigating parametric fluctuations can be relevant in some situations of interest.
In this work we proposed a nonlinear transformation for time series and motivated its usefulness. We draw inspiration from an ideally continuous case to define the transformation of a given dataset. This transformation takes the observed time series and returns the parametric fluctuations that yield solutions which optimally match the time series. It is clear that in many situations better approximations can be achieved by allowing arbitrary fluctuations in the parameters: this follows from the fact that considering arbitrary fluctuations yields a system with infinite degrees of freedom which can accommodate any dataset. However, due to nonlinearity, this transformation is highly dependent on the functional form and parameter values of model . Here we showed in a case study, that for specific nonlinearities coded in the fluctuations take a simple form. This transformation was applied to the problem of parameter estimation in a case study with both numerical and experimental data.
We tailored our procedure to address the question of whether a system is making use of nonlinear mechanisms to code for different functionalities. The idea that controlling the unstable periodic orbits in chaotic attractor could serve as a strategy to generate qualitatively different behavior was proposed by Ott et. al.(26) and demonstrated experimentally in a chemical system (27). Recent work in the field of birdsong suggests that much of the complexity observed in the song and respiratory patterns of canaries can be successfully explained by considering simple time dependent unfoldings of low dimensional nonlinear systems (21); (22); (23); (24); (28).
This ideas were tested in a simple rate coding model for populations of neurons. Data was generated by driving the model with simple harmonic signals. Small changes in the forcing signal lead to qualitatively different behavior: this is the scenario in which these methods seem to be useful. Then, we proceeded to ask if the general structure of the unperturbed model (11) held any relationship with the numeric data. By construction, solutions of the autonomous model fail to approximate the datasets. However, considering the alternative hypothesis that fluctuations were present at the time of the measurements, we found that there is a range in parameter space for which the cost of introducing this hypothesis is minimal. These ranges correspond roughly to the values of the parameters used to generate the dataset. Therefore we conclude that local minima of potentially provide insight about the underlying dynamics which generated the data. We have also studied a closely related function which does not require evaluation in a sample domain and yields the same answer for a range in the threshold values. This function was used as an objective function for model identification when we considered experimental data. Model (13) was trained to satisfy a threshold condition for the quality of the approximations by using simple fluctuations. We found ranges in parameter space for which the scaling relationship between error and entropy breaks maximally. Inspection of these minima revealed two qualitatively different solutions. Interestingly, in both cases data is approximated by a solution which resembles very much a bifurcating period 2 solution: the difference lies in the way this bifurcation is controlled.
The fact that data could be approximated by a forced low dimensional system is very interesting from the theoretical point of view. On the one hand, if the systems are low dimensional, there are methods to classify families of models according to the geometrical mechanisms which underlie the generation of complex behavior (29); (30). In particular, period doubling solutions have been found experimentally in many physical systems and they can be associated to universal scaling laws (31); (32). On the other hand, it has been shown that even in the case of having many nonlinear interacting units, the resulting macroscopic average behavior can follow low dimensional dynamical rules. A recent breakthrough in the analysis of Kuramoto’s model due to Ott and Antonsen allows to obtain mean field equations for classes of closely related models (33); (34) . In some situations, the mean field equations exhibit non trivial bifurcation diagrams which share many features with system (11). It has been shown that there are conditions under which the average behavior of a driven set of excitable units displays complex behavior by dynamical variations of the degree of synchrony in the driven population (35). More recently, it was shown that by using a mean field description, algorithms can be designed that are successful at controlling these attractors (36). Thus, there is hope that much of the complexity in macroscopic phenomena can be captured by low dimensional nonlinear mechanisms. The purpose of this work is to further bridge the gap between the description and characterization of qualitative nonlinear mechanisms and quantification. Finally, we note that the same mapping can be used to explore how a given nonlinear system may optimize other quantities of interest by considering alternative metrics on the parametric fluctuations.
Vi Acknowledgements
This work was funded by NSF grant EF0928723 awarded to Marcelo O. Magnasco. The author acknowledges the support and mentorship of Gabriel B. Mindlin and Marcelo O. Magnasco, and both review and helpful discussions with Carl D. Modes. Data was kindly provided by the Dynamical systems lab at University of Buenos Aires.
Appendix A General implementation
Here we describe a numerical procedure to build approximations to the problem of finding minima of the high dimensional error function (7) defined in section 3. The strategy is to compute in small running windows so that brute force optimization is possible. Here we state a more general definition of function (7) which allows better allocation of computational effort.
Before we allowed the fluctuations to take any of values in a domain at each time step. Now, we allow the fluctuations to take values in time bins where is an integer such that .
(17) 
Let be the minimum of (17). We define the transformation as
(18) 
And the corresponding model output
(19) 
Summarizing, we apply definitions (17), (18) and (19) in small running windows of size . The window is advanced steps and a new is determined. By concatenating the results of this procedure we construct approximations to the minima of (17). In this notation, results of this articles were obtained by setting , , , . This procedure is illustrated in Figure 6.
References
 Guckenheimer, John, and Philip Holmes. Nonlinear oscillations, dynamical systems, and bifurcations of vector fields. Vol. 42. Springer Verlag: New York, 1983.
 Lütkepohl, Helmut. New introduction to multiple time series analysis. Springer, 2007.
 Abarbanel, Henry DI, H. D., Brown, R., Sidorowich, J. J., Tsimring, L. S., The analysis of observed chaotic data in physical systems. Reviews of modern physics 65.4 (1993): 1331.
 Baker, Gregory L., Jerry P. Gollub, and James A. Blackburn. Inverting chaos: Extracting system parameters from experimental data. Chaos: An Interdisciplinary Journal of Nonlinear Science 6, no. 4 (1996): 528533.
 Kirkpatrick, Scott, C. Daniel Gelatt, and Mario P. Vecchi. Optimization by simmulated annealing. Science 220, no. 4598 (1983): 671680.
 Holland, John H. Genetic algorithms. Scientific american 267, no. 1 (1992): 6672.
 Nelder, John A., and Roger Mead. A simplex method for function minimization. The computer journal 7.4 (1965): 308313.
 Lorenz, Edward N. Deterministic nonperiodic flow. Journal of the atmospheric sciences 20.2 (1963): 130141.
 Konnur, Rahul. Synchronizationbased approach for estimating all model parameters of chaotic systems. Physical Review E 67.2 (2003): 027204.
 Parlitz, U., L. Junge, and Lj Kocarev. Synchronizationbased parameter estimation from time series. Physical Review E 54, no. 6 (1996): 6253.
 Parlitz, U. Estimating model parameters from time series by autosynchronization. Physical Review Letters 76.8 (1996): 12321235.
 Quinn, John C., Paul H. Bryant, Daniel R. Creveling, Sallee R. Klein, and Henry DI Abarbanel. Parameter and state estimation of experimental chaotic systems using synchronization. Physical Review E 80, no. 1 (2009): 016201.
 Creveling, Daniel R., Philip E. Gill, and Henry DI Abarbanel. State and parameter estimation in nonlinear systems as an optimal tracking problem. Physics Letters A 372.15 (2008): 26402644.
 Sorrentino, Francesco, and Edward Ott. Using synchronization of chaos to identify the dynamics of unknown systems. Chaos: An Interdisciplinary Journal of Nonlinear Science 19.3 (2009): 033108.
 Press, William H. Numerical recipes 3rd edition: The art of scientific computing. Cambridge university press, 2007.
 Wilson, Hugh R., and Jack D. Cowan. Excitatory and inhibitory interactions in localized populations of model neurons. Biophysical journal 12, no. 1 (1972): 124.
 Hoppensteadt, Frank Charles, and Eugene M. Izhikevich. Weakly connected neural networks. Vol. 126. New York: Springer, 1997.
 Arkady, Pikovsky, Rosenblum Michael, and J.ürgen Kurths. Synchronization. A Universal Concept in Nonlinear Sciences. Cambridge University Press (2001).
 Arnold, Vladimir. I. Geometrical Methods in the Theory of Ordinary Differential Equations (Grundlehren der mathematischen Wissenschaften). Fundamental Principles of Mathematical Science.âSpringer, Verlag, New York 250 (1983).
 Bandt, Christoph, and Bernd Pompe. Permutation entropy: a natural complexity measure for time series. Physical Review Letters 88, no. 17 (2002): 174102.
 Gardner, Tim, G. Cecchi, M. Magnasco, R. Laje, and Gabriel B. Mindlin. Simple motor gestures for birdsongs. Physical review letters 87, no. 20 (2001): 208101.
 Trevisan, Marcos A., Gabriel B. Mindlin, and Franz Goller. Nonlinear model predicts diverse respiratory patterns of birdsong. Physical review letters 96, no. 5 (2006): 058103.
 Alonso, Leandro M., Jorge A. Alliende, F. Goller, and Gabriel B. Mindlin. Lowdimensional dynamical model for the diversity of pressure patterns used in canary song. Physical Review E 79, no. 4 (2009): 041929.
 Alonso, L. M., J. A. Alliende, and G. B. Mindlin. Dynamical origin of complex motor patterns. The European Physical Journal DAtomic, Molecular, Optical and Plasma Physics 60, no. 2 (2010): 361367.
 Goldin, MatÃas A., Leandro M. Alonso, Jorge A. Alliende, Franz Goller, and Gabriel B. Mindlin. Temperature induced syllable breaking unveils nonlinearly interacting timescales in birdsong motor pathway. PloS one 8, no. 6 (2013): e67814.
 Ott, Edward, Celso Grebogi, and James A. Yorke. Controlling chaos. Physical review letters 64.11 (1990): 1196.
 Petrov, Valery, Vilmos Gaspar, Jonathan Masere, and Kenneth Showalter. Controlling chaos in the BelousovâZhabotinsky reaction. Nature 361, no. 6409 (1993): 240243.
 Perl, Yonatan Sanz, Ezequiel M. Arneodo, Ana Amador, Franz Goller, and Gabriel B. Mindlin. Reconstruction of physiological instructions from Zebra finch song. Physical Review E 84, no. 5 (2011): 051909.
 Mindlin, Gabriel B., XinJun Hou, HernÃ¡n G. Solari, R. Gilmore, and N. B. Tufillaro. Classification of strange attractors by integers. Physical review letters 64, no. 20 (1990): 2350.
 Gilmore, Robert, and Marc Lefranc. The topology of chaos: Alice in stretch and squeezeland. John Wiley & Sons, (2008).
 Feigenbaum, Mitchell J. Quantitative universality for a class of nonlinear transformations. Journal of statistical physics 19.1 (1978): 2552.
 Libchaber, A., C. Laroche, and S. Fauve. Period doubling cascade in mercury, a quantitative measurement. Journal de Physique Lettres 43.7 (1982): 211216.
 Ott, Edward, and Thomas M. Antonsen. Low dimensional behavior of large systems of globally coupled oscillators. Chaos: An Interdisciplinary Journal of Nonlinear Science 18, no. 3 (2008): 037113.
 Childs, Lauren M., and Steven H. Strogatz. Stability diagram for the forced Kuramoto model. Chaos: An Interdisciplinary Journal of Nonlinear Science 18, no. 4 (2008): 043128.
 Alonso, Leandro M., and Gabriel B. Mindlin. Average dynamics of a driven set of globally coupled excitable units. Chaos: An Interdisciplinary Journal of Nonlinear Science 21, no. 2 (2011): 023102.
 Wagemakers, Alexandre, Ernest Barreto, Miguel AF Sanjuán, and Paul So. Control of collective network chaos. Chaos: An Interdisciplinary Journal of Nonlinear Science 24, no. 2 (2014): 023127.