# A Data-Efficient Sampling Method for Estimating Basins of Attraction Using Hybrid Active Learning (HAL)

## Abstract

Although basins of attraction (BoA) diagrams are an insightful tool for understanding the behavior of nonlinear systems, generating these diagrams is either computationally expensive with simulation or difficult and cost-prohibitive experimentally. This paper introduces a data-efficient sampling method for estimating BoA. The proposed method is based upon hybrid active learning (HAL) and is designed to find and label the “informative” samples, which efficiently determine the boundary of BoA. It consists of three primary parts: 1) additional sampling on trajectories (AST) to maximize the number of samples obtained from each simulation or experiment; 2) an active learning (AL) algorithm to exploit the local boundary of BoA; and 3) a density-based sampling (DBS) method to explore the global boundary of BoA. An example of estimating the BoA for a bistable nonlinear system is presented to show the high efficiency of our HAL sampling method.

**Keywords:** Basins of Attraction · Nonlinear Dynamics · Active Learning · Machine Learning · Support Vector Machine

## 1 Introduction

Basins of attraction (BoA) provide a useful mapping from initial conditions to final steady state responses or attractors. In other words, if the system starts at any state in a basin of attraction, its trajectory will converge to that basin’s attractor. An accurate estimation of BoA is of vital importance in analyzing a system’s control stability or predicting its dynamic behavior. Traditional methods for estimating BoA can broadly be classified into Lyapunov-based and non-Lyapunov methods [8].

Lyapunov-based methods compute a Lyapunov function as a local stability criterion, and sublevel sets of this Lyapunov function, in which the function decreases along the flow, provide invariant subsets of the BoA [4, 6]. Although Lyapunov-based techniques have been successfully implemented for various nonlinear systems, they are either limited to polynomial systems or restricted to local subsets of the BoA near the attractors. For investigating the dynamics of highly nonlinear or non-autonomous systems over a large state domain, these techniques often fail because a Lyapunov function is inaccessible.

Non-Lyapunov methods include, for instance, occupation measures [7], reachable sets [2], and trajectory reversing [5]. The occupation measures method has the advantage of theoretically guaranteed convergence over most Lyapunov-based approaches, but is still limited to polynomial systems. The reachable sets and trajectory reversing methods are applicable to more complicated nonlinear systems, but they only estimate BoA numerically. The numerical representation can be used for BoA visualization or qualitative analysis, but provides limited insight into predicting a system’s dynamic behavior. In addition, the trajectory reversing approach requires prior knowledge of a system’s governing equation for backwards integration with respect to time, which is oftentimes unavailable for real-world physical systems.

Our target is to obtain a mapping function representing a system’s BoA. The input of the function is a state of the system, and the output is the attractor where a trajectory starting from this state eventually converges. Since the number of attractors is finite, this function is a classifier, identifying to which BoA a state belongs. Obtaining a classifier requires a training set of data containing instances (states) whose category membership (BoA the states belong to) is known. This paper introduces a data-efficient approach to sample such training data.

## 2 Sampling Method

Uniform sampling is a brute-force method to obtain the training data. For the example in Fig. 1(a), a bistable rocking disk has two equilibria (fixed-point attractors) of rest angles: left-tilt and right-tilt. Its initial angle and angular velocity determine the equilibrium resting angle where it will eventually settle, thus comprising the system’s BoA. As shown in Fig. 1(b), the uniform sampling method divides the state domain into a grid, here 40 × 40 for 1600 initial conditions. Each grid point represents an initial condition, and its attractor label (left-tilt or right-tilt equilibrium) is determined by observing the simulated final angle where the disk eventually stops. Each of the 1600 samples is equally costly to label, but these samples contribute unequally to estimating the BoA. Unlike a general classification problem, BoA are always contiguous, so if the number of attractors is known, it is sufficient to identify the boundaries between the basins; there is no need to check inside a basin for a region corresponding to a different attractor. For example, if there are two basins, a single continuous boundary defines them. However, in Fig. 1(c), only the 330 highlighted samples were used to determine the boundary. In other words, uniform sampling wasted roughly 79% of its workload on unimportant samples. To address this inefficient sampling problem, this section proposes a novel sampling method which consists of three primary parts: additional sampling on the trajectories (AST), active learning (AL), and density-based sampling (DBS).

### 2.1 Additional Sampling on Trajectories (AST) – Get More Samples for “Free”

Unlike general classification problems where samples are independent, BoA estimation is able to take advantage of time series trajectories. On a trajectory, every state converges to the same attractor and thus shares the same label. In other words, when the label of an initial condition is determined by one numerical simulation or experiment, the generated time series trajectory can be sub-sampled to obtain additional labeled samples. For our method, the trajectory is sub-sampled by choosing samples that are a fixed distance apart in the state domain rather than the time domain in order to avoid gathering similar samples.
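The distance-based sub-sampling can be sketched as follows; `subsample_trajectory`, the spiral trajectory, and the 0.07 spacing are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def subsample_trajectory(traj, d):
    """Keep states along one trajectory that are at least `d` apart
    in the state domain; all kept states share the trajectory's label."""
    kept = [traj[0]]
    for x in traj[1:]:
        if np.linalg.norm(x - kept[-1]) >= d:
            kept.append(x)
    return np.array(kept)

# A toy trajectory spiraling toward an attractor at the origin.
t = np.linspace(0.0, 20.0, 2000)
traj = np.column_stack([np.exp(-0.1 * t) * np.cos(t),
                        np.exp(-0.1 * t) * np.sin(t)])

# One simulation yields many labeled samples instead of one.
samples = subsample_trajectory(traj, d=0.07)
```

Sampling in the state domain rather than the time domain avoids clustering many nearly identical samples where the trajectory slows down near the attractor.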

### 2.2 Active Learning (AL) – Select “Informative” Samples

Active learning (AL) is able to proactively select the unlabeled samples most likely to be “informative” and query an “agent” to obtain their labels. In the scenario of BoA estimation, the “informative” samples are the system’s states near the BoA boundary, and the “agent” provides labels by determining the attractors where the system will eventually converge. The agent could be a numerical simulator if the governing equation of the system is known, or a human experimenter if the BoA for a real-world physical system is needed. Since the BoA estimation is essentially a classification problem and the informative samples are the states near a BoA boundary, our AL algorithm is built upon a support vector machine (SVM) for its capability of providing a nonlinear classifier and easy interpretation of distances (also called “margins” in machine learning) from samples to the classifier’s decision boundary [3].

As shown in Fig. 2, margin-based AL first generates a pool of unlabeled samples and randomly labels a subset of them until different labels are observed. The following steps are then repeated until convergence: (1) fit an SVM classifier using the labeled samples; (2) label the unlabeled sample with the shortest distance (smallest “margin”) to the current decision boundary. In Fig. 2, many of the labeled samples are located near the real yet unknown classification boundary, which avoids wasting effort on labeling the less informative samples (e.g., the ones at the top-left and bottom-right corners).
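One margin-based query can be sketched with scikit-learn; the pool, the seed labels, and the use of `decision_function` as a margin proxy are illustrative assumptions:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical pool of unlabeled states and a small labeled seed set
# containing both observed attractor labels.
pool = rng.uniform(-1.0, 1.0, size=(500, 2))
X_lab = np.array([[-0.8, -0.8], [0.8, 0.8]])
y_lab = np.array([0, 1])

# Step (1): fit an SVM on the labeled samples.
clf = SVC(kernel="rbf").fit(X_lab, y_lab)

# Step (2): query the pool sample closest to the decision boundary,
# i.e. the one with the smallest |decision_function| value.
margins = np.abs(clf.decision_function(pool))
query = pool[np.argmin(margins)]
```

In practice the queried state is then simulated (or tested experimentally) to obtain its attractor label, and the SVM is refit.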

When multiple samples have a similar distance from the boundary, the “length” of their trajectories can be used to break the tie, because longer trajectories usually provide more information. The “length” of a time series trajectory is defined as the total number of samples collected from the trajectory using a fixed sampling distance in the state domain.

For the case illustrated in Fig. 3(a)–(b) where the margin-based AL determines the two unlabeled samples (solid squares) are equally close to the decision boundary, the one in (a) generates a longer trajectory and more labeled samples, so it has higher priority to be labeled. Since choosing between possible samples requires estimating the lengths prior to generating the trajectories, a predictive model for the trajectory length given a state is needed. A Gaussian process (GP) model was selected since it can perform nonlinear regression, has controllable behavior when extrapolating, and needs no prior knowledge of the state distribution. Fig. 3(c) shows a GP model which predicts the length of trajectories based on the samples in (a) and (b). In our sampling method, the GP model is updated every time a new trajectory is generated.
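A sketch of the GP tie-breaker, assuming hypothetical trajectory-length records and an RBF kernel (the kernel choice matches Sec. 3, but the data here is invented):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Hypothetical record: initial states already simulated, and the number
# of sub-samples (the trajectory "length") each of them produced.
X_seen = np.array([[-0.9, 0.1], [0.2, -0.5], [0.7, 0.8], [-0.3, 0.6]])
lengths = np.array([34.0, 12.0, 27.0, 19.0])

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5),
                              normalize_y=True).fit(X_seen, lengths)

# Tie-break: among candidates equally close to the SVM boundary,
# label the one whose predicted trajectory is longest.
candidates = np.array([[-0.8, 0.0], [0.3, -0.4]])
pred = gp.predict(candidates)
pick = candidates[np.argmax(pred)]
```

The GP is refit every time a new trajectory is generated, mirroring the update rule described above.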

### 2.3 Density-Based Sampling (DBS) – Select “Unfamiliar” Samples for Exploration

Similar to most learning algorithms, our sampling method must also deal with the exploration/exploitation dilemma. A sampling method built upon AL alone tends to exploit only and gets stuck on a local portion of the BoA boundary. In order to explore the entire state space and estimate the boundary globally, an auxiliary sampling approach based on the density of labeled samples is introduced. This density-based method prioritizes exploring the region in which the fewest samples have been labeled.

Inspired by the k-means++ algorithm, which was proposed to spread out cluster centers [1], we define D(x) as the distance from a state x to the closest state that has already been labeled. A larger D(x) usually indicates a lower density of labeled samples around the state x, and D(x) = 0 for a labeled state. This density-based method selects the unlabeled sample with the largest D(x) and updates the distances every sampling period, after new labeled samples are collected.
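A minimal sketch of the density-based selection, with an invented pool and labeled set:

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(1)
pool = rng.uniform(-1.0, 1.0, size=(400, 2))   # unlabeled candidate states
labeled = np.array([[0.0, 0.0], [0.2, 0.1]])   # states already labeled

# D(x): distance from each unlabeled state to its closest labeled state;
# D(x) = 0 would hold for any state that is itself labeled.
D = cdist(pool, labeled).min(axis=1)

# Explore where labeled samples are sparsest: pick the largest D(x).
explore = pool[np.argmax(D)]
```

Recomputing `D` after each sampling period keeps the exploration target consistent with the growing labeled set.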

### 2.4 Summary

Integrating the three components above (AST + AL + DBS) leads to a hybrid active learning (HAL) sampling method for estimating BoA (see Tab. 1). Several hyper-parameters need to be predetermined: 1) the sampling distance along trajectories in AST; 2) the threshold on the top shortest distances in AL; 3) the SVM kernel and GP kernel; and 4) the method of selecting between AL and DBS in each episode (random or alternating selection). There are also several tunable hyper-parameters inside the kernel functions, but these can be optimized automatically during fitting.

It is worth noting that the SVM was selected for its ability to quickly calculate distances from samples to the decision boundary in margin-based AL. However, once the sampling process is finished and calculating distances is no longer necessary, the SVM might not be the best choice for the ultimate classifier, especially when the number of training samples is large. At the end of the sampling process, the samples provide the necessary information to train other classifiers (such as k-nearest neighbors (KNN), random forests, or neural networks) for better or faster estimation of the BoA.

| Step | Operation |
| --- | --- |
| 1 | Generate a pool of unlabeled samples (“states of initial conditions”) |
| 2 | Randomly label a subset of the unlabeled samples until different labels (“attractors”) are observed |
| 3 | Obtain additional labeled samples from their time-series trajectories (AST) |
| 4 | **for** episode = 1 : M **do** |
| 5 | Fit a support vector machine (SVM) classifier using the labeled samples |
| 6 | Fit a Gaussian process (GP) regressor using the lengths of the trajectories generated |
| 7 | Randomly (or alternately) select one of the following sampling methods: |
| 8 | (1) Active learning (AL): |
| 9 | Find the unlabeled samples within the top % shortest distances to the current SVM decision boundary |
| 10 | Among them, select the one with the longest trajectory length predicted by the GP regressor |
| 11 | (2) Density-based sampling (DBS): |
| 12 | Evaluate each unlabeled sample’s distance to its closest labeled sample |
| 13 | Among them, select the one with the largest distance |
| 14 | Label the selected sample and obtain additional labeled samples from its time-series trajectory (AST) |
| 15 | **end for** |
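The full loop can be sketched end to end. This is a minimal illustration only: it uses a stand-in bistable Duffing oscillator rather than the magnet system of Sec. 3, alternates AL and DBS deterministically, and omits AST and the GP tie-breaker for brevity; all names and constants are assumptions.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.spatial.distance import cdist
from sklearn.svm import SVC

def label_of(x0):
    """Simulate from state x0 = (x, v) and return the attractor label.
    Stand-in system: x'' = x - x**3 - 0.5*x', with attractors at x = +/-1."""
    sol = solve_ivp(lambda t, s: [s[1], s[0] - s[0]**3 - 0.5 * s[1]],
                    (0.0, 50.0), x0)
    return int(sol.y[0, -1] > 0.0)

rng = np.random.default_rng(0)
pool = rng.uniform(-2.0, 2.0, size=(600, 2))      # step 1: unlabeled pool

# Step 2: randomly label until both attractors have been observed.
X, y = [], []
while len(set(y)) < 2:
    x0 = pool[rng.integers(len(pool))]
    X.append(x0); y.append(label_of(x0))

# Steps 4-15: alternate AL (exploit boundary) and DBS (explore sparsity).
for episode in range(20):
    clf = SVC(kernel="rbf").fit(np.array(X), np.array(y))
    if episode % 2 == 0:      # AL: smallest margin to the decision boundary
        pick = pool[np.argmin(np.abs(clf.decision_function(pool)))]
    else:                     # DBS: largest distance to any labeled state
        pick = pool[np.argmax(cdist(pool, np.array(X)).min(axis=1))]
    X.append(pick); y.append(label_of(pick))      # step 14: label it
```

After the loop, the labeled set (X, y) would be used to train the final BoA classifier.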

## 3 Results

A magnet-induced bistable system [9] was used to illustrate the performance of our sampling method. The governing equation of the nonlinear dynamical system is:

(1)

where the parameter values are those given in [9]. This bistable system has two stable equilibria (fixed-point attractors) in the state domain considered. The hyper-parameters in the HAL sampling method were set as follows: 1) the sampling distance along trajectories in AST was set to 0.07 after normalizing the states; 2) a fixed threshold on the top shortest distances was used in AL; 3) radial-basis function kernels were used for both the SVM and the GP; 4) the sampling alternated between AL and DBS each episode; and 5) the 2-dimensional state domain was uniformly divided into a 50 × 50 grid, giving 2500 unlabeled samples initially. As a result, Fig. 4(a) shows the 36 samples selected by HAL for labeling, and Fig. 4(b) shows the additional labeled samples obtained from the time-series trajectories starting from these 36 samples. A 3-layer neural network (128 neurons – ReLU – 64 neurons – ReLU – 1 neuron – sigmoid) was trained to give the BoA in Fig. 4(c).

Given that the areas of the basins are oftentimes imbalanced, the F1-score, which considers both the recall and precision of a classifier, was selected to evaluate the estimated BoA. Tab. 2 lists the minimum number of labels needed to reach different levels of F1-score for four sampling methods. Brute-force uniform sampling, as discussed in Sec. 2, wastes much effort on labeling the less informative samples that do not determine the BoA boundary, thus requiring the largest number of samples. Integrating uniform sampling with additional sampling from time-series trajectories (uniform sampling + AST) gives little help in finding informative samples, but generates additional labeled samples every time one sample gets labeled, thus drastically reducing the sampling workload. The sampling method of AST + AL is capable of finding more informative samples near the boundary, which increases the sampling efficiency even further. However, as mentioned in Sec. 2.3, a sampling method built upon AL alone tends to get stuck on a local portion of the BoA boundary. This explains why the method behaves well for lower F1-scores but becomes significantly less efficient for higher F1-scores (our experiment stopped after labeling 100 samples due to the prohibitive computational time in AL). This problem was solved by introducing DBS, which explores the BoA boundary globally. Combining all three sampling methods (AST + AL + DBS) yields our HAL sampling method, which shows a dominant advantage in sampling efficiency in Tab. 2.

| Sampling Method | F1 = 0.4 | F1 = 0.6 | F1 = 0.8 | F1 = 0.9 |
| --- | --- | --- | --- | --- |
| uniform sampling | 225 | 441 | 576 | 1225 |
| uniform sampling + AST | 9 | 16 | 25 | 64 |
| AST + AL | 4 | 7 | >100 | >100 |
| AST + AL + DBS (our HAL method) | 4 | 7 | 15 | 35 |
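The F1-score used above can be computed with scikit-learn; the ground-truth and predicted basin labels below are a hypothetical toy example, not data from Tab. 2:

```python
import numpy as np
from sklearn.metrics import f1_score

# Toy dense-grid evaluation: true basin membership vs. a classifier's
# prediction, with the minority basin (label 1) treated as the positive
# class to account for imbalanced basin areas.
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 0])
y_pred = np.array([0, 0, 0, 0, 0, 1, 1, 1, 0, 0])

# F1 = 2PR / (P + R); here precision = recall = 2/3.
score = f1_score(y_true, y_pred)
```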

## 4 Conclusion

This paper introduces a hybrid active learning (HAL) sampling method for estimating a system’s basins of attraction (BoA). The proposed method provides sufficient samples to train a classifier representing the BoA. It consists of three primary parts: 1) additional sampling on trajectories (AST) to maximize the number of samples obtained from each simulation or experiment; 2) an active learning (AL) algorithm to exploit the local boundary of BoA; and 3) a density-based sampling (DBS) method to explore the global boundary of BoA. An example of estimating BoA for a bistable nonlinear system illustrates the high efficiency of this HAL sampling method.

Future work will extend our investigations in two key directions. First, the current HAL sampling method is based on binary classification, i.e., there are only two fixed-point attractors in the BoA. Although the one-vs-rest method can be used for multi-class classification, this strategy leads to a workload that increases linearly with the number of attractors. A more efficient sampling method, less affected by the number of attractors, is therefore needed. Second, this paper only illustrates the performance of the HAL sampling method on a 2-dimensional BoA with a smooth boundary; its robustness and efficiency for estimating BoA of higher dimension and/or with fractal boundaries are worth further investigation.

### References

- [1] (2006) k-means++: the advantages of careful seeding. Technical report, Stanford.
- [2] (2009) A computational method for non-convex reachable sets using optimal control. In 2009 European Control Conference (ECC), pp. 97–102.
- [3] (2007) Margin based active learning. In International Conference on Computational Learning Theory, pp. 35–50.
- [4] (2004) Estimating the domain of attraction for uncertain polynomial systems. Automatica 40(11), pp. 1981–1986.
- [5] (1985) On the estimation of asymptotic stability regions: state of the art and new proposals. IEEE Transactions on Automatic Control 30(8), pp. 747–755.
- [6] (2002) Estimating domains of attraction of a class of nonlinear dynamical systems with LMI methods based on the theory of moments. In Proceedings of the 41st IEEE Conference on Decision and Control, Vol. 3, pp. 3150–3155.
- [7] (2013) Convex computation of the region of attraction of polynomial control systems. IEEE Transactions on Automatic Control 59(2), pp. 297–312.
- [8] (2016) A fast sampling method for estimating the domain of attraction. Nonlinear Dynamics 86(2), pp. 823–834.
- [9] (2020) Dynamics of a magnetically excited rotational system. In Nonlinear Structures and Systems, Volume 1, G. Kerschen, M. R. W. Brake, and L. Renson (Eds.), Cham, pp. 99–102.
- [10] (2018) Dynamics of unforced and vertically forced rocking elliptical and semi-elliptical disks. Journal of Sound and Vibration 417, pp. 341–358.