
# A Successive-Elimination Approach to Adaptive Robotic Source Seeking

Esther Rolf*, David Fridovich-Keil*, Max Simchowitz,
Benjamin Recht, and Claire Tomlin
The authors are with the Department of Electrical Engineering and Computer Sciences, UC Berkeley, CA. {erolf, dfk, msimchow, brecht, tomlin}@berkeley.edu. *indicates equal contribution.
###### Abstract

We study an adaptive source seeking problem, in which a mobile robot must identify the strongest emitter(s) of a signal in an environment with background emissions. Background signals may be highly heterogeneous and can mislead algorithms that are based on receding horizon control, greedy heuristics, or smooth background priors. We propose AdaSearch, a general algorithm for adaptive source seeking in the face of heterogeneous background noise. AdaSearch combines global trajectory planning with principled confidence intervals in order to concentrate measurements in promising regions while guaranteeing sufficient coverage of the entire area. Theoretical analysis shows that AdaSearch confers gains over a uniform sampling strategy when the distribution of background signals is highly variable. Simulation experiments demonstrate that when applied to the problem of radioactive source seeking, AdaSearch outperforms both uniform sampling and a receding time horizon information-maximization approach based on the current literature. We also demonstrate AdaSearch in hardware, providing further evidence of its potential for real-time implementation.

Source localization, active sensing, mobile robots, radioactive source seeking.

## I Introduction

Robotic source seeking is a problem domain in which a mobile robot must traverse an environment to locate the maximal emitters of a signal of interest, usually in the presence of background noise. Adaptive source seeking involves adaptive sensing and active information gathering, and encompasses several well-studied problems in robotics, including the rapid identification of accidental contamination leaks and radioactive sources [1, 2], and finding individuals in search and rescue missions [3]. We consider a specific motivating application of radioactive source seeking (RSS), in which a UAV (Fig. 1) must identify the k largest radioactive emitters in a planar environment, where k is a user-defined parameter. RSS is a particularly interesting instance of source seeking due to the challenges posed by the highly heterogeneous background noise [4].

A well-adopted methodology for approaching source seeking problems is information maximization (see Sec. II), in which measurements are collected in the most promising locations following a receding planning horizon. Information maximization is appealing because it favors measuring regions that are likely to contain the highest emitters and avoids wasting time elsewhere. However, when operating in real time, computational constraints necessitate approximations such as limits on the planning horizon and trajectory parameterization. These limitations scale with the size of the search region and the complexity of the sensor model, and may cause the algorithm to be excessively greedy, spending extra travel time tracking down false leads.

To overcome these limitations, we introduce AdaSearch, a successive-elimination framework for general source seeking problems with multiple sources, and demonstrate it within the context of RSS. AdaSearch explicitly maintains confidence intervals over the emission rate at each point in the environment. Using these confidence intervals, the algorithm identifies a set of candidate points likely to be among the top-k emitters, and eliminates points that are not. Rather than iteratively planning for short, receding time horizons, AdaSearch repeats a fixed, globally planned path, adjusting the robot's speed in real time to focus measurements on promising regions. This approach offers coverage of the full search space while affording an adaptive measurement allocation in the spirit of information maximization. By maintaining a single fixed, global path, AdaSearch reduces the online computational overhead, yielding an algorithm easily amenable to real-time implementation.

Specifically, our main contributions are:

• AdaSearch, a general framework for designing efficient sensing trajectories for robotic source seeking problems,

• Theoretical runtime analysis of AdaSearch as well as of a naive, uniform sampling baseline which follows the same fixed global path but moves at constant speed, and

• Simulation experiments for RSS evaluating AdaSearch in comparison with a uniform baseline and information maximization.

Our theoretical analysis sharply quantifies AdaSearch's improvement over its uniform sampling analog. Experiments validate this finding in practice, and also show that AdaSearch outperforms a custom implementation of information maximization tailored to the RSS problem. Together, these results suggest that the accuracy and efficient runtime of AdaSearch are robust to heterogeneous background noise, which stands in contrast to existing alternative methods. This robustness is particularly valuable in real-world applications where the exact distribution of background signals in the environment is likely unknown.

The remainder of this paper is organized as follows. Sec. II presents a brief survey of related literature. Sec. III provides a formal statement of the source seeking problem and introduces our solution, AdaSearch. In Sec. IV, we consider a radioactive source seeking (RSS) case study and develop two appropriate sensing models which allow us to apply AdaSearch to RSS. Sec. V analyzes the theoretical runtime complexity of AdaSearch and its uniform sampling analog for the RSS problem. In Sec. VI, we present simulation experiments which corroborate these theoretical results. A hardware demonstration provides further evidence of AdaSearch's potential for real-time application. Sec. VII suggests a number of extensions and generalizations to AdaSearch, and Sec. VIII concludes with a summary of our results.

## II Related Work

There is a breadth of existing work related to source seeking. Much of this literature, particularly when tailored to robotic applications, leverages some form of information maximization, often using a Gaussian process prior. However, our own work is inspired by approaches from the pure exploration multi-armed bandit literature, even though bandits are not typically used to model physical sensing problems with realistic motion constraints. We survey the most relevant work in both information maximization and multi-armed bandits below.

### II-A Information maximization methods

A popular approach to active sensing and source seeking in robotics, e.g. in active mapping [5] and target localization [6], is to choose trajectories that maximize a measure of information gain [7, 8, 9, 5, 10]. In the specific case of linear Gaussian measurements, Atanasov et al. [11] formulate the informative path planning problem as an optimal control problem that affords an offline solution. Similarly, Lim et al. [12] propose a recursive divide and conquer approach to active information gathering for discrete hypotheses, which is near-optimal in the noiseless case.

Planning for information maximization-based methods typically proceeds with a receding horizon [7, 13, 14, 15, 16]. For example, Ristic et al. [17] formulate information gathering as a partially observable Markov decision process and approximate a solution using a receding horizon. Marchant et al. [13] combine upper confidence bounds (UCBs) at potential source locations with a penalization term for travel distance to define a greedy acquisition function for Bayesian optimization. Their subsequent work [14] reasons at the path level to find longer, more informative trajectories. Noting the limitations of a greedy receding horizon approach, [18] incentivizes exploration by using a look-ahead step in planning. Though similar in spirit to these information seeking approaches, a key benefit of AdaSearch is that it is not greedy, but rather iterates over a global path.

Information maximization methods typically require a prior distribution on the underlying signals. Many active sensing approaches model this prior as drawn from a Gaussian process (GP) over an underlying space of possible functions [6, 7, 13], tacitly enforcing the assumption that the sensed signal is smooth [13]. In certain applications, this is well motivated by physical laws, e.g. diffusion [18]. However, GP priors may not reflect the sparse, heterogeneous emissions encountered in radiation detection and similar problem settings.

### II-B Multi-armed bandit methods

AdaSearch draws heavily on confidence-bound based algorithms from the pure exploration bandit literature [19, 20, 21]. In contrast to these works, our method explicitly incorporates a physical sensor model and allows for efficient measurement allocation despite the physical movement constraints inherent to mobile robotic sensing. Other works have studied spatial constraints in the online, “adversarial” reward setting [22, 23]. Baykal et al. [24] consider spatial constraints in a persistent surveillance problem, in which the objective is to observe as many events of interest as possible despite unknown, time-varying event statistics. Recently, Ma et al. [8] encode a notion of spatial hierarchy in designing informative trajectories, based on a multi-armed bandit formulation. While [8] and AdaSearch are similarly motivated, hierarchical planning can be inefficient for many sensing models, e.g. for short-range sensors, or signals that decay quickly with distance from the source.

Bandit algorithms are also studied from a Bayesian perspective, where a prior is placed over underlying rewards. For example, Srinivas et al. [25] provide an interpretation of the GP upper confidence bound (GP-UCB) algorithm in terms of information maximization. AdaSearch does not use such a prior, and is more similar to the lower and upper confidence bound (LUCB) algorithm [26], but opts for successive elimination over the more aggressive LUCB sampling strategy for measurement allocation.

A multi-armed bandit approach to active exploration in Markov decision processes (MDPs) with transition costs is studied in [27], which details trade-offs between policy mixing and learning environment parameters. This work highlights the potential difficulties of applying a multi-armed bandit approach while simultaneously learning robot policies. In contrast, we show that decoupling adaptive sampling decisions from a fixed global movement path confers efficiency gains under reasonable environmental models.

### II-C Other source seeking methods

Other notable extremum seeking methods include those that emulate gradient ascent in the physical domain [28, 29, 30], take into account specific environment signal characteristics [31], or are specialized for particular vehicle dynamics [32]. Modeling emissions as a continuous field, gradient-based approaches estimate and follow the gradient of the measured signal toward local maxima [28, 29, 30]. One of the key drawbacks of gradient-based methods is their susceptibility to finding local, rather than global, extrema. Moreover, the error margin on the noise of gradient estimators for large-gain sensors measuring noisy signals can be prohibitively large [33], as is the case in RSS. Khodayi-mehr et al. [31] handle noisy measurements by combining domain, model, and parameter reduction methods to actively identify sources in steady state advection-diffusion transport system problems such as chemical plume tracing. Their approach combines optimizing an information theoretic quantity based on these approximations with path planning in a feedback loop, specifically incorporating the physics of advection-diffusion problems. In comparison, we consider planning under specific sensor models, and plan the motion path and optimal measurement allocation separately.

## III AdaSearch Planning Strategy

### III-A Problem statement

We consider signals (e.g. radiation) which emanate from a finite set of environment points S. Each point x ∈ S emits signals X_t(x), indexed by time t, with mean μ(x), independent and identically distributed over time. Our aim is to correctly and exactly discern the set S∗(k) of the k points in the environment that emit the maximal signals:

 $S^*(k) = \operatorname*{argmax}_{S' \subseteq S,\; |S'| = k}\ \sum_{x \in S'} \mu(x)$  (1)

for a prespecified integer k. Throughout, we assume that the set S∗(k) of maximal emitters is unique.

In order to decide which points are maximal emitters, the robot takes sensor measurements along a fixed path in the robot’s configuration space. Measurements are determined by a known sensor model h(x, z) that describes the contribution of environment point x to a sensor measurement collected from sensing configuration z. We consider a linear sensing model in which the total observed measurement at time t, Y_t(z), taken from sensing configuration z, is the weighted sum of the contributions from all environment points:

 $Y_t(z) = \sum_{x \in S} h(x, z)\, X_t(x)$  (2)

Note that while h is known, the means μ(x) are unknown and must be estimated via the observations Y_t(z).

The path of sensing configurations, Z, should be as short as possible, while providing sufficient information about the entire environment. Moreover, we need to disambiguate between contributions from different environment points x ∈ S. We define the matrix H, with entries H_{z,x} = h(x, z), that encodes the sensitivity of each sensing configuration z ∈ Z to each point x ∈ S, so that in vector notation the expected measurements are 𝔼[Y] = Hμ. Disambiguation then translates to a rank constraint rank(H) = |S|, enforcing invertibility of H^⊤H. Sections IV-A and IV-B define two specific sensitivity functions that we consider in the context of the RSS problem.
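As a concrete illustration of this linear model, the sketch below builds a sensitivity matrix H from a sensitivity function h and checks the rank condition that makes the mean rates identifiable. The 1-D geometry and the particular h used here are illustrative assumptions of ours, not the sensing models developed later in the paper.

```python
import numpy as np

def sensitivity_matrix(points, configs, h):
    """Build H with H[z_idx, x_idx] = h(x, z) for each config z and point x."""
    return np.array([[h(x, z) for x in points] for z in configs])

# Toy 1-D example with a smooth, distance-decaying sensitivity (illustrative only).
def h(x, z):
    return 1.0 / (1.0 + (x - z) ** 2)

points = [0.0, 1.0, 2.0]
configs = [0.0, 1.0, 2.0, 3.0]
H = sensitivity_matrix(points, configs, h)

# Disambiguation requires rank(H) == |S|, so that H^T H is invertible and the
# mean emission rates are identifiable from noiseless measurements Y = H mu.
assert np.linalg.matrix_rank(H) == len(points)
```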

### Iii-B The AdaSearch algorithm

AdaSearch (Alg. 1) concentrates measurements in regions of uncertainty until we are confident about which points belong to S∗(k). At each round i, we maintain a set S_i^top of environment points that we are confident are among the top-k, and a set S_i of candidate points about which we are still uncertain. As the robot traverses the environment, new sensor measurements allow us to update the lower and upper confidence bounds LCB_i(x) and UCB_i(x) for the mean signal μ(x) of each x ∈ S_i and prune the uncertainty set S_i. The procedure for constructing these intervals from observations should ensure that μ(x) ∈ [LCB_i(x), UCB_i(x)] for every x ∈ S_i, with high probability. Sections IV-A and IV-B detail the definition of these confidence intervals under different sensing models.

Using the updated confidence intervals, we expand the set S_i^top and prune the set S_i. We add to the top set all points whose lower confidence bounds exceed the upper confidence bounds of all but at most k − |S_i^top| points in S_i; formally,

 $S^{\mathrm{top}}_{i+1} \leftarrow S^{\mathrm{top}}_{i} \cup \{\, x \in S_i \ |\ \mathrm{LCB}_i(x) > (k - |S^{\mathrm{top}}_{i}| + 1)\text{-th largest } \mathrm{UCB}_i(x'),\ x' \in S_i \,\}.$  (3)

Next, the points added to S_{i+1}^top are removed from S_i, since we are now certain about them. Additionally, we remove all points in S_i whose upper confidence bound is lower than the lower confidence bounds of at least k − |S_{i+1}^top| points in S_i. The set S_{i+1} is defined constructively as:

 $S_{i+1} \leftarrow \{\, x \in S_i \ |\ x \notin S^{\mathrm{top}}_{i+1} \ \text{and}\ \mathrm{UCB}_i(x) \ge (k - |S^{\mathrm{top}}_{i+1}|)\text{-th largest } \mathrm{LCB}_i(x'),\ x' \in S_i \,\}.$  (4)
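The update rules (3) and (4) can be sketched as a single function; this is a minimal Python illustration in which LCB and UCB are dictionaries of current confidence bounds (the data structures and function name are ours, not part of the paper's implementation):

```python
def update_sets(S_top, S_cand, LCB, UCB, k):
    """One round of update rules (3) and (4): promote confident points to the
    top set, then eliminate candidates that provably lie outside the top-k."""
    ucbs = sorted((UCB[x] for x in S_cand), reverse=True)
    m = k - len(S_top) + 1
    # (3): promote x when its LCB beats the m-th largest UCB among candidates.
    thresh_up = ucbs[m - 1] if 0 < m <= len(ucbs) else float("-inf")
    new_top = set(S_top) | {x for x in S_cand if LCB[x] > thresh_up}

    r = k - len(new_top)
    if r <= 0:
        # Top set is full: every remaining candidate can be eliminated.
        return new_top, set()
    lcbs = sorted((LCB[x] for x in S_cand), reverse=True)
    # (4): keep x only if its UCB still reaches the r-th largest candidate LCB.
    thresh_down = lcbs[r - 1] if r <= len(lcbs) else float("-inf")
    new_cand = {x for x in S_cand if x not in new_top and UCB[x] >= thresh_down}
    return new_top, new_cand
```

For example, with k = 1 and a candidate whose lower bound clears every other upper bound, the point is promoted and the round's remaining candidates are eliminated.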

### Iii-C Trajectory planning for AdaSearch

The update rules (3) and (4) only depend on confidence intervals for points x ∈ S_i. At each round, AdaSearch chooses a subset Z_i of the sensing configurations which are informative for disambiguating the points remaining in S_i.

AdaSearch defines a trajectory by following the fixed path over all configurations, slowing down to spend time τ_i at informative configurations in Z_i, and spending minimal time at all other configurations. Doubling the dwell time τ_i in each round amortizes the time spent traversing the entire path. For omnidirectional sensors, a simple raster pattern (Fig. 2) suffices for the global path, and choosing Z_i is relatively straightforward (see Sec. IV-C).

We could also design a trajectory that visits only the configurations in Z_i and minimizes total travel distance each round, e.g. by approximating a traveling salesman solution. In practice, this would improve upon the runtime of the fixed raster path suggested above. In this work, we use a raster pattern to emphasize the gains due to our main algorithmic components: global coverage and adaptive measurement allocation.
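Under the doubling schedule described above, the duration of one traversal of the global path can be sketched as follows. This is a toy timing model of ours that assumes a dwell time of τ₀·2^i at each informative configuration and τ₀ at every other configuration, ignoring acceleration and deceleration:

```python
def round_duration(num_configs, num_informative, tau0, round_idx):
    """Time for one traversal of the global path in round i (illustrative):
    dwell tau_i = tau0 * 2**i at informative configs, tau0 elsewhere."""
    tau_i = tau0 * (2 ** round_idx)
    return num_informative * tau_i + (num_configs - num_informative) * tau0
```

As the candidate set shrinks, the first term shrinks with it, so the doubled dwell times are spent on ever fewer configurations while the fixed traversal cost stays bounded.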

### Iii-D Correctness

Lemma III-D establishes that the two update rules above guarantee the overall correctness of AdaSearch, whenever the confidence intervals actually contain the correct means μ(x): For each round i, S_i ∩ S_i^top = ∅. Moreover, whenever the confidence intervals satisfy the coverage property:

 $\forall\, j \le i,\ x \in S_j:\quad \mu(x) \in [\mathrm{LCB}_j(x), \mathrm{UCB}_j(x)],$  (5)

then S_i^top ⊆ S∗(k) ⊆ S_i^top ∪ S_i. If (5) holds for all rounds i, then AdaSearch terminates and correctly returns S∗(k).

###### Proof:

(Sketch) Non-intersection of S_i and S_i^top follows inductively from update rule (4) and the initialization S_1^top = ∅.

The overlapping set property follows by induction on the round number i. When i = 1, S_1^top = ∅ ⊆ S∗(k) ⊆ S = S_1^top ∪ S_1. Now assume that S_i^top ⊆ S∗(k) ⊆ S_i^top ∪ S_i holds for round i. Update rule (3) moves a point x from S_i to S_{i+1}^top only if its LCB is above the (k − |S_i^top| + 1)-th largest UCB of all points in S_i. By (5), LCB_i(x) ≤ μ(x), so that μ(x) must be greater than or equal to the (k − |S_i^top| + 1)-th largest mean of the points in S_i. Therefore, this x must belong to S∗(k), establishing that S_{i+1}^top ⊆ S∗(k). Similarly, by update rule (4), a point x is only removed from S_i if its UCB is below the largest k − |S_{i+1}^top| LCBs of points in S_i, such that μ(x) is less than or equal to at least k − |S_{i+1}^top| other means. Thus, such a point cannot be in S∗(k). This establishes that S∗(k) ⊆ S_{i+1}^top ∪ S_{i+1}.

Finally, at termination we have S_i = ∅, so that S∗(k) ⊆ S_i^top ∪ S_i = S_i^top; together with S_i^top ⊆ S∗(k), this gives S_i^top = S∗(k). ∎

Lemma III-D provides a backbone upon which we construct a probabilistic correctness guarantee in Sec. V. If the event (5) holds over all rounds with some probability 1 − δ, then AdaSearch returns the correct set S∗(k) with the same probability 1 − δ.

## IV Radioactive Source-Seeking with Poisson Emissions

While AdaSearch applies to a range of adaptive sensing problems, for concreteness we now refine our focus to the problem of radioactive source seeking (RSS) with an omnidirectional sensor. The environment is defined by potential emitter locations x ∈ S, which lie on the ground plane, and sensing configurations z ∈ Z, which encode the spatial position of the sensor. Environment points emit gamma rays according to a Poisson process, i.e. X_t(x) ∼ Poisson(μ(x)). Here, μ(x) corresponds to the rate, or intensity, of emissions from point x.

Thus, the number of gamma rays Y_t(z) observed over a time interval of length τ from configuration z has distribution

 $Y_t(z) \sim \mathrm{Poisson}\Big(\tau \cdot \sum_{x \in S} h(x, z)\, \mu(x)\Big)$ , (6)

where the sensitivity function h is specified by the sensing model. In the following sections, we introduce two sensing models: a pointwise sensing model amenable to theoretical analysis (Sec. IV-A), and a more physically realistic sensing model for experiments (Sec. IV-B).

In both settings, we develop appropriate confidence intervals for use in the AdaSearch algorithm. We introduce the specific path used for global trajectory planning in Sec. IV-C. Finally, we conclude with two benchmark algorithms to which we compare AdaSearch (Sec. IV-D).

### IV-A Pointwise sensing model

First, we consider a simplified sensing model, where the set of sensing locations coincides with the set of all emitters, i.e. each z ∈ Z corresponds to exactly one x ∈ S and vice versa. The sensitivity function is defined as h(x, z) = 1 if z is the configuration corresponding to x, and h(x, z) = 0 otherwise.

Now we derive confidence intervals for Poisson counts observed according to this sensing model. Define N to be the total number of gamma rays observed during a time interval of length τ spent at the configuration corresponding to x. The maximum likelihood estimator (MLE) of the emission rate for point x is then N/τ. Using standard bounds for Poisson tails [34], we introduce bounding functions U_+ and U_−:

 $U_+(N, \delta) := 2\log(1/\delta) + N + \sqrt{2N\log(1/\delta)}$  and  $U_-(N, \delta) := \max\big\{0,\ N - \sqrt{2N\log(1/\delta)}\big\}$

Then for any λ > 0, N ∼ Poisson(λ), and δ ∈ (0, 1),

 $\Pr\big[\,U_-(N, \delta) \le \lambda \le U_+(N, \delta)\,\big] \ge 1 - 2\delta.$

Let N_i(x) denote the number of gamma rays observed from emitter x during round i, so that N_i(x) ∼ Poisson(τ_i μ(x)), where τ_i is the corresponding duration of measurement for any point x ∈ S_i. The bounding functions above provide the desired confidence intervals for the signals μ(x): the inequality U_−(N_i(x), δ_i) ≤ τ_i μ(x) ≤ U_+(N_i(x), δ_i) holds with probability at least 1 − 2δ_i. Dividing by τ_i, we see that U_−(N_i(x), δ_i)/τ_i and U_+(N_i(x), δ_i)/τ_i are valid confidence bounds LCB_i(x) and UCB_i(x) for μ(x).

The term δ_i can be thought of as an “effective confidence” for each interval that we construct during round i. In order to achieve the correctness in Lemma III-D with overall probability 1 − δ_tot, we set the effective confidence δ_i at each round so that the failure probabilities of all intervals, summed over points and rounds, total at most δ_tot.
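The Poisson bounding functions and the resulting rate intervals can be written down directly; the sketch below follows the definitions above (function names are ours):

```python
import math

def U_plus(N, delta):
    """Upper bounding function for a Poisson rate lambda given count N [34]."""
    return 2 * math.log(1 / delta) + N + math.sqrt(2 * N * math.log(1 / delta))

def U_minus(N, delta):
    """Lower bounding function for lambda, clipped at zero."""
    return max(0.0, N - math.sqrt(2 * N * math.log(1 / delta)))

def rate_interval(N, delta, tau):
    """Confidence interval for the emission rate mu = lambda / tau after
    observing N counts over a sensing duration tau."""
    return U_minus(N, delta) / tau, U_plus(N, delta) / tau
```

Note that at a fixed empirical rate N/τ, longer dwell times yield proportionally larger N and hence narrower intervals, which is exactly what the per-round doubling of τ_i exploits.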

### Iv-B Physical sensing model

A more physically accurate sensing model for RSS reflects that the gamma ray count at each location is a sensitivity-weighted combination of the emissions from each environment point. Conservation of energy in free space allows us to approximate the sensitivity function with an inverse-square law h(x, z) = c/‖x − z‖², with c a known, sensor-dependent constant. More sophisticated approximations are also possible [17].

Because multiple environment points contribute to the counts observed from any sensor position z, the MLE for the emission rates at all x ∈ S is difficult to compute efficiently. However, we can approximate it in the limit, since Poisson counts are well approximated by Gaussians as the expected counts grow large. Thus, we may compute the estimate $\hat{\mu}$ as the least squares solution:

 $\hat{\mu} = \operatorname*{argmin}_{\vec{\mu}}\ \big\|\tilde{H}^{\top}\vec{\mu} - \vec{Y}\big\|_2^2$ , (7)

where $\vec{\mu}$ is a vector representing the mean emissions from each x ∈ S, $\vec{Y}$ is a vector representing the observed number of counts at each of the consecutive time intervals, and $\tilde{H}$ is a rescaled sensitivity matrix whose entries give the measurement-adjusted sensitivity of each environment point to the sensor at each sensing position. The rescaling term is a plug-in estimator for the variance of each measurement (with a small bias introduced for numerical stability), which down-weights higher-variance measurements. The resulting confidence bounds are given by the standard Gaussian confidence bounds:

 $[\mathrm{LCB}_i(x_k), \mathrm{UCB}_i(x_k)] := \hat{\mu}(x_k) \pm \alpha(\delta_i) \cdot \Sigma_{kk}^{1/2}$ , (8)

where Σ denotes the covariance of the least squares estimate $\hat{\mu}$, and α(δ_i) controls the round-wise effective confidence widths in equation (8) as a function of the desired threshold probability of overall error, δ_tot. We use a Kalman filter to solve the least squares problem (7) and compute the confidence intervals (8).
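A minimal batch sketch of the least squares estimate (7) and the Gaussian intervals (8) is below. It assumes the rescaling in H̃ has already whitened the measurements, so that (H̃H̃ᵀ)⁻¹ approximates the covariance Σ; the paper's Kalman filter computes the same quantities recursively, which we do not reproduce here.

```python
import numpy as np

def ls_estimate_with_cis(H_tilde, Y, alpha):
    """Least squares estimate (7) with Gaussian confidence bounds (8).

    H_tilde: (|S|, m) rescaled sensitivity matrix, so H_tilde.T maps the rate
    vector mu to expected measurements; Y: (m,) observed counts;
    alpha: confidence-width multiplier alpha(delta_i)."""
    A = H_tilde.T
    # Normal equations for min_mu ||A mu - Y||^2; Sigma approximates Cov(mu_hat)
    # under unit-variance (whitened) measurement noise.
    Sigma = np.linalg.inv(A.T @ A)
    mu_hat = Sigma @ (A.T @ Y)
    widths = alpha * np.sqrt(np.diag(Sigma))
    return mu_hat, mu_hat - widths, mu_hat + widths
```

On noiseless data the estimate recovers the true rates exactly, and the returned intervals then trivially contain them.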

### IV-C Design and planning for AdaSearch

Pointwise sensing model. In the pointwise sensing model, the most informative sensing locations at round i are precisely those corresponding to the candidate points S_i. We therefore choose the global path to be a simple space filling curve over a raster grid, which provides coverage of all of S. We adopt a simple dynamical model of the quadrotor in which it can fly at up to a pre-specified top speed, and where acceleration and deceleration times are negligible. This model is suitable for large outdoor environments where travel times are dominated by movement at maximum speed. Figure 2 shows an example environment with the raster path overlaid (left) and the trajectory followed during round i, with S_i shown in teal (right).

Physical sensing model. Because the physical sensitivity follows an inverse-square law, the most informative measurements about a point x are those taken at locations near to x. We take measurements at points two meters above the environment points on the ground plane. Flying at a relatively low height improves the conditioning of the sensitivity matrix $\tilde{H}$. We use the same design and planning strategy as in the pointwise model, following the raster pattern depicted in Fig. 2.

### Iv-D Baselines

We compare AdaSearch to two baselines: a uniform-sampling based algorithm, NaiveSearch, and a spatially greedy information maximization algorithm, InfoMax.

NaiveSearch algorithm. As a non-adaptive baseline, we consider a uniform sampling scheme that follows the raster pattern in Fig. 2 at constant speed. This global trajectory results in measurements uniformly spread over the grid, and avoids redundant movements between sensing locations. The only difference between NaiveSearch and AdaSearch is that NaiveSearch flies at a constant speed, while AdaSearch varies its speed. Comparing AdaSearch to NaiveSearch thus separates the advantages of AdaSearch's adaptive measurement allocation from the effects of its global trajectory heuristic. Theoretical analysis in Sec. V considers a slight variant in which the sampling time is doubled at each round. This doubling has theoretical benefits, but for all experiments we implement the more practical fixed-speed baseline.

InfoMax algorithm. As discussed in Sec. II, one of the most successful methods for active search in robotics is receding horizon informative path planning, e.g. [14, 15]. We implement InfoMax, a version of this approach based on [14] and specifically adapted for RSS. Each planning invocation solves an information maximization problem over the space of trajectories ξ mapping from time in the next T_plan seconds to a box of feasible positions.

We measure the information content of a candidate trajectory ξ by accumulating the sensitivity-weighted variance at each grid point at N evenly-spaced times along ξ, i.e.

 $\xi_t^* = \operatorname*{argmax}_{\xi}\ \sum_{i=1}^{N} \sum_{j=1}^{|S|} \Sigma_{jj} \cdot h\big(x_j,\ \xi(t + T_{\mathrm{plan}}\, i/N)\big).$  (9)

This objective favors taking measurements sensitive to regions with high uncertainty. As a consequence of the Poisson emissions model, these regions will also generally have high expected intensity μ; therefore we expect this algorithm to perform well for the RSS task. We parameterize trajectories as Bezier curves and use Bayesian optimization (see [35]) to solve (9). Empirically, we found that Bayesian optimization outperformed both naive random search and a finite difference gradient method. We set T_plan to 10 s and used second-order Bezier curves.
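To make the acquisition step concrete, the sketch below evaluates the inner objective of (9) for one candidate trajectory ξ, given as a callable from time to position. In practice ξ would be a second-order Bezier curve and the outer maximization over ξ would be carried out by Bayesian optimization; both are omitted here, and the toy 1-D geometry and sensitivity are our own illustrative assumptions.

```python
def info_objective(xi, grid, Sigma_diag, h, T_plan, N, t=0.0):
    """Acquisition value of trajectory xi per the objective in (9):
    sensitivity-weighted posterior variances accumulated at N evenly-spaced
    times over the planning horizon T_plan."""
    total = 0.0
    for i in range(1, N + 1):
        z = xi(t + T_plan * i / N)
        total += sum(s * h(x, z) for x, s in zip(grid, Sigma_diag))
    return total
```

As expected, a trajectory hovering over a high-variance grid point scores higher than one hovering over a well-resolved point.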

Stopping criteria and metrics. All three algorithms use the same stopping criterion, which is satisfied when the highest lower confidence bound exceeds the upper confidence bounds of all remaining competitors. For k = 1 emitter, this corresponds to the first round i in which LCB_i(x) > max_{x' ≠ x} UCB_i(x') for some environment point x. For sufficiently small probability of error δ_tot, this ensures that the top-k sources are almost always correctly identified by all algorithms.

## V Theoretical Runtime and Sampling Analysis

Separation of sample-based planning from a repeated global trajectory makes AdaSearch particularly amenable to runtime and sample complexity analysis. We analyze AdaSearch and NaiveSearch under the pointwise sensing model from Sec. IV-A. Runtime and sample guarantees are given in Theorem 2, with further analysis for a single source in Corollary 3 to complement experiments. Simulations (Sec. VI) show that our theoretical results are indeed predictive of the relative performance of AdaSearch and NaiveSearch.

We analyze AdaSearch with the trajectory planning strategy outlined in Sec. IV-C. For NaiveSearch, the robot spends time τ_i at each point in each round i until termination, which is determined by the same confidence intervals and termination criterion as for AdaSearch.

We will be concerned with the total runtime. Recall that τ_0 is the time spent over any point when the robot is moving at maximum speed; τ_i is the time spent sampling candidate points at the slower speed of round i. The total runtime is

 $T_{\mathrm{run}} = \sum_{i=1}^{i_{\mathrm{term}}} \big(\, |S_i|\,\tau_i + (|S| - |S_i|)\,\tau_0 \,\big)$ , (10)

where i_term is the round at which the algorithm terminates. Bounds are stated in terms of divergences between emission rates μ_1 and μ_2:

 $d(\mu_1, \mu_2) = (\mu_2 - \mu_1)^2 / \mu_2\,.$

These divergences approximate the KL divergence between the distributions Poisson(μ_1) and Poisson(μ_2), and hence the sample complexity of distinguishing between points emitting photons at rates μ_1 and μ_2. Analogous divergences are available for any exponential family, for example Gaussian distributions, where the divergences are symmetric.
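In code, the divergence is one line; note its asymmetry, and that 1/d grows as the two rates approach each other, reflecting the growing difficulty of distinguishing them:

```python
def d(mu1, mu2):
    """Divergence d(mu1, mu2) = (mu2 - mu1)^2 / mu2, approximating the KL
    divergence between Poisson(mu1) and Poisson(mu2). The quantity 1/d is
    proportional to the number of samples needed to tell the rates apart."""
    return (mu2 - mu1) ** 2 / mu2
```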

To achieve the termination criterion (when S∗(k) is determined with confidence 1 − δ_tot), all points with emission rate below the lowest in S∗(k) must be distinguished from μ_(k), the lowest emission rate of points in S∗(k). Therefore, for points x ∉ S∗(k), we consider the divergences d(μ(x), μ_(k)). Similarly, all points in S∗(k) must be distinguished from the highest background emitter, corresponding to the divergences d(μ_(k+1), μ(x)), describing how close μ(x) is to the mean rate μ_(k+1) of the highest background emitter.

###### Theorem 2.

(Sample and Runtime Guarantees). Define the general adaptive and uniform sample complexity terms C_adapt^(k) and C_unif^(k):

 $C^{(k)}_{\mathrm{adapt}} := |S|\tau_0 + \sum_{x \in S^*(k)} \frac{1}{d(\mu_{(k+1)}, \mu(x))} + \sum_{x \in S \setminus S^*(k)} \frac{1}{d(\mu(x), \mu_{(k)})}$  and  $C^{(k)}_{\mathrm{unif}} := |S|\tau_0 + |S| \cdot \frac{1}{d(\mu_{(k+1)}, \mu_{(k)})}$  (11)

for any integer number of sources k and any distribution of emitters. For any δ_tot ∈ (0, 1), the following hold, each with probability at least 1 − δ_tot (the $\tilde{O}$ notation below suppresses doubly-logarithmic factors).
(i) AdaSearch correctly returns S∗(k), with runtime at most

 $T_{\mathrm{run}}(\textit{AdaSearch}) \le C^{(k)}_{\mathrm{adapt}} \cdot \tilde{O}(\log(|S|/\delta_{\mathrm{tot}}))\,.$

(ii) NaiveSearch correctly returns S∗(k), with runtime bounded by

 $T_{\mathrm{run}}(\textit{NaiveSearch}) \le C^{(k)}_{\mathrm{unif}} \cdot \tilde{O}(\log(|S|/\delta_{\mathrm{tot}}))\,.$
###### Proof:

(Sketch) The runtimes (10) of each algorithm depend on how quickly we can reduce the candidate set S_i in each round. For each point x, let i(x) denote the round at which AdaSearch removes x from S_i; at this point we are confident as to whether or not x is in S∗(k), so we do not sample it on successive rounds. At round i, we spend time τ_i sampling each point still in S_i, so that we spend on the order of τ_{i(x)} time sampling x throughout the run of the algorithm. For NaiveSearch, we sample all points in all rounds, so we spend on the order of |S| · τ_{i_term} time sampling.

Now we bound i(x) for each algorithm. These quantities depend on the estimated means, and hence are random. Using the concentration bounds that informed the bounding functions in Sec. IV-A, we can form deterministic bounds that depend only on the true means μ(x). We choose these to encompass the algorithm confidence intervals with high probability. If each of these inequalities holds with probability 1 − δ_i, then a union bound gives that the probability of failure of any inequality over all rounds is at most δ_tot. By Lemma III-D, this ensures correctness with probability at least 1 − δ_tot.

Because these deterministic bounds are functions of the true means μ(x) alone and contract to μ(x) nearly geometrically in i, we can bound i(x) by inverting the intervals to find the smallest integer i for which the bounds separate the relevant pairs of points. This requires an inversion lemma from the best arm identification literature (Eq. (110) in [36]). The specific forms of U_+ and U_− yield the bounds on i(x) in terms of the approximate KL divergences d, which are added across all environment points to obtain the sample complexity terms for each algorithm in (11).

The form of C_unif^(k) results from noting that the function d(μ_1, μ_2) is decreasing in μ_1 and increasing in μ_2 for μ_2 > μ_1, and therefore the largest of the per-point divergence terms is 1/d(μ_(k+1), μ_(k)).

The |S|τ_0 term in the runtime bounds accounts for the travel time of transitioning between measurement configurations, i.e. the time spent traversing the uninformative points in the global path at high speed. This term is never larger than the order of the sampling terms and is typically dominated by them. With a uniform strategy, runtime scales with the largest value of 1/d over the points x, because that quantity alone determines the number of rounds required. In contrast, AdaSearch scales with the average of these terms, because it dynamically chooses which regions to sample more precisely.

Our sample complexity results qualitatively match standard bounds for active top-k identification with sub-Gaussian rewards in the general multi-armed bandit setting (e.g. [26]). The following corollary suggests that when the values of μ(x) are heterogeneous, AdaSearch yields significant speedups over NaiveSearch.

###### Corollary 3.

(Performance under Heterogeneous Background Noise). For a large environment with a single source x* with emission rate μ* and background signals distributed as Uniform[0, μ_max] with μ_max < μ*, the ratio of the upper bounds on the sample complexities of AdaSearch to NaiveSearch scales with the ratio of μ_max to μ* as $\tilde{O}(1 - \mu_{\max}/\mu^*)$.

###### Proof:

To control the complexity of NaiveSearch, note that

 $C_{\mathrm{unif}} = \tilde{O}\big(\max_{x \ne x^*} 1/d(\mu(x), \mu^*)\big) = \tilde{O}\big(\mu^*/(\mu^* - \max_{x \ne x^*}\mu(x))^2\big).$

It is well known that the maximum of n i.i.d. uniform random variables on [0, μ_max] concentrates near μ_max as n grows, which implies that max_{x ≠ x*} μ(x) ≈ μ_max with high probability. Hence, the sample complexity of NaiveSearch scales as $\tilde{O}\big(|S|\,\mu^*/(\mu^* - \mu_{\max})^2\big)$. On the other hand, the sample complexity of AdaSearch grows as $\tilde{O}\big(\sum_{x \ne x^*} 1/d(\mu(x), \mu^*)\big)$.

When the μ(x) are random and |S| is large, the law of large numbers implies that this sum tends to $|S| \cdot \mathbb{E}[1/d(\mu(x), \mu^*)] = |S|/(\mu^* - \mu_{\max})$. Therefore, the ratio of the sample bounds of AdaSearch to NaiveSearch is $\tilde{O}(1 - \mu_{\max}/\mu^*)$. ∎
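As a numerical sanity check of this argument, one can compare the dominant sampling terms of (11) directly under the corollary's assumptions; the particular values μ* = 10 and μ_max = 8 below are illustrative choices of ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def complexity_terms(mu_star, mu_bg):
    """Dominant sampling terms of C_adapt and C_unif from (11) for k = 1,
    ignoring the shared |S| * tau0 travel term."""
    d = lambda m1, m2: (m2 - m1) ** 2 / m2
    C_adapt = sum(1.0 / d(m, mu_star) for m in mu_bg)
    C_unif = len(mu_bg) / d(mu_bg.max(), mu_star)
    return C_adapt, C_unif

mu_star, mu_max, n = 10.0, 8.0, 20000
mu_bg = rng.uniform(0.0, mu_max, size=n)
C_adapt, C_unif = complexity_terms(mu_star, mu_bg)
# Per Corollary 3, the ratio should concentrate near 1 - mu_max/mu_star = 0.2.
ratio = C_adapt / C_unif
```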

## VI Experiments

We compare the performance of AdaSearch with the baselines defined in Sec. IV-D in simulation for the RSS problem, and validate AdaSearch in a hardware demonstration.

### VI-A Simulation methodology

We evaluate AdaSearch, NaiveSearch, and InfoMax in simulation using the Robot Operating System (ROS) framework [37]. Environment points lie in a planar grid, spread evenly over the total search area. Radioactive emissions are detected by a simulated sensor following the physical sensing model in Sec. IV-B; the sensor is constrained to fly above a minimum height at all times.

For the first set of experiments (Figs. 3–5), we set k = 1, so that the set of sources is a single point x* with a fixed emission rate (in photons/s). In this setting, we investigate algorithm performance in the face of heterogeneous background signals by varying a maximum environment emission rate parameter μ_max. For each setting of μ_max, we test all three algorithms on grids randomly generated with background emission rates drawn uniformly at random from the interval [0, μ_max].

We also examine the relative performance of all three algorithms as the number of sources k increases (Fig. 6). For all experiments with k > 1, we randomly assign k unique environment points from the grid as the point sources, with emission rates set to span a fixed range (in photons/s) evenly. The signals of the remaining background emitters are drawn randomly as before. For all experiments, we fix the confidence parameter δ_tot.

### VI-B Results

Figure 3 shows performance across the three algorithms with respect to the following metrics: (a) total runtime (time from takeoff until x* is located with the desired confidence), (b) absolute difference between the predicted and actual emission rate of x*, and (c) aggregate difference between predicted and actual emission rates over all environment points, measured in Euclidean norm. The uniform baseline NaiveSearch terminates significantly earlier than InfoMax, and AdaSearch terminates even earlier, on average. Of these runs, AdaSearch finished faster than NaiveSearch in 21 runs, and finished faster than InfoMax in 24.

To examine the variation in runtimes due to factors other than the environment instantiation, we also conducted repeated runs on the same exact environment grid. Due to delays in timing and message passing in simulation (just as there would be in a physical system), measurements of the simulated emissions can still be treated as random even though the environment is fixed. Indeed, the variance in runtimes on the fixed grid was comparable to the variance in runtimes in Fig. 3. Of these 25 runs, AdaSearch finished faster than NaiveSearch in 18 runs, and finished faster than InfoMax in all 25.

Fig. 3(b) plots the absolute difference between the estimated emission rate and the true emission rate at the one source. AdaSearch and NaiveSearch perform comparably over time, and AdaSearch terminates significantly earlier. Fig. 3(c) plots the Euclidean error between the estimated and ground truth grids; in this metric the gaps in error between all three algorithms are smaller. AdaSearch is thus fast at locating the highest-mean sources without sacrificing performance in total environment mapping.

Fig. 4 shows the performance of all three algorithms across different maximum background emission rates νμ*. As ν increases, all algorithms take longer to terminate because the source is harder to distinguish from the increasingly heterogeneous background signals (left). For high background radiation values (e.g., ν close to 1), the difference in runtimes between the three algorithms is larger; the runtime of AdaSearch increases gradually under high background signals, whereas NaiveSearch and InfoMax are greatly affected. Fig. 5 shows that as νμ* approaches μ*, the relative speedup of using adaptivity increases. This is consistent with the theoretical analysis in Sec. V; the dashed line plots a fitted curve of the form c/(1 − ν).
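A one-parameter fit of this form has a closed-form least-squares solution. The sketch below illustrates the procedure on synthetic speedup ratios (placeholders, not the measured runtimes from Fig. 5):

```python
import numpy as np

# Synthetic speedup ratios following the predicted 1/(1 - nu) trend, plus noise.
nu = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
ratio = 2.0 / (1.0 - nu) + np.array([0.05, -0.02, 0.01, 0.03, -0.04])

# Least squares for c in ratio ~= c / (1 - nu):
# with feature f = 1/(1 - nu), the minimizer is c = <f, ratio> / <f, f>.
f = 1.0 / (1.0 - nu)
c = float(f @ ratio / (f @ f))
```

Here the recovered coefficient is close to the true value of 2 used to generate the data.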

Fig. 6 compares algorithm runtimes across different numbers of sources, k. As suggested by Corollary 2, both absolute and relative performance are consistent across k for all three algorithms.

AdaSearch is an inherently probabilistic algorithm, returning the true sources with high probability, as governed by the number of rounds and the confidence parameter δ. Of the 175 trials run throughout these experiments, AdaSearch locates the correct source in 174 of them (99.4%). We chose δ in our experiments to facilitate a fair comparison of the algorithms while maintaining reasonable runtimes for the slower methods (NaiveSearch, InfoMax). Given the speed with which AdaSearch returns a source, in practice it would be feasible to reduce δ, and hence reduce the probability of a mistake. Due to the good performance of total grid mapping (Fig. 3(c)), even in the low-probability case that an incorrect source is returned, AdaSearch still provides valuable information about the environment.
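The probabilistic guarantee stems from the successive-elimination structure: a point is discarded only when its upper confidence bound falls below the best lower confidence bound. The following is a minimal illustrative sketch with Poisson observations and a generic Hoeffding-style bonus, not the paper's exact confidence intervals or sensing model:

```python
import numpy as np

def successive_elimination(rates, delta=0.001, seed=0, max_rounds=200):
    """Return the index believed to be the strongest emitter.
    rates: true Poisson emission rates (unknown to the algorithm)."""
    rng = np.random.default_rng(seed)
    n = len(rates)
    active = np.arange(n)
    counts, sums = np.zeros(n), np.zeros(n)
    for t in range(1, max_rounds + 1):
        # Sample each remaining candidate once per round.
        sums[active] += rng.poisson(rates[active])
        counts[active] += 1
        means = sums[active] / counts[active]
        # Generic anytime confidence bonus; the paper derives tighter,
        # Poisson-specific intervals.
        bonus = np.sqrt(np.log(2 * n * t * t / delta) / counts[active])
        # Keep a point only if its UCB reaches the best LCB.
        keep = means + bonus >= (means - bonus).max()
        active = active[keep]
        if len(active) == 1:
            break
    return int(active[np.argmax(sums[active] / counts[active])])

best = successive_elimination(np.array([1.0, 2.0, 3.0, 10.0]))
```

With well-separated rates, the strongest emitter survives every elimination round with probability at least 1 − δ, mirroring the guarantee discussed above.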

### VI-C Discussion

While all three methods eventually locate the correct source the vast majority of the time, the two algorithms with global planning heuristics, AdaSearch and NaiveSearch, terminate considerably earlier than InfoMax, which uses a greedy, receding horizon approach (Fig. 3). Moreover, the adaptive algorithm, AdaSearch, consistently terminates before its non-adaptive counterpart, NaiveSearch. These trends hold over differing background noise thresholds νμ* and numbers of sources k (Figs. 5 and 6).

The AdaSearch algorithm excels when it can quickly rule out points in early rounds. From (2) we recall that its sample complexity scales with the average value of 1/d(μ(x), μ*) (rather than the maximum, as for NaiveSearch). Hence, AdaSearch will outperform NaiveSearch when there are varying levels of background radiation.
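The average-versus-maximum distinction can be made concrete: for a random grid, compare n times the worst single-point cost (the NaiveSearch-style bound) against the summed per-point costs (the AdaSearch-style bound). The values of μ* and ν below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
mu_star, nu, n = 10.0, 0.9, 10_000
mu = rng.uniform(0.0, nu * mu_star, size=n)  # background rates

# Per-point proxy for 1/d(mu(x), mu*): samples needed to resolve each gap.
cost = mu_star / (mu_star - mu) ** 2
bound_naive = n * cost.max()  # every point sampled until the WORST gap resolves
bound_ada = cost.sum()        # each point sampled until ITS OWN gap resolves
speedup = bound_naive / bound_ada
```

For ν = 0.9 the measured speedup is close to the predicted 1/(1 − ν) = 10, and it grows as ν approaches 1.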

As νμ* approaches μ* and the gaps μ* − μ(x) become more variable, adaptivity confers even greater advantages over uniform sampling. From Corollary 3, we expect the ratio of