Weighting NTBEA for Game AI Optimisation
Abstract
The N-Tuple Bandit Evolutionary Algorithm (NTBEA) has proven very effective in optimising algorithm parameters in Game AI. A potential weakness is the use of a simple average of all component Tuples in the model. This study investigates a refinement to the N-Tuple model used in NTBEA by weighting these component Tuples by their level of information and specificity of match. We introduce weighting functions to the model to obtain Weighted-NTBEA and test this on four benchmark functions and two game environments.
These tests show that vanilla NTBEA is the most reliable and performant of the algorithms tested. Furthermore we show that given an iteration budget it is better to execute several independent NTBEA runs, and use part of the budget to find the best recommendation from these runs.
Accepted as a paper at the 11th AI and Games Symposium at AISB 2020
1 Introduction
In Game AI, as in many other fields, algorithms usually have several parameters that need to be specified. For any given problem some parameter settings may give good results, while other settings give very poor results. For any new problem (a new game for example) we need to decide which parameter values to use. In many cases a set of ‘standard’ parameter settings is available based on previous work, but these may not be ideal for the new domain. An exhaustive search of all possible parameter settings is usually infeasible: it may take days of processing time on a large parallel cluster to train a complex neural network using Reinforcement Learning (RL). If the RL algorithm has four parameters, each of which can have five values, then at one cluster-day per training run, training a policy under each possible setting will take $5^4 = 625$ cluster-days, or about 2 cluster-years, to evaluate them all. The problem is considerably worse if the outcome of any one evaluation (or experiment) is stochastic, so that a good estimate of the value of a given parameter setting requires many independent evaluations.
The field of parameter (and hyperparameter) optimisation seeks fast methods for deciding on parameter settings in a new domain with an available computational budget. This generally involves constructing a predictive model for the result of a future untried evaluation. After each time-consuming real-world evaluation has been run, the computationally cheap predictive model is updated with the result and interrogated to suggest the next set of parameter values to try. By reducing the number of expensive full evaluations needed to find a good (if not necessarily optimal) set of parameters, we save significant time and money.
The N-Tuple Bandit Evolutionary Algorithm (NTBEA) was introduced in [8, 11]. It has been benchmarked against several other optimisation algorithms in stochastic game environments and proven to be more effective at finding a good set of parameter settings than other algorithms within a fixed computational budget [10]. Similarly, [13] find NTBEA to be the best of a number of optimisers tried for modifying MCTS parameters during algorithm execution across a number of games.
NTBEA in [11, 10] estimates the value of a set of parameter values using the simple average of all matching Tuples in the model (see Background for a detailed explanation). The current work extends this to weight the matching Tuples using the amount of data (i.e. the number of real-world experiments) that informs a given Tuple, and the degree of specificity of the Tuple match. We hypothesise that this approach will allow us to converge to a good parameter setting faster and more robustly than vanilla NTBEA.
In addition to introducing Weighted-NTBEA in this work, we also modify four benchmark tests from the function optimisation literature to incorporate noise. These enable optimisation algorithms to be compared cheaply (in terms of computational budget) and also provide greater confidence in conclusions because the true underlying value is known exactly, and is not an estimate over multiple expensive evaluations.
2 Background
2.1 Black-box optimisation
Black-box function optimisation addresses the problem of finding the optimal value of some
(1) $f : \mathcal{X} \rightarrow \mathbb{R}$
where $f$ can be evaluated at any $x \in \mathcal{X}$, but not differentiated. When $f$ is expensive to evaluate we wish to minimise the number of evaluations we make, and can use the real evaluations made so far to model the result of $f(x)$ (the ‘response surface’) to decide what value of $x$ should be evaluated next. A common approach is to use Bayesian optimisation techniques with a prior over the response surface, and update a posterior model after each evaluation. To pick the next point a trade-off is made between exploitation and exploration; for example the point with the largest expected improvement (EI), or the highest 95% upper confidence bound (UCB) [12, 3, 7]. Bayesian methods require either a model to be specified, or a decision on the kernel functions to use in a (non-parametric) Gaussian Process. They are sensitive to stochastic noise, especially noise that is highly non-Gaussian [3]. Approaches exist to integrate different types of noise into the model, but these add complexity to the model [12].
Most Bayesian methods and libraries assume that $f$ is continuous in all dimensions $x_i$, and do not work in discrete spaces. This is not true of all of them; for example, BOCS [2] uses Bayesian Linear Regression with semi-definite programming to optimise a discrete combinatorial problem. However, BOCS does assume uniform Gaussian noise. Other approaches have been used to model the response surface in black-box optimisation: Random Forests are used in the SMAC algorithm [6].
In a bandit-based approach, each setting of the parameters is one ‘arm’ of the bandit, and we seek to find out which ‘arm’ gives us the highest reward in a limited number of pulls. This is a natural fit if each $x_i$ can take a small number of discrete values, but it cannot cope easily with continuous dimensions.
2.2 NTBEA
This explanation of NTBEA closely follows [11]. During each iteration of NTBEA we:

1. Run a full game (or experiment, or other expensive function evaluation) using the current test setting $x$. For the first iteration $x$ is selected at random.

2. Update the N-Tuple Model with the evaluation result.

3. Generate a neighbourhood of points by applying a mutation operator to $x$ (repeat $N$ times to get a neighbourhood of size $N$).

4. Evaluate the Upper Confidence Bound (UCB) for each of the $N$ points using the N-Tuple Model. Select the one with the highest UCB as the new $x$, and repeat from 1.

In this study, as in [10, 11, 8], we set $N = 50$, and the mutation operator randomly mutates each $x_i$ to a random setting with a fixed probability, always mutating at least one $x_i$.
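The four steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the reference implementation: the `model` object (with `update` and `ucb` methods), the `evaluate` callback and the `p_mutate` argument are hypothetical names, and resampling a mutated dimension to a different value is a simplification chosen here.

```python
import random

def mutate(x, n_values, p_mutate, rng):
    """Resample each dimension with probability p_mutate; always change at least one."""
    child = list(x)
    forced = rng.randrange(len(x))  # dimension that is mutated regardless of p_mutate
    for i, size in enumerate(n_values):
        if i == forced or rng.random() < p_mutate:
            # Assumption: draw a *different* value so the mutation is never a no-op.
            options = [v for v in range(size) if v != child[i]]
            if options:
                child[i] = rng.choice(options)
    return tuple(child)

def ntbea(evaluate, model, n_values, iterations, p_mutate, neighbours=50, seed=0):
    """Run the NTBEA loop and return the setting that would be tested next."""
    rng = random.Random(seed)
    x = tuple(rng.randrange(size) for size in n_values)  # random first setting
    for _ in range(iterations):
        y = evaluate(x)      # 1. expensive evaluation of the current setting
        model.update(x, y)   # 2. update the N-Tuple model
        hood = [mutate(x, n_values, p_mutate, rng) for _ in range(neighbours)]  # 3.
        x = max(hood, key=model.ucb)  # 4. highest UCB becomes the next setting
    return x
```

Choosing the final recommendation from the model is not shown; the sketch only covers the evaluate, update, mutate and select cycle.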
N-Tuple Model
A 1-Tuple model breaks down the modelled $f(x)$ into $d$ components $f_i(x_i)$, where $x = (x_1, \dots, x_d)$, using Equation (2). Each component $f_i(x_i)$ is the expected value of $f(x)$ assuming that only $x_i$ affects the value. If $x_i = a$, this is the mean of all evaluation results so far where $x_i = a$. In (2), $\delta$ is a delta-function that is 1 when a previously evaluated $x^{(j)}$ matches the current setting in the $i$th dimension (and 0 otherwise), $N(x_i)$ is the total number of such previous evaluations, and $y_j$ is the result of the $j$th evaluation.
(2) $\hat{f}(x) = \frac{1}{d} \sum_{i=1}^{d} f_i(x_i), \qquad f_i(x_i) = \frac{1}{N(x_i)} \sum_{j} \delta\big(x^{(j)}_i, x_i\big)\, y_j$
In other words, our prediction is the average of all the matching 1-Tuple predictions based on past observations. There are no interactions between different parameters, and there are no assumptions about relationships between different values of a given parameter. For example, if one parameter has discrete values 1, 2 or 3 then the result of evaluations where this was 1 or 3 will have no impact at all on predictions for the intermediate 2. This is a very conservative non-parametric model. In the case of 5 dimensions with 10 possible values for each, we need to maintain just 50 sets of statistics for a 1-Tuple model ($N(x_i)$, the number of times this tuple-setting has been tried, and $f_i(x_i)$, the mean of these evaluations). Any $x$ will match with exactly five of these, and $\hat{f}(x)$ is the mean of these five.
A 2-Tuple model extends this to consider interactions between two parameter settings. We replace $f_i(x_i)$ in (2) with $f_{ij}(x_i, x_j)$, and now consider all evaluations that were a match on two different parameters. In the case of 5 dimensions with 10 possible values for each this gives a total of $\binom{5}{2} \times 10^2 = 1000$ distinct 2-Tuples for which $N$ and $f$ are maintained. Any $x$ will match with exactly $\binom{5}{2} = 10$ of these.
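To make the bookkeeping concrete, the following Python sketch keeps a count and a running total for every 1-Tuple and 2-Tuple, and predicts with the simple average of the matching tuple means. The class and method names are illustrative assumptions; skipping tuples that have no data yet and returning 0.0 when nothing matches are choices made for this sketch.

```python
from itertools import combinations

class NTupleModel:
    def __init__(self, n_dims, tuple_sizes=(1, 2)):
        # One index set per tuple, e.g. (0,), (1,), ... and (0, 1), (0, 2), ...
        self.index_sets = [idx for k in tuple_sizes
                           for idx in combinations(range(n_dims), k)]
        self.count = {}  # (index set, values) -> number of matching evaluations
        self.total = {}  # (index set, values) -> sum of evaluation results

    def _keys(self, x):
        return [(idx, tuple(x[i] for i in idx)) for idx in self.index_sets]

    def update(self, x, y):
        """Record result y for every tuple that matches setting x."""
        for key in self._keys(x):
            self.count[key] = self.count.get(key, 0) + 1
            self.total[key] = self.total.get(key, 0.0) + y

    def mean(self, x):
        """Simple average of the means of all matching tuples that have data."""
        means = [self.total[k] / self.count[k] for k in self._keys(x) if k in self.count]
        return sum(means) / len(means) if means else 0.0
```

With five dimensions and tuple sizes 1 and 2, any setting matches 5 + 10 = 15 index sets, in line with the counts given above.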
UCB
The UCB1 algorithm [1] calculates a probable upper bound on the true value of the ‘arm’ of a bandit $x$, given the data observed so far, using (3). $t$ is the total number of trials of the bandit, and $n(x)$ is the number of times this ‘arm’ has been pulled (i.e. the number of times that $x$ has been evaluated).
(3) $\mathrm{UCB}(x) = \hat{f}(x) + k \sqrt{\frac{\ln t}{n(x)}}$
The N-Tuple model uses equation (2) to calculate the value estimate $\hat{f}(x)$, but we still have the second term of equation (3) that controls exploration. We can calculate this for each individual tuple, with $t$ equal to the total number of NTBEA iterations, and $n$ equal to the number of these for which the tuple matches $x$; i.e. the count of matching evaluations in (2). NTBEA calculates the second term for each matching tuple, and then takes the arithmetic average. There is one additional nuance in that some tuples will never have been evaluated, and formally (3) will return $\infty$ in this case. To avoid this an additional hyperparameter $\epsilon$ is added, so that
(4) $\mathrm{UCB}(x) = \hat{f}(x) + k \sqrt{\frac{\ln t}{n(x) + \epsilon}}$
In this study $\epsilon$ is set as in [10, 11, 8]. The value of $k$ needs to be scaled to the range of $f$, and is set for each domain (see Method section).
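A short Python sketch of this exploration term follows. The function names are assumptions made for illustration, and `k` and `epsilon` are passed in explicitly rather than given default values.

```python
import math

def tuple_exploration(total_iterations, tuple_count, epsilon):
    """Exploration term for one tuple; total_iterations must be at least 1."""
    return math.sqrt(math.log(total_iterations) / (tuple_count + epsilon))

def ucb(mean_estimate, tuple_counts, total_iterations, k, epsilon):
    """Value estimate plus k times the average exploration term over matching tuples."""
    explore = sum(tuple_exploration(total_iterations, c, epsilon)
                  for c in tuple_counts) / len(tuple_counts)
    return mean_estimate + k * explore
```

With `epsilon > 0` a tuple that has never been evaluated receives a large but finite bonus, which is the purpose of the extra hyperparameter.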
3 Hypothesis
Vanilla NTBEA estimates the value of a parameter setting as the simple arithmetic mean of all the matching Tuples in the model. For example, if we have five parameters and are using 1-, 2- and N-Tuples, then any $x$ will have one matching N-Tuple (where $N = 5$), five matching 1-Tuples and $\binom{5}{2} = 10$ matching 2-Tuples. The statistics gathered for each of these 16 Tuples are then averaged. The same approach applies to calculating the exploration estimate using (3). Even if we have evaluated a specific $x$ multiple times, the results from those evaluations still only comprise $\frac{1}{16}$ of the NTBEA estimate; $\frac{5}{16}$ always comes from the matching 1-Tuples. Our hypothesis is that NTBEA will better estimate the value of a parameter setting if it applies greater weight to the more specific tuples as the number of evaluations increases. In the limit of a large number of evaluations of a specific $x$, only the statistics from the fully-matching Tuple should be relevant.
We propose four distinct weighting schemes, which vary in the rate of decay in the influence of less-specific tuples. In all cases the value of a parameter setting $x_N$, with $N$ different parameters, is
(5) $V(x_N) = w_N\, \hat{f}(x_N) + (1 - w_N)\, \frac{1}{|x_{N-1}|} \sum_{x_{N-1}} V(x_{N-1})$
where $\hat{f}(x_N)$ is the average value from the N-Tuple statistics of $x_N$ and $w_N$ is the weight used for the N-Tuple statistics. The remaining $1 - w_N$ weight is applied to the average of all (N-1)-Tuples, i.e. all Tuples on the next level down. In (5), $|x_{N-1}|$ is a slight abuse of notation and refers to the number of such Tuples. In the case that no Tuples are held at the N-1 level, then this descends to the next level for which we do have Tuples in the NTBEA model. Note that (5) is recursive, and each of the $V(x_{N-1})$ terms is calculated from weighting its Tuple statistics, $\hat{f}(x_{N-1})$, with a sum over $V(x_{N-2})$ at the next level down.
In our example vanilla NTBEA always weights the 5-Tuple, the ten 2-Tuples and the five 1-Tuples at $\frac{1}{16}$ each. Using (5) this weighting will change as we gain more information. With no evaluations, the $w_N$ for any Tuple will be 0, and as the number of evaluations for a Tuple increases we want to increase $w_N$ towards a maximum of 1 so that asymptotically we ignore information from lower-level Tuples.
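The recursion can be sketched in Python as follows. This is one possible reading, with several assumptions labelled in the comments: the `stats` dictionary layout, the `weight` callback, taking the lower-level Tuples to be the sub-tuples of the current one, and returning the plain tuple mean at the lowest level are all choices made for this illustration.

```python
from itertools import combinations

def weighted_value(idx, x, stats, levels, weight):
    """Recursive estimate for the tuple on dimensions `idx` of setting `x`.

    stats maps (idx, values) -> (count, mean); levels is the set of tuple
    sizes held in the model; weight(count) returns a weight in [0, 1].
    """
    count, mean = stats.get((idx, tuple(x[i] for i in idx)), (0, 0.0))
    lower = [k for k in levels if k < len(idx)]
    if not lower:
        return mean  # assumption: at the lowest level we use the tuple mean as-is
    w = weight(count)
    # Assumption: descend to the sub-tuples at the next level that is held.
    subs = list(combinations(idx, max(lower)))
    below = sum(weighted_value(s, x, stats, levels, weight) for s in subs) / len(subs)
    return w * mean + (1.0 - w) * below
```

With a weight of 0 the estimate is just the average of the level below, and with a weight of 1 it is the tuple's own mean, matching the two limits described in the text.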
The four weighting schemes use linear, inverse, inverse square-root and exponential decay functions of $n$, the number of evaluations that match the Tuple, with a scale parameter $T$. In each case $w$ is the weight used in (5).

Linear
(6) $w = \min\left(1, \frac{n}{T}\right)$
Inverse-root
(7) $w = 1 - \frac{1}{\sqrt{1 + n/T}}$
Inverse
(8) $w = 1 - \frac{1}{1 + n/T}$
Exponential
(9) $w = 1 - e^{-n/T}$
These functions are sketched in Figure 1. They have the desired properties that $w = 0$ when $n = 0$ (when no evaluations have been conducted that match the tuple), and $w \rightarrow 1$ as $n \rightarrow \infty$. They differ in the rate at which this decay happens, which in all cases must be parameterised by some $T$. The Linear decay is most draconian, and will ignore any information from lower-level tuples once $n \geq T$, while under Inverse-root decay lower-level tuples have a residual weight of 0.71 after $T$ evaluations. For all experiments in this study $T$ is held at a single fixed value. This is somewhat arbitrary, but scaled to be about 5% of the total iterations in the smallest experiments with a budget of about 300 NTBEA iterations.
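The four schemes can be written as small Python functions. The exact functional forms below are assumptions, chosen to satisfy the properties stated in the text: zero weight with no evaluations, a weight tending to 1, a hard cut-off for Linear at $T$ evaluations, and a residual lower-level weight of about 0.71 for Inverse-root after $T$ evaluations.

```python
import math

def linear(n, T):
    """Weight rises linearly and is capped at 1 once n reaches T."""
    return min(1.0, n / T)

def inverse_root(n, T):
    """Lower-level weight decays as 1 / sqrt(1 + n/T)."""
    return 1.0 - 1.0 / math.sqrt(1.0 + n / T)

def inverse(n, T):
    """Lower-level weight decays as 1 / (1 + n/T)."""
    return 1.0 - 1.0 / (1.0 + n / T)

def exponential(n, T):
    """Lower-level weight decays as exp(-n/T)."""
    return 1.0 - math.exp(-n / T)
```

Each function returns the weight on the Tuple's own statistics; one minus that value is the weight left for the level below.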
Parameter | Planet Wars I | Asteroids I | Planet Wars II | Asteroids II
Sequence Length | 5, 10, 15, 20, 25, 30 | 5, 10, 15, 20, 50, 100, 150 | 7, 10, 13, 16, 20, 25, 30 | 50, 75, 100, 125, 150, 200
Mutated Points | 0, 1, 2, 3 | 0, 1, 2, 3 | 1, 2, 3, 5, 10, 15, 20 | 1, 2, 3, 5, 10, 20, 30, 50
Resample | 1, 2, 3 | 1, 2, 3 | 1, 2, 3 | 1, 2, 3
Flip One Value | false, true | false, true | false, true | false, true
Use Shift Buffer | false, true | false, true | false, true | false, true
Mutation Transducer | false | false | false, true | false, true
Repeat Prob. | - | - | 0.2, 0.4, 0.6, 0.8 | 0.2, 0.4, 0.6, 0.8
Discount Factor | 1.0 | 1.0 | 1.0, 0.999, 0.99, 0.95, 0.9 | 1.0, 0.999, 0.99, 0.95, 0.9
Parameter Space size | 288 | 336 | 23,520 | 23,040
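The parameter space sizes follow from multiplying the number of options per parameter. The short Python check below reads the option counts off Table 1; the dictionary name is illustrative, and a parameter that is fixed at one value or unused contributes a factor of 1.

```python
from math import prod

# Options per parameter, in the row order of Table 1:
# sequence length, mutated points, resample, flip one value,
# use shift buffer, mutation transducer, repeat prob., discount factor.
OPTION_COUNTS = {
    "Planet Wars I":  [6, 4, 3, 2, 2, 1, 1, 1],
    "Asteroids I":    [7, 4, 3, 2, 2, 1, 1, 1],
    "Planet Wars II": [7, 7, 3, 2, 2, 2, 4, 5],
    "Asteroids II":   [6, 8, 3, 2, 2, 2, 4, 5],
}

space_sizes = {name: prod(counts) for name, counts in OPTION_COUNTS.items()}
```

This gives 288 and 336 settings for the two smaller spaces, matching the evaluation budgets used for them, and 23,520 and 23,040 for the larger ones.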
4 Method
We apply each of the decay functions (6), (7), (8) and (9) to a number of different optimisation problems to determine whether our hypothesis holds and the modified model does converge faster and more robustly than vanilla NTBEA. By using a number of different problems we seek to test that any improvement generalises, and is not specific to one domain. A secondary goal is exploratory, to see if the four different weighting functions have varying patterns of performance.
4.1 Benchmark functions
We test on four benchmark functions from the global optimisation literature [4, 7]. These are interesting non-convex functions for which we can calculate the true value, and hence judge the performance of the NTBEA variants. Some amendments are needed to the original functions:

These are all deterministic functions with no noise. To convert them to a stochastic win/lose setting appropriate for a game benchmark we convert the function value to a probability $p$ of a +1 score (a ‘win’), and a $1 - p$ probability of a -1 score (a ‘loss’).

They are continuous functions in all dimensions. We discretise by taking values at equally spaced intervals for each dimension.

Global optimisation seeks to minimise a function. To maximise we multiply by -1.
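A small Python sketch of the noise conversion follows. The function names and the `offset` and `scale` arguments are assumptions for illustration; each benchmark uses its own shift and scaling to map the negated function value to a probability.

```python
import random

def to_probability(neg_value, offset, scale):
    """Shift and scale a negated benchmark value, with a floor at 0."""
    return max(0.0, (neg_value + offset) / scale)

def noisy_outcome(p, rng):
    """Return +1 (a win) with probability p, otherwise -1 (a loss)."""
    return 1 if rng.random() < p else -1
```

The expected score of a setting with win probability $p$ is $2p - 1$, so ranking settings by $p$ and by expected score gives the same order.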
We outline the four functions below. A complete description is in [4].

Hartmann3. A three-dimensional function with four local optima. Two of these optima are close in value, with one slightly higher. In the original problem the output range is [0.0, 3.59], so we divide by 4.0 to get a value between 0 and 1. We split all three dimensions into ten equally spaced discrete values, for a total parameter-space size of 1000 with a true maximum value of 0.897.

Hartmann6. A sixdimensional function with a similar four optima to Hartmann3. We apply the same modifications as with Hartmann3, and discretize each dimension into five equally spaced values, for a parameterspace of size 15,625 with .

Branin. A two-dimensional function with three global maxima at -0.4. We split each dimension into 20 equally spaced intervals to get a parameter-space of 400. We add 10 to the result and divide by 12, with a floor at 0, to get a valid range for a probability. In this case only 14.8% of the 400 points are non-zero.

Goldstein-Price. A two-dimensional function with one global maximum, and several local ones. We split each dimension into 20 equally spaced intervals to get a parameter-space of 400. We add 400 to the result and divide by 500, with a floor at 0, to get a valid range for a probability. 13.3% of the 400 points are non-zero.
In all cases we try each weighting function, plus vanilla NTBEA, on each benchmark function with 300, 1000 and 3000 iterations. For each setting we run NTBEA 1000 times, and record the estimated value (by NTBEA) of the finally selected $x$ and the actual value. A single value of the exploration constant $k$ in (4) is used for these benchmark runs.
4.2 Game Parameters
Lucas et al. 2019 [10] compare NTBEA against several other popular optimisation algorithms in two games: Planet Wars and Asteroids. They optimise a Rolling Horizon Evolutionary Algorithm (RHEA) to find the best setting to win the 2-player Planet Wars (+1 for a win, and -1 for a loss), and also to obtain the highest score in 2000 game-ticks in the 1-player Asteroids. For comparable results we use exactly the same games and settings. The exploration constant $k$ in (4) is set separately for Planet Wars and for Asteroids.
In Planet Wars each player has a number of planets which generate ships at a constant rate. Players send ships from a planet to invade another, and to win the game they must conquer all planets. In Asteroids the player controls a ship which can rotate and shoot to destroy surrounding asteroids. Points are gained for shooting asteroids, and if one collides with the player then a life is lost; after three lost lives the game ends. The details of the gameplay are not central to this study, and more details can be found in [11, 10].
RHEA is optimised over five parameters in [10], which are listed in Table 1. Each optimisation algorithm was permitted 288 evaluations in Planet Wars, and 336 in Asteroids. This allowed Grid Search to run one game for each parameter setting. We repeat these experiments up to 100 times for each game and each weighting function. We record the parameter setting that is chosen each time. To get a good estimate of the actual value of the 288 and 336 possible settings it is feasible to run 1000 games for each setting of Planet Wars and 500 for Asteroids, although this takes 6 days to run for Asteroids, illustrating the value of a rapid optimiser.
These small parameter spaces of 288 and 336 have the advantage of permitting a good estimate of the ‘best’ setting to be found by brute force computation, but they are not representative of larger spaces in real problems. For example, when optimising RHEA for a Game of Life variant, [9] use NTBEA with 100 evaluations in a space of size 28,800. As a final experimental set we add further parameters to RHEA (discount factor, mutation transducer and repeat probability) from [9], and extend the other parameters to give a larger overall space as detailed in Table 1 in the ‘II’ columns. These extensions were fixed after seeing the results of the first set of experiments (the ‘I’ columns) to focus on areas with higher performance. For Planet Wars we increased the concentration of Sequence Length options around the optimal 10-15 range, and in Asteroids we did the same around the optimal 100 value. We also increased the upper range of Mutated Points significantly, especially for Asteroids where the optimal value of 3 was the highest possible.
For these larger parameter spaces we used a budget of about 20,000 total iterations to try different overall approaches:

20 runs of 1,000 iterations each

7 runs of 3,000 iterations each

2 runs of 10,000 iterations each

1 run of 20,000 iterations
Given the size of the parameter spaces it was not feasible to estimate an accurate value for all parameter settings. Instead we do this (by running 1000 or 500 games for Planet Wars and Asteroids respectively) for just the settings suggested by any of these runs. The purpose of these experiments is to understand how best to spend an available budget of iterations. Should we use them in a single NTBEA run, or spread them out and then pick the best of the suggestions? This is motivated by an observation from Deep Reinforcement Learning research, in which the random seed can have a major effect on the outcome of the algorithm, and results are often reported using ‘best of N’ runs [5].
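The ‘several runs, then pick the best’ procedure can be sketched in Python as below. The callbacks `run_optimiser` and `estimate_value` are hypothetical stand-ins for an NTBEA run and for averaging repeated games of a recommended setting; neither name comes from the original work.

```python
def best_of_runs(run_optimiser, estimate_value, n_runs,
                 iterations_per_run, games_per_candidate):
    """Run the optimiser several times, then re-evaluate each recommendation."""
    # One recommendation per independent run; the set removes duplicates.
    candidates = {run_optimiser(iterations_per_run, seed) for seed in range(n_runs)}
    # Spend the remaining budget on a more accurate value estimate per candidate.
    scores = {c: estimate_value(c, games_per_candidate) for c in candidates}
    best = max(scores, key=scores.get)
    return best, scores
```

The design choice is the split of the budget: fewer iterations per run leave more games available for the final comparison between recommendations.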
5 Results
5.1 Benchmark functions
Table S1 in the Supplementary Material tabulates the numeric means and confidence intervals for the NTBEA experiments on the four benchmark functions with added noise. Figure 2 displays boxplots of the true value of the NTBEA recommended parameters for each benchmark function and weighting function (1000 NTBEA runs for each, at 300, 1000 and 3000 iterations).

Hartmann3. This appears to be the easiest of the four functions for NTBEA to optimise, with 300 iterations getting a mean value of 0.862 of a maximum of 0.897 for Vanilla NTBEA (STD) and for the Linear and Inverse-root weighting functions. With 3000 iterations all of the variants obtain a mean score of between 0.88 and 0.89; in all cases 25% to 35% of all runs recommend one of the three top parameter settings with actual values between 0.895 and 0.897.

Hartmann6. This is harder to optimise, with a clear progression as iterations increase from 300 to 3000. Vanilla NTBEA is a clear winner at only 300 iterations, and the Inverse-root and Inverse weighting functions are joint top with the Vanilla version at 3000 iterations (in a parameter space of size 15,625). The Linear weighting function does very poorly in comparison.

Branin. As with Hartmann6, Vanilla NTBEA is a clear winner at 300 iterations, and is joint top with the Inverse-root and Inverse weighting functions at 3000 iterations. The parameter space is only 400.

Goldstein-Price. The same pattern is repeated here. Vanilla NTBEA is best for a small number of iterations, and all except the Linear weighting function are equally good with 3000 iterations to explore a parameter space of size 400.
The key finding is that here vanilla NTBEA (‘STD’ in Figure 2) is always the best or joint best for any combination of benchmark function and number of iterations, and is particularly effective for smaller numbers of iterations.
5.2 Games
NTBEA | Runs | Iterations | Game | Mean | Std Dev | Mean 95% Interval | Delta | Delta 95% Interval | Top6
STD | 100 | 288 | Planet Wars | 0.655 | 0.079 | 0.640 to 0.671 | -0.185 | -0.203 to -0.167 | 60%
LIN | 100 | 288 | Planet Wars | 0.615 | 0.111 | 0.593 to 0.636 | 0.187 | 0.164 to 0.211 | 44%
INV | 100 | 288 | Planet Wars | 0.630 | 0.110 | 0.610 to 0.656 | 0.086 | 0.061 to 0.107 | 53%
SQRT | 100 | 288 | Planet Wars | 0.633 | 0.097 | 0.616 to 0.653 | -0.017 | -0.036 to 0.001 | 51%
EXP | 100 | 288 | Planet Wars | 0.643 | 0.091 | 0.625 to 0.663 | 0.130 | 0.107 to 0.151 | 58%
STD | 62 | 336 | Asteroids | 9596 | 67 | 9580 to 9613 | -709 | -741 to -680 | 94%
LIN | 66 | 336 | Asteroids | 9577 | 87 | 9556 to 9598 | 129 | 104 to 155 | 88%
INV | 68 | 336 | Asteroids | 9584 | 77 | 9567 to 9604 | -27 | -52 to -3 | 87%
SQRT | 69 | 336 | Asteroids | 9563 | 118 | 9536 to 9591 | -248 | -274 to -222 | 81%
EXP | 67 | 336 | Asteroids | 9570 | 82 | 9552 to 9590 | 104 | 80 to 125 | 87%
Table 2 shows the results for Planet Wars I and Asteroids I experiments, with 288 and 336 NTBEA iterations on similarly sized parameter spaces. Figures 3 and 4 have box plots for the data. These are averaged over 100 runs for each setting for Planet Wars, and between 62 and 69 runs for Asteroids (the number that completed in an 84 hour window). For Planet Wars vanilla NTBEA gives both the best and most reliable (i.e. lowest standard deviation) results. The Exponential decay variant is the only one to have a performance within the 95% confidence interval of vanilla NTBEA. The single highest parameter setting gives a score of 0.732, with 6 of the 288 settings having a score of 0.65 or higher averaged over 1000 games. Since we have run 1000 games for each of the 288 settings and then picked the highest result, the 0.732 will be an overestimate. Apart from the Linear weighted variant, all algorithms pick one of the top 6 settings between 50% and 60% of the time.
For Asteroids the results are quite similar. Vanilla NTBEA gives the best result with the smallest standard deviation. One of the variants is within the 95% confidence interval, but in this case it is the Inverse weighting function. In both games it is clear, as in the Benchmark Function results, that vanilla NTBEA gives the best recommended parameter setting despite giving a very poor estimate of the absolute value that the recommendation will provide when used.
The 95% confidence intervals in Table 2 are calculated on the basis that the estimated values of each parameter setting are exact. This was true for the benchmark functions in Table S1, but is not true here due to noise in these estimates from averaging across 1000 or 500 independent games. We do not have an estimate of this additional uncertainty.
Encouragingly, we obtain exactly the same optimal parameter settings for both games as those found in the original work (highlighted in Table 1) [10, 11]. However, we get rather higher values for these in game play, both for Planet Wars with 288 iterations of NTBEA and for Asteroids. The reason for this discrepancy is not clear, but we do not believe it affects the key conclusions of this study.
Game  NTBEA  Iterations  Runs  Best score  Mean  SD  95% Bounds  
Planet Wars  STD  1000  20  0.772  0.707  0.045  0.688  0.727 
Planet Wars  LIN  1000  20  0.752  0.679  0.067  0.652  0.711 
Planet Wars  INV  1000  20  0.788  0.694  0.070  0.665  0.728 
Planet Wars  SQRT  1000  20  0.762  0.712  0.035  0.697  0.728 
Planet Wars  EXP  1000  20  0.774  0.681  0.061  0.656  0.708 
Planet Wars  STD  3000  7  0.762  0.709  
Planet Wars  LIN  3000  7  0.762  0.718  
Planet Wars  INV  3000  7  0.762  0.708  
Planet Wars  SQRT  3000  7  0.760  0.714  
Planet Wars  EXP  3000  7  0.774  0.735  
Planet Wars  STD  10000  2  0.756  0.717  
Planet Wars  LIN  10000  2  0.748  0.736  
Planet Wars  INV  10000  2  0.756  0.747  
Planet Wars  SQRT  10000  2  0.756  0.740  
Planet Wars  EXP  10000  2  0.770  0.748  
Planet Wars  STD  20000  1  0.708  
Planet Wars  LIN  20000  1  0.640  
Planet Wars  INV  20000  1  0.674  
Planet Wars  SQRT  20000  1  0.732  
Planet Wars  EXP  20000  1  0.632  
Asteroids  STD  1000  20  9815  9701  63  9675  9728 
Asteroids  LIN  1000  20  9776  9655  89  9617  9694 
Asteroids  INV  1000  20  9803  9706  70  9690  9722 
Asteroids  SQRT  1000  20  9811  9702  68  9676  9736 
Asteroids  EXP  1000  20  9819  9620  125  9569  9673 
Asteroids  STD  3000  7  9804  9707  
Asteroids  LIN  3000  7  9804  9764  
Asteroids  INV  3000  7  9835  9778  
Asteroids  SQRT  3000  7  9817  9736  
Asteroids  EXP  3000  7  9818  9758  
Asteroids  STD  10000  2  9705  9705  
Asteroids  LIN  10000  2  9709  9612  
Asteroids  INV  10000  2  9804  9801  
Asteroids  SQRT  10000  2  9817  9814  
Asteroids  EXP  10000  2  9779  9762  
Asteroids  STD  20000  1  9735  
Asteroids  LIN  20000  1  9783  
Asteroids  INV  20000  1  9783  
Asteroids  SQRT  20000  1  9815  
Asteroids  EXP  20000  1  9815 
Table 3 shows the results from the Planet Wars II and Asteroids II experiments with larger, more realistic parameter spaces to explore. There were 142 unique parameter settings recommended by the 150 NTBEA runs for the Planet Wars II experiments and an estimated value for each of these was calculated from averaging 1000 runs of the game. The best estimated scores of the recommended parameter settings have increased to 0.77 compared to the best possible score of 0.73 for Planet Wars I, so the additional parameters enable RHEA to better play the game if we can efficiently explore the space.
For Planet Wars vanilla NTBEA gives the best mean result at 1k iterations, and does not give significantly different results at more iterations (within 95% error bounds). The same caveat applies to these error bounds as in Table 2 as they do not include the additional uncertainty from the average over 1000 runs used to estimate the value of the final parameter settings.
The Inverse-root weighting function matches vanilla NTBEA at 1k, and at 3k all variants at least match vanilla performance, with the Exponential weighting being the best. These results make clear that there is a high level of uncertainty in any individual NTBEA run. The best of the 20 vanilla runs at 1k gives a parameter setting that scores 0.772 over 1000 games, and the worst scores a mere 0.616. This remains true at 10k and 20k iterations, with three of the 20k runs recommending parameters that score less than 0.7.
Even with a large number of iterations any single NTBEA run may give a relatively poor result. Given a fixed budget of games to optimise a parameter, Table 3 suggests that it is not a good idea to put the whole budget into a single NTBEA run. It is far better to execute several NTBEA runs with a small number of iterations, and then use the remaining game budget to estimate the true value of each of these and pick the best.
This is reinforced when we look at the Asteroids results in Table 3. Vanilla NTBEA does joint best with 1k iterations, and the mean score does not increase significantly for higher numbers of iterations. At higher iterations all variants except the Linear function are at least as good, but not necessarily reliably better. In the Asteroids case there is an effective maximum score of 10000 when we use 2000 game ticks as here, so with all the mean and best results in the 9700 to 9800 range the optimisation does not have much room to work, especially when we add noise.
6 Discussion
In all four of the benchmark functions, and in both games across small and large parameter spaces, vanilla NTBEA is at least as good as the weighting variants tried for small numbers of iterations, and usually better with lower variance in results. As the number of iterations increases this effect shrinks, and for some cases one of the weighting variants can be significantly better. For example Inverse-root with 1000 iterations on the Hartmann6 function, or the Exponential function with 3000 iterations in Asteroids II. However, this is cherry-picking. Furthermore the weighting variants introduce complexity with a new hyperparameter to be specified.
When we optimise an expensive function such as game performance over a parameter space we are deliberately trying to use a small number of iterations. Vanilla NTBEA works best in this situation, and we conclusively reject the hypothesis that modifying the N-Tuple model with these weighting functions improves either reliability or performance.
We do not reject the hypothesis that the variants provide a better estimate of the true value of a parameter setting. Across all benchmark functions and game environments vanilla NTBEA provides very poor estimates of the actual value, underestimating by a very large margin because it is averaging over all possible Tuple matches. The Inverse and Inverseroot weighting functions consistently do a much better job of estimating the value of their recommendation. However, this is not as important when our key objective is to get a good recommendation; we can always go on to get a good estimate of its value later.
Linear weighting is clearly worse than the other options, which do not exclude all contributions from less-specific Tuples with more information after only $T$ iterations. This appears to be because once it has $T$ evaluations of a specific setting it ignores all other data, and uses the average of those evaluations. With a larger number of iterations what often happens is that sequential iterations focus on the current best estimate until the mean falls sufficiently and the focus shifts to another setting. With noisy function evaluations this often leads to a recommendation with a smaller number of trials (but more than $T$) that happens to currently have a high estimate. Hence the recommendation is optimistic because it picks the best (stochastic) estimate across all options with more than $T$ evaluations, and we can see this reflected in the general overestimate of the value of its recommendation (a version of the ‘winner’s curse’). This effect is less evident for the other weighting functions, as they never let the weighting of other Tuples fall to zero.
7 Conclusion and Future Work
We hypothesised that adding a recursive weighting function to apply to Tuples in NTBEA would improve performance in parameter optimisation in terms of quality and reliability of a recommended (optimised) parameter setting and in providing a more accurate estimate of the value of this. We tried four different weighting functions with different decay characteristics (linear, inverse, inverse-root and exponential) across four benchmark functions from the function optimisation literature, and two games with two distinct sizes of parameter space.
Across all ten experiments we found no evidence that the proposed weighting functions improved NTBEA, except on the least important criterion: providing a better estimate of the true value of the parameter setting recommended by the optimising process. On the contrary, we found strong evidence that vanilla NTBEA is better able than the weighting function variants to reliably find a higher quality recommendation. This is especially true for the smaller number of iterations that would tend to be used in real world applications.
Finally we investigated how best to use a fixed budget of NTBEA iterations in the Planet Wars and Asteroids games. These experiments showed that any individual NTBEA run may give a poor recommendation, and that it is better to execute several NTBEA runs with a smaller number of iterations, use the remaining budget to estimate the value of each recommendation more accurately, and then pick the best.
We have not explored different values of , the hyperparameter introduced to determine how the weighting function is used, and it is possible that other values may perform better. There are also more adventurous options to improve the N-Tuple model. The updated model in this paper still assumes that each Tuple at a given level is equally important: if we have no data for the full Tuple then we average across all matching 2-Tuples, when in practice some of these may be more important than others. One approach to try would be to construct a regression model across the tuples to upweight the ones that better predict the observed results. We have also not changed the exploration model, which averages across all matching tuples as in vanilla NTBEA. It could be worthwhile to experiment with different noise models, for example using a square root instead of a log function in Equation (3), which has been found useful in other areas where exploration is more important than exploitation [14].
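The budget-splitting procedure can be sketched as follows. This is an illustrative outline only: `stand_in_optimiser` is a hypothetical placeholder (a noisy random search), not NTBEA, and `noisy_eval`, its toy objective and all numeric values are assumptions rather than the game environments used in this paper.

```python
import random


def noisy_eval(setting, rng):
    """Hypothetical noisy objective: true value -(x - 0.7)^2 plus Gaussian noise."""
    return -(setting - 0.7) ** 2 + rng.gauss(0.0, 0.1)


def stand_in_optimiser(iterations, rng):
    """Placeholder for one optimiser run (NOT NTBEA): evaluates random settings
    once each and recommends the one with the best single noisy score."""
    candidates = [rng.random() for _ in range(iterations)]
    return max(candidates, key=lambda s: noisy_eval(s, rng))


def recommend_with_budget(total_budget, n_runs, iters_per_run, seed=0):
    """Split a fixed budget into n_runs short runs plus a re-evaluation phase."""
    rng = random.Random(seed)
    remaining = total_budget - n_runs * iters_per_run
    if remaining < n_runs:
        raise ValueError("budget leaves less than one re-evaluation per recommendation")
    # Phase 1: several independent short runs, one recommendation from each.
    recommendations = [stand_in_optimiser(iters_per_run, rng) for _ in range(n_runs)]
    # Phase 2: spend the remaining budget equally on re-evaluating them.
    evals_each = remaining // n_runs
    scores = [
        sum(noisy_eval(s, rng) for _ in range(evals_each)) / evals_each
        for s in recommendations
    ]
    # Phase 3: pick the recommendation with the best re-evaluated mean.
    best = max(range(n_runs), key=scores.__getitem__)
    return recommendations[best], scores[best], evals_each


if __name__ == "__main__":
    setting, score, evals_each = recommend_with_budget(1000, n_runs=5, iters_per_run=100)
    print(f"picked {setting:.3f}, mean {score:.3f} over {evals_each} evaluations")
```

The split between optimisation iterations and re-evaluation is itself a design choice; the sketch simply divides the remaining budget equally between the recommendations.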
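As a rough illustration of the last suggestion, the two exploration bonuses can be compared directly. This sketch assumes that Equation (3) has a UCB-style form, k * sqrt(ln(1 + N) / (n + eps)); that form, the function names and the default constants are assumptions made here, and the square-root variant is one possible reading of the idea rather than a tested proposal.

```python
import math


def log_bonus(total_n, n, k=1.0, eps=0.5):
    """Assumed UCB-style exploration bonus using a log of the total count."""
    return k * math.sqrt(math.log(1 + total_n) / (n + eps))


def sqrt_bonus(total_n, n, k=1.0, eps=0.5):
    """Hypothetical variant replacing the log with a square root."""
    return k * math.sqrt(math.sqrt(total_n) / (n + eps))


if __name__ == "__main__":
    # Bonus for a Tuple seen 10 times as the total iteration count grows.
    for total_n in (100, 1000, 10000):
        print(total_n, round(log_bonus(total_n, 10), 3), round(sqrt_bonus(total_n, 10), 3))
```

Because the square root grows faster than the log as the total number of iterations increases, rarely visited Tuples keep a larger bonus for longer under the variant, which pushes the search towards more exploration.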
8 Acknowledgments
This work was funded by the EPSRC CDT in Intelligent Games and Game Intelligence (IGGI) EP/S022325/1.
NTBEA  Runs  Iterations  Function  Mean  SD  Mean 95% Lower  Mean 95% Upper  Delta  Delta 95% Lower  Delta 95% Upper
STD  1000  300  Hartmann3  0.859  0.051  0.856  0.862  0.056  0.061  0.052 
LIN  1000  300  Hartmann3  0.862  0.104  0.855  0.868  0.080  0.074  0.087 
INV  1000  300  Hartmann3  0.855  0.057  0.852  0.859  0.075  0.084  0.066 
SQRT  1000  300  Hartmann3  0.848  0.066  0.844  0.853  0.235  0.241  0.230 
EXP  1000  300  Hartmann3  0.862  0.044  0.859  0.865  0.003  0.012  0.006 
STD  1000  1000  Hartmann3  0.881  0.016  0.880  0.882  0.075  0.078  0.072 
LIN  1000  1000  Hartmann3  0.872  0.032  0.870  0.874  0.087  0.084  0.091 
INV  1000  1000  Hartmann3  0.881  0.016  0.879  0.882  0.046  0.044  0.049 
SQRT  1000  1000  Hartmann3  0.883  0.013  0.882  0.883  0.007  0.004  0.010 
EXP  1000  1000  Hartmann3  0.879  0.018  0.878  0.880  0.067  0.065  0.070 
STD  1000  3000  Hartmann3  0.888  0.009  0.887  0.888  0.021  0.022  0.020 
LIN  1000  3000  Hartmann3  0.882  0.022  0.881  0.884  0.042  0.040  0.044 
INV  1000  3000  Hartmann3  0.886  0.010  0.885  0.886  0.031  0.029  0.032 
SQRT  1000  3000  Hartmann3  0.888  0.009  0.887  0.888  0.020  0.018  0.021 
EXP  1000  3000  Hartmann3  0.885  0.012  0.884  0.886  0.037  0.035  0.039 
STD  1000  300  Hartmann6  0.551  0.142  0.542  0.560  0.127  0.135  0.118 
LIN  1000  300  Hartmann6  0.535  0.142  0.526  0.544  0.119  0.107  0.130 
INV  1000  300  Hartmann6  0.516  0.149  0.506  0.525  0.026  0.015  0.038 
SQRT  1000  300  Hartmann6  0.461  0.155  0.451  0.471  0.120  0.133  0.108 
EXP  1000  300  Hartmann6  0.526  0.149  0.516  0.535  0.065  0.053  0.077 
STD  1000  1000  Hartmann6  0.633  0.095  0.627  0.639  0.178  0.188  0.168 
LIN  1000  1000  Hartmann6  0.633  0.092  0.628  0.639  0.108  0.100  0.115 
INV  1000  1000  Hartmann6  0.639  0.085  0.634  0.645  0.046  0.040  0.052 
SQRT  1000  1000  Hartmann6  0.610  0.101  0.604  0.617  0.031  0.041  0.020 
EXP  1000  1000  Hartmann6  0.633  0.092  0.627  0.639  0.082  0.076  0.088 
STD  1000  3000  Hartmann6  0.666  0.085  0.661  0.672  0.202  0.220  0.184 
LIN  1000  3000  Hartmann6  0.635  0.104  0.628  0.641  0.100  0.092  0.107 
INV  1000  3000  Hartmann6  0.666  0.066  0.661  0.670  0.031  0.027  0.036 
SQRT  1000  3000  Hartmann6  0.668  0.057  0.664  0.671  0.006  0.012  0.000 
EXP  1000  3000  Hartmann6  0.658  0.079  0.653  0.663  0.058  0.054  0.063 
STD  1000  300  Branin  0.705  0.098  0.699  0.712  0.020  0.027  0.013 
LIN  1000  300  Branin  0.676  0.142  0.667  0.685  0.389  0.398  0.380 
INV  1000  300  Branin  0.670  0.136  0.661  0.679  0.389  0.398  0.380 
SQRT  1000  300  Branin  0.460  1.327  0.376  0.544  0.204  0.288  0.120 
EXP  1000  300  Branin  0.678  0.126  0.670  0.686  0.397  0.405  0.388 
STD  1000  1000  Branin  0.773  0.027  0.771  0.775  0.031  0.035  0.028 
LIN  1000  1000  Branin  0.768  0.035  0.766  0.771  0.055  0.050  0.060 
INV  1000  1000  Branin  0.772  0.026  0.771  0.774  0.015  0.011  0.019 
SQRT  1000  1000  Branin  0.773  0.028  0.771  0.775  0.047  0.052  0.042 
EXP  1000  1000  Branin  0.773  0.029  0.771  0.775  0.041  0.037  0.044 
STD  1000  3000  Branin  0.789  0.012  0.788  0.790  0.020  0.021  0.018 
LIN  1000  3000  Branin  0.781  0.024  0.780  0.783  0.024  0.021  0.026 
INV  1000  3000  Branin  0.789  0.013  0.788  0.789  0.011  0.009  0.012 
SQRT  1000  3000  Branin  0.788  0.015  0.787  0.789  0.002  0.004  0.000 
EXP  1000  3000  Branin  0.784  0.020  0.783  0.785  0.019  0.017  0.022 
STD  1000  300  GoldsteinPrice  0.700  0.076  0.695  0.705  0.005  0.011  0.001 
LIN  1000  300  GoldsteinPrice  0.621  0.145  0.612  0.630  0.318  0.328  0.308 
INV  1000  300  GoldsteinPrice  0.622  0.142  0.613  0.631  0.323  0.333  0.313 
SQRT  1000  300  GoldsteinPrice  0.622  0.142  0.613  0.631  0.350  0.360  0.340 
EXP  1000  300  GoldsteinPrice  0.614  0.142  0.605  0.623  0.313  0.323  0.303 
STD  1000  1000  GoldsteinPrice  0.759  0.029  0.757  0.761  0.017  0.021  0.014 
LIN  1000  1000  GoldsteinPrice  0.755  0.035  0.753  0.758  0.071  0.066  0.075 
INV  1000  1000  GoldsteinPrice  0.756  0.029  0.754  0.758  0.024  0.020  0.028 
SQRT  1000  1000  GoldsteinPrice  0.750  0.030  0.748  0.752  0.042  0.046  0.037 
EXP  1000  1000  GoldsteinPrice  0.756  0.032  0.754  0.758  0.057  0.053  0.060 
STD  1000  3000  GoldsteinPrice  0.779  0.020  0.778  0.781  0.016  0.018  0.014 
LIN  1000  3000  GoldsteinPrice  0.775  0.026  0.774  0.777  0.024  0.021  0.026 
INV  1000  3000  GoldsteinPrice  0.780  0.020  0.778  0.781  0.014  0.012  0.016 
SQRT  1000  3000  GoldsteinPrice  0.778  0.021  0.777  0.779  0.001  0.003  0.001 
EXP  1000  3000  GoldsteinPrice  0.778  0.022  0.777  0.780  0.020  0.018  0.022 
References
[1] (2002) Finite-time analysis of the multi-armed bandit problem. Machine Learning 47 (2–3), pp. 235–256.
[2] (2018-07) Bayesian optimization of combinatorial structures. In International Conference on Machine Learning, pp. 462–471.
[3] (2010-12) A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv:1012.2599 [cs].
[4] (1978) The global optimization problem: an introduction. vol. 2, 115. Amsterdam, Holland.
[5] (2018) Deep reinforcement learning that matters. In Thirty-Second AAAI Conference on Artificial Intelligence.
[6] (2011) Sequential model-based optimization for general algorithm configuration. In International Conference on Learning and Intelligent Optimization, pp. 507–523.
[7] (1998) Efficient global optimization of expensive black-box functions. Journal of Global Optimization 13 (4), pp. 455–492.
[8] (2017) The n-tuple bandit evolutionary algorithm for automatic game improvement. In 2017 IEEE Congress on Evolutionary Computation (CEC), pp. 2201–2208.
[9] (2019) A local approach to forward model learning: results on the game of life game. In 2019 IEEE Conference on Games (CoG).
[10] (2019) Efficient evolutionary methods for game agent optimisation: model-based is best. arXiv preprint arXiv:1901.00723.
[11] (2018-02) The n-tuple bandit evolutionary algorithm for game agent optimisation. arXiv:1802.05991 [cs].
[12] (2016-01) Taking the human out of the loop: a review of Bayesian optimization. Proceedings of the IEEE 104 (1), pp. 148–175. ISSN 0018-9219, 1558-2256.
[13] (2019) Comparing randomization strategies for search-control parameters in Monte-Carlo tree search. In 2019 IEEE Conference on Games (CoG), pp. 1–8.
[14] (2012) MCTS based on simple regret. In AAAI.