The Apertif Monitor for Bursts Encountered in Real-time (AMBER) auto-tuning optimization with genetic algorithms


K. Mikhailov (K.Mikhailov@uva.nl), A. Sclocco (a.sclocco@esciencecenter.nl)
Anton Pannekoek Institute for Astronomy, University of Amsterdam, Science Park 904, 1098 XH Amsterdam, The Netherlands
ASTRON, the Netherlands Institute for Radio Astronomy, Postbus 2, 7990 AA, Dwingeloo, The Netherlands
NLeSC, Netherlands eScience Center, Science Park 140, 1098 XG Amsterdam, The Netherlands
Abstract

Real-time searches for faint radio pulses from unknown radio transients are computationally challenging. Detections become further complicated by the continuously increasing technical capabilities of transient surveys: telescope sensitivity, searched area of the sky, number of antennas or dishes, and temporal and frequency resolution. The new Apertif transient survey on the Westerbork telescope runs in real time on GPUs by means of the single-pulse search pipeline AMBER (Sclocco, 2017). AMBER initially carries out auto-tuning: for each of its four pipeline kernels it finds the optimal configuration of user-controlled parameters, so that each kernel performs its task as fast as possible. The pipeline uses a brute-force (BF) exhaustive search which in total takes 5 – 24 hours to run, depending on the processing cluster architecture. We apply more heuristic, biologically driven genetic algorithms (GAs) to limit the exploration of the total parameter space, tune all four kernels together, and reduce the tuning time to a few hours. Our results show that after only a few hours of tuning, GAs always find similar or even better configurations for all kernels together than the combination of single-kernel configurations tuned by the BF approach. At the same time, by means of their genetic operators, GAs converge to better solutions than those obtained by pure random searches. The explored multi-dimensional parameter space is very complex and has multiple local optima, as the evolution of randomly generated configurations does not always lead to a global solution.

keywords:
pulsars: general, stars: neutron, astronomical instrumentation, methods and techniques, algorithms: genetic algorithms
journal: Astronomy & Computing

1 Introduction

Various radio transient surveys constantly search for new pulsars, rotating radio transients (RRATs, McLaughlin et al., 2006), and fast radio bursts (FRBs, Lorimer et al., 2007; Petroff et al., 2016), especially at less explored extragalactic distances in dense environments. More such discoveries can help us better classify transients and study the intergalactic medium (IGM). Even though distant radio transients are hard to localize, better localization would allow a more comprehensive study of Galactic and extragalactic source populations in terms of stellar evolution and star formation, which should depend on the type of host galaxy.

New discoveries of single bursts with Parkes, UTMOST, and ASKAP (Caleb et al., 2017; Bannister et al., 2017; Bhandari et al., 2018) and one repeating source of bursts with Arecibo (Spitler et al., 2016; Chatterjee et al., 2017) reveal new properties of radio bursts. Searches for much fainter and more distant bursts require more fine-grained searches and lead to new processing challenges (Magro et al., 2011; Barsdell et al., 2012; Sclocco et al., 2016). Just like standard pulsar searches, transient searches are performed in the two-dimensional time-frequency space for every unit of dispersion measure (DM, the third dimension). Modern searches (see Table 1) are performed in real time to trigger multi-frequency follow-up. They also require very high time sampling and frequency resolution to better determine the burst structure. Growing data rates and computational costs require larger supercomputers and search pipelines based on graphics processing units (GPUs) rather than central processing units (CPUs). Other options, such as FPGAs and ASICs, are also available; however, FPGAs are very hard to program and their floating-point performance is not comparable with that of GPUs, whereas ASICs are expensive to design and produce.

Parameter CHIME (a) UTMOST (b) SUPERB (c) ASKAP (d) ALERT (e) SKA-Low (f) SKA-Mid (f)
Status commissioning ongoing ongoing ongoing commissioning future future
FoV (deg$^2$) 220 9 0.6 30 8.7 27 0.49
$N_{\rm beams}$ 1024 352 13 288 2600 500 1500
$t_{\rm samp}$ ($\mu$s) 2.5 655.36 64 40.92 50 50
$n_{\rm pol}$ 2 1 4 2 2 4 4
$\nu_{\rm c}$ (MHz) 600 835.5 1382 1400 1400 250 800
$\Delta\nu$ (MHz) 400 31.25 400 336 300 100 300
$N_{\rm chan}$ 1024 320 1024 336 1536 8192 4096
SEFD (Jy) 45 28.5 60 1800 70
$S_{\min}$ (SP, Jy) 0.25 0.9 0.3 13.4 1.6
  • (a) Based on the CHIME system overview (The CHIME/FRB Collaboration et al., 2018)

  • (b) Based on the UTMOST system overview (Bailes et al., 2017; Caleb et al., 2017)

  • (c) Based on the SUPERB survey overview (Keane et al., 2018; Bhandari et al., 2018)

  • (d) Based on ASKAP survey description (Bannister et al., 2017)

  • (e) Based on Apertif Incoherent Search setup (Maan and van Leeuwen, 2017)

  • (f) Based on the updated SKA review (Dewdney, 2013; Braun, 2015; Levin et al., 2017)

Table 1: Modern radio pulsar and transient surveys and their main characteristics. FoV is the survey field of view in square degrees, $N_{\rm beams}$ is the number of beams, $t_{\rm samp}$ is the sampling resolution in microseconds, $n_{\rm pol}$ is the number of polarizations, $\nu_{\rm c}$ and $\Delta\nu$ are the central frequency and available bandwidth, both in megahertz, $N_{\rm chan}$ is the number of frequency channels, and SEFD and $S_{\min}$ are the system noise and the minimum detectable flux density for a single-pulse threshold and pulse width, both in Janskys.

Transient surveys are also technically limited in their ability to detect new bursts. The number of antennas or dishes in the survey determines the number and size of the beams they can produce, and therefore how large the observed area of the sky is. The system equivalent flux density is ${\rm SEFD} = T_{\rm sys}/G$, where $T_{\rm sys}$ is the total system temperature and $G$ is the system gain. Together with the frequency bandwidth $\Delta\nu$, the number of polarizations $n_{\rm pol}$, the single-pulse S/N threshold and the single-pulse width, this determines how faint a radio transient we can possibly detect (radiometer equation, Lorimer and Kramer, 2004). Finally, the temporal and frequency resolution of the instrument set the limits to which the intrinsic structure of the pulse can be studied. Within all such limitations, the data should be optimally distributed on CPUs and GPUs for the signal processing: this includes de-dispersion (appropriate shift and integration of frequency channels that removes frequency dispersion), signal smoothing, and signal-to-noise evaluation. One way to find configurations that allow for fast data distribution and processing is to perform auto-tuning. In this case every configuration is tested for its performance (processing time, in the case of radio transient surveys), and the configuration that allows the fastest search is chosen. Auto-tuning is widely applied in computer science (Williams, 2008), but has also seen applications in other domains such as computational finance (Grauer-Gray et al., 2013) or astronomy (Sclocco et al., 2012). Auto-tuning for radio transient surveys also shows promising results in terms of performance portability (Sclocco et al., 2015).
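For reference, the single-pulse form of the radiometer equation mentioned above is commonly written as below. This is a textbook form given as a sketch; the symbol names ($S_{\min}$ for the minimum detectable flux density, $({\rm S/N})_{\min}$ for the single-pulse threshold, $W$ for the pulse width) are our notation, and correction factors such as digitization losses are omitted.

```latex
S_{\min} = \frac{(\mathrm{S/N})_{\min}\,\mathrm{SEFD}}{\sqrt{n_{\rm pol}\,\Delta\nu\,W}},
\qquad
\mathrm{SEFD} = \frac{T_{\rm sys}}{G}
```

Here $T_{\rm sys}$ is the system temperature, $G$ the system gain, $n_{\rm pol}$ the number of polarizations and $\Delta\nu$ the bandwidth, so a lower SEFD, a wider band or a broader pulse all lower the detection limit.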

The new real-time Apertif survey on the Westerbork (WSRT) telescope (ALERT, the Apertif Lofar Exploration of the Radio Transient Sky; http://alert.eu) is now equipped with a new 160-GPU cluster that achieves 1.3 Pflops of peak performance and a data rate of 4 Tbit/s, and has 2 PBytes of available storage space (Maan and van Leeuwen, 2017). Such computational capacity enables deep searches at a time resolution of 40.96 $\mu$s and a frequency resolution of about 0.195 MHz (see Table 2). Apertif front-ends on 12 WSRT dishes produce more than 400 tied-array beams that in total cover 8.7 deg$^2$ of the sky, searched between 1100 and 1750 MHz with a tunable bandwidth of 300 MHz. Commissioning data from a targeted search toward FRB121102 already suggested a detection (Oostrum et al., 2017).

All hardware and software constraints require an optimized distribution of processing resources on the cluster to allow for the fastest real-time search: GPU threads and items, local memory re-use, loop transformations. Auto-tuning allows for an automated search of these parameters. Although such tuning is performed only once for a running survey, it should be invoked again in case the survey undergoes hardware changes (e.g. front-end or back-end upgrades) or the search pipeline itself gets extended or improved (e.g. by adding new processing steps). Besides, it should be easily portable to any other survey pipelines.

Section 2 introduces the current search pipeline for ALERT and its current auto-tuning. We introduce a more heuristic approach for auto-tuning with genetic algorithms in Section 3. Section 4 shows achieved performance based on different algorithm input parameters as well as comparison with the pure random search. We discuss auto-tuning parameter space in terms of complexity and degeneracy in Section 5 and draw our conclusions in Section 6.

2 AMBER auto-tuning

The real-time search for new single bursts on WSRT is performed via the single-pulse search pipeline AMBER (The Apertif Monitor for Bursts Encountered in Real-time, https://github.com/AA-ALERT/AMBER/; Sclocco, 2017). The pipeline can be divided into four main operations or kernels: a two-step de-dispersion, downsampling (smoothing) of the de-dispersed time series, and subsequent signal-to-noise (S/N) computation. In the two-step de-dispersion, for a single DM, the frequency channels are first united into subbands such that the radio pulse signal first gets de-dispersed along subbands (step one, subband de-dispersion), and then within each subband (step two, intra-subband de-dispersion). Before the search, each kernel of the pipeline gets tuned to find its optimal processing configuration (https://github.com/AA-ALERT/AMBER_setup/tree/ARTS_tender).
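To make the de-dispersion operation concrete, the following is a minimal single-step sketch in Python with NumPy. It is not AMBER's OpenCL implementation and does not reproduce the two-step subbanding scheme; the function names and array layout are assumptions made for illustration. It uses the standard cold-plasma dispersion delay, with the dispersion constant of about 4.148808 ms GHz$^2$ pc$^{-1}$ cm$^3$.

```python
import numpy as np

# Dispersion constant in s MHz^2 pc^-1 cm^3 (4.148808 ms GHz^2 pc^-1 cm^3).
K_DM = 4.148808e3


def dispersion_delay(dm, freq_mhz, ref_freq_mhz):
    """Delay in seconds of freq_mhz relative to ref_freq_mhz for a DM in pc/cc."""
    return K_DM * dm * (freq_mhz ** -2.0 - ref_freq_mhz ** -2.0)


def dedisperse(data, freqs_mhz, dm, t_samp):
    """Shift every channel by its delay and sum over frequency.

    data has shape (n_chan, n_samp); the highest frequency is the reference.
    This is a plain brute-force version (hypothetical helper, one trial DM).
    """
    ref = freqs_mhz.max()
    shifts = np.rint(dispersion_delay(dm, freqs_mhz, ref) / t_samp).astype(int)
    n_out = data.shape[1] - shifts.max()
    series = np.zeros(n_out)
    for chan, shift in enumerate(shifts):
        series += data[chan, shift:shift + n_out]
    return series
```

A real-time search repeats this for every trial DM, which is why the number of samples and DMs handled per work-item and per work-group becomes a tunable quantity.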

Parameter | Description | Value for ARTS0
DEVICEPADDING | Size of the cache line of OpenCL device (bytes) | 128*
DEVICETHREADS | Number of simultaneously running OpenCL work-items | 32*
MINTHREADS | Minimum number of OpenCL work-items | 8
MAXTHREADS | Maximum number of OpenCL work-items | 1024*
MAXITEMS | Maximum number of variables which the automated code is allowed to use | 255*
LOCAL | Use of OpenCL local memory | "-local"
MAXITEMSDIM0 | Maximum number of OpenCL work-items in time dimension | 64
MAXITEMSDIM1 | Maximum number of OpenCL work-items in DM dimension | 32
MAXDIM0 | Maximum number of OpenCL work-groups in time dimension | 1024
MAXDIM1 | Maximum number of OpenCL work-groups in DM dimension | 128
MAXUNROLL | Maximum loop unrolling | 32
INPUTBITS | Processing data rate (bits per sample) | 8
SUBBANDS | Number of frequency subbands | 32
SUBBANDINGDMS | Number of DM subbands | 2048
SUBBANDINGDMFIRST | Initial DM of the first subband (pc/cc) | 0.0
SUBBANDINGDMSTEP | Subband DM step (pc/cc) | 2.4
DMS | Number of DMs within each subband | 24
DMFIRST | Initial DM within the first subband (pc/cc) | 0.0
DMSTEP | DM step (SUBBANDINGDMSTEP / DMS) (pc/cc) | 0.1
BEAMS | Number of compound beams | 1
SYNTHESIZEDBEAMS | Number of synthesized beams | 1
MINFREQ | Minimum observing frequency (MHz) | 1290
CHANNELS | Number of frequency channels | 1536
CHANNELBANDWIDTH | Frequency resolution (MHz) | 0.1953125
SAMPLES | Number of samples | 25600
BATCHES | Number of samples per chunk of data | 10
SAMPLINGTIME | Time resolution (s) | 0.00004096
DOWNSAMPLING | Downsampling factors | 10 3200
NRSAMPLES | Downsampled number of samples | SAMPLES / DOWNSAMPLING
  * NVIDIA GeForce GTX Titan X (Maxwell generation) characteristics

Table 2: Fixed survey parameters and their values for ARTS0 GPUs.

The parallel framework of choice for the accelerators is OpenCL, because it is vendor independent. In OpenCL terms, GPU threads are referred to as work-items, and GPU blocks of related threads are referred to as work-groups. In AMBER, OpenCL kernels operate in three-dimensional grids, but the pipeline uses only two dimensions, time and DM. These two dimensions limit the amount of available parallelism on both work-groups and work-items. The pipeline configuration is based on survey constraints and processing capabilities (see Table 2) as well as 8 different types of user-controlled parameters (see Table 3) that altogether define a single computational configuration for a specific many-core accelerator. A previous pipeline version had one more parameter, splitSeconds, responsible for manipulation between different kernels. Other input files contain downsampling factors, GPU cache line size, and frequency channels that need to be zapped due to terrestrial radio frequency interference (RFI) contamination.

Parameter Description Boundary conditions
localMem Utilization of local memory to allow for data re-use between computations 0 (no re-use); 1 (total re-use)
unroll Loop unrolling to optimize code execution by means of its reorganization ;
(step 1 de-dispersion), (step 2 de-dispersion)
nrSamplesPerThread № samples per work-item ;
(both steps of de-dispersion)
nrDMsPerThread № DMs per work-item ;
(step 1 de-dispersion), (step 2 de-dispersion)
is regulated by the maximum number of registers on a GPU card
nrSamplesPerBlock № samples per work-group ;
(both steps of de-dispersion)
nrDMsPerBlock № DMs per work-group ;
(step 1 de-dispersion), (step 2 de-dispersion)
is regulated by the maximum number of OpenCL work-items
nrItemsD0 № items to process per work-item ;
(time series downsampling); also (S/N calculation)
nrThreadsD0 № work-items in a work-group ;
(time series downsampling); also (S/N calculation)
Table 3: Types of user-controlled tuning parameters and their boundary conditions.

All user-controlled tuning parameters need to be generated within the corresponding boundary conditions. For de-dispersion kernels, AMBER may or may not utilize local memory (localMem) and loop unrolling (unroll) to speed up the computations. The latter also scales with the number of channels distributed over the frequency subbands as this gives the amount of parallelism during the frequency channels summation. The number of time samples and DMs that each GPU has to correct for (nrSamplesPerThread, nrDMsPerThread, nrSamplesPerBlock, nrDMsPerBlock) are constrained by the overall number of time samples and DMs / subbanding DMs in the search space. Similarly, smoothing (S/N evaluation) kernels are computationally limited by the total (downsampled) number of time samples in the observations.

Before the real-time search, the pipeline undergoes brute-force (BF) auto-tuning to optimize each kernel by going through all possible configurations of tunable parameters from the single-kernel parameter space. An average run of the BF tuning takes from 5 to 24 hours depending on the processing cluster architecture. At Westerbork, AMBER runs on the ARTS GPU Cluster (van Leeuwen, 2014). For our tests, we only used the initial node of that cluster, ARTS0, powered by NVIDIA GeForce GTX Titan X GPUs. On a single such GPU, the BF tuning takes about 10 hours. The runtime of the whole pipeline (all kernels) on the randomly generated test data with the tuned configuration lasts about  sec. The amount of test data depends on the number of processed batches: in our experiments we used 10 batches of 1.024 sec each, so the observation time is 10.24 sec.

The complete sampling has several drawbacks:

  • The BF tuning does not tune all kernels at once as it takes too much time, even on a large parallel system (see Section 5);

  • As the BF tuning is applied to each kernel separately, it does not consider dependencies between kernels, although accounting for them may lead to a globally optimal configuration for the whole pipeline.

In this paper we test the idea that more heuristic genetic algorithms can find a good enough global configuration for all kernels together in less time. Although the end configuration may not be the overall global optimum (see Section 5), it will still be nearly as good as or even better than the combination of best configurations per kernel after BF tuning. Additionally, it will almost always be better than the best configuration after a pure random search. An overview of the GA is given in the following section.

3 The genetic algorithm

The idea of genetic algorithms (GAs) dates back to Charles Darwin’s idea of biological evolution that only the fittest individuals should survive and produce more adapted offspring (Holland, 1975). Most applications for GAs are in search and optimization (Goldberg, 1989). There are also a number of GA applications in astronomy, from spectral analysis and cosmology to telescope scheduling (Charbonneau, 1995; Metcalfe et al., 2000; Mokiem et al., 2005; Liesenborgs et al., 2006), but also in searches for pulsars and gravitational waves (Lazio, 1997; Petiteau et al., 2013).

3.1 Main genetic operators

The working element of every GA is a chromosome or an individual – a set of tunable (usually binary) parameters known as genes. The idea of GA is to evolve chromosomes and improve their scores guided by a fitness function. A typical GA contains five main genetic operators: initialization, selection, crossover, mutation, and replacement (see Fig. 1).

Figure 1: Typical block diagram of a genetic algorithm. After initializing population of individuals, we evaluate fitness functions of each individual and check the stopping criteria. We either finish or make individual selection based on its fitness value. Next, we apply crossover and mutation to selected individuals, re-evaluate their fitness functions and check the stopping criteria again. Once we satisfy the criteria, we end up with the best individual from the evolved population. Otherwise we replace old individuals by new ones and continue evolution further.

The individuals are first initialized (usually at random) and acquire their respective fitness functions. Based on the fitness scores, a number of fittest individuals get selected for further evolution. The selection is typically done either via roulette-wheel scheme (based on cumulative probability of fitness functions) or tournament scheme (based on the fittest winner from a limited pool of individuals).

The pair of selected individuals then produces an offspring through mixing their genes among each other. Such operation is known as crossover. The most used option to cross genes is a one-point crossover where the genes before the crossing point remain the same, but the rest of the genes gets swapped with that from another individual. Depending on the type of genes and the task, other crossover options such as multi-point or uniform crossover are possible (see Umbarkar and Sheth, 2015, for a review).

Next, offspring individuals may also undergo mutation when one or more genes happen to randomly change their values (e.g., Soni, 2014). This is done primarily to avoid premature convergence into a local optimum and better explore parameter space.

Finally, the new population replaces the old one, possibly preserving the fittest parent individuals. The evolution then starts again and continues until the population reaches a sufficient fitness value or the evolution reaches its limit in time or in the number of generations. The population size as well as the rates of crossover and mutation are tunable parameters of the GA.
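The five operators described above can be sketched as a short, generic loop. The example below is a toy illustration only: it evolves binary chromosomes toward the maximum number of ones (a stand-in fitness function, not the pipeline run time), uses roulette-wheel selection, one-point crossover and bit-flip mutation, and all names and default rates are assumptions chosen for the example.

```python
import random


def run_ga(n_genes=16, pop_size=20, p_cross=0.8, p_mut=0.05,
           n_generations=40, seed=0):
    """Toy GA that maximizes the number of ones in a binary chromosome."""
    rng = random.Random(seed)
    fitness = sum  # toy fitness function: count of ones

    # Initialization: random binary chromosomes (pop_size assumed even).
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]

    for _ in range(n_generations):
        # Stopping criterion: a perfect individual exists.
        if max(map(fitness, pop)) == n_genes:
            break
        # Selection: roulette wheel; the +1 keeps every weight positive.
        weights = [fitness(ind) + 1 for ind in pop]
        parents = rng.choices(pop, weights=weights, k=pop_size)

        offspring = []
        for a, b in zip(parents[0::2], parents[1::2]):
            a, b = a[:], b[:]
            # Crossover: swap the genes after a random crossing point.
            if rng.random() < p_cross:
                point = rng.randrange(1, n_genes)
                a[point:], b[point:] = b[point:], a[point:]
            # Mutation: flip each gene with a small probability.
            for child in (a, b):
                for i in range(n_genes):
                    if rng.random() < p_mut:
                        child[i] = 1 - child[i]
                offspring.append(child)

        # Replacement with preservation of the fittest individuals.
        pop = sorted(pop + offspring, key=fitness, reverse=True)[:pop_size]

    return max(pop, key=fitness)
```

Because the replacement step keeps the best individuals of the combined pool, the best fitness in this sketch never decreases from one generation to the next.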

3.2 GA auto-tuning

The main advantages of the GA approach compared to BF tuning or random search during AMBER auto-tuning optimization are:

  • The BF algorithm tunes each kernel separately, whereas GA tunes the whole pipeline and thus considers interactions between the kernels. As a result, GA can find a better solution for all kernels together in less time.

  • Unlike random search, GA does a guided search for better parameters while still trying to explore the rest of the parameter space by means of the mutation operator.

In our GA implementation (the code is available on GitHub: https://github.com/MixKlim/GA_AMBER), every individual is a set of free parameters (see Table 3) that altogether control all four kernels of the pipeline. Only the localMem gene has a binary representation and switches between ‘0’ and ‘1’; the other genes are powers of two, for simplicity and to preserve bit alignment on GPUs.

We first generate arrays of possible values for every gene based on boundary conditions and dependencies (see Section 2). Next, all four kernels of each individual obtain genes initialized with random values from those arrays. After that we evaluate each individual's fitness as the run time of the whole pipeline, which includes all four kernels. We then apply tournament selection: we create a pool of randomly selected individuals (some fraction of the total population ), and the individual with the best fitness in that pool wins and gets selected. We do selection  times.
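The initialization and tournament selection steps can be sketched as follows. This is an illustration under stated assumptions, not the code from the GA_AMBER repository: the helper names, the dictionary layout of an individual and the example bounds are hypothetical, and fitness is taken to be a run time, so lower is better.

```python
import random


def powers_of_two(lower, upper):
    """All powers of two within [lower, upper]; used as a gene's value array."""
    values, v = [], 1
    while v <= upper:
        if v >= lower:
            values.append(v)
        v *= 2
    return values


def random_individual(gene_values, rng):
    """Draw every gene at random from its array of allowed values."""
    return {gene: rng.choice(values) for gene, values in gene_values.items()}


def tournament_select(population, run_times, pool_fraction, rng):
    """Return the fastest individual from a random pool of the population."""
    pool_size = max(2, int(pool_fraction * len(population)))
    pool = rng.sample(range(len(population)), pool_size)
    winner = min(pool, key=lambda i: run_times[i])
    return population[winner]


# Illustrative gene arrays; the bounds here are placeholders, not Table 3.
example_genes = {"localMem": [0, 1], "unroll": powers_of_two(1, 32)}
```

A smaller pool fraction lowers the selection pressure, which helps prevent a few dominant individuals from being selected repeatedly.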

Figure 2: Two types of crossover that we test in our GA: a) one-point crossover; only the genes (coloured squares) after the red dashed line get exchanged among two individuals. The line crossing point is chosen at random; b) uniform crossover with a coin toss probability; the exchange of genes between two individuals happens only if the randomly generated probability associated with these genes exceeds 50%.

Next we group selected individuals in pairs for the subsequent crossover. We make sure there are minimal or no identical individuals in pairs, as identical parents produce no new offspring. After that we apply crossover for each kernel with crossover probability $p_{\rm c}$ to create an offspring population of individuals with interchanged genes. The size of the offspring population is the same as that of the parent population. We test two types of crossover: one-point crossover and uniform crossover with a coin-toss probability (see Fig. 2). The gene exchange happens only if the offspring genes also satisfy the required boundary conditions.
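A minimal sketch of the two crossover variants and of the boundary-condition check is given below. It operates on a flat list of genes for one kernel; the function names, the `is_valid` callback and the choice to reject the whole exchange when an offspring is invalid are simplifying assumptions made for illustration.

```python
import random


def one_point_crossover(a, b, rng):
    """Swap all genes after a randomly chosen crossing point."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]


def uniform_crossover(a, b, rng):
    """Swap each gene independently when a coin toss exceeds 50%."""
    child_a, child_b = list(a), list(b)
    for i in range(len(a)):
        if rng.random() > 0.5:
            child_a[i], child_b[i] = b[i], a[i]
    return child_a, child_b


def crossover(a, b, p_cross, is_valid, rng, kind=one_point_crossover):
    """Apply crossover with probability p_cross; keep the parents' genes
    if either offspring violates the boundary conditions."""
    if rng.random() >= p_cross:
        return list(a), list(b)
    child_a, child_b = kind(a, b, rng)
    if is_valid(child_a) and is_valid(child_b):
        return child_a, child_b
    return list(a), list(b)
```

In both variants every gene position of the two offspring still holds the two parental values, so crossover alone never introduces a value that was absent from the parents; that is the role of mutation.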

Some kernels of each offspring individual then also undergo mutation with mutation probability $p_{\rm m}$: one or multiple genes in these kernels are replaced by a random value from the array of possible values of that gene. For kernels that include downsampling (smoothing and S/N kernels), a number of genes get randomly chosen for mutation, one per downsampling factor. We make sure the new gene value is different from the old one unless this is prohibited by the boundary conditions. After mutation all offspring individuals get their fitness functions re-evaluated.
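The mutation step can be sketched as below for a single kernel. The sketch mutates one randomly chosen gene, which is a simplification of the one-or-multiple-genes scheme described above; the names and the `is_valid` callback are hypothetical.

```python
import random


def mutate(individual, gene_values, p_mut, is_valid, rng):
    """With probability p_mut, replace one gene by a different allowed value.

    individual maps gene names to values; gene_values maps gene names to
    their arrays of possible values. The original genes are kept if no
    different value satisfies the boundary conditions.
    """
    if rng.random() >= p_mut:
        return dict(individual)
    mutant = dict(individual)
    gene = rng.choice(sorted(mutant))
    # Candidate values that differ from the current one, in random order.
    alternatives = [v for v in gene_values[gene] if v != mutant[gene]]
    rng.shuffle(alternatives)
    for value in alternatives:
        mutant[gene] = value
        if is_valid(mutant):
            return mutant
    return dict(individual)
```

Trying the alternatives in random order keeps the mutation unbiased while still guaranteeing a changed gene whenever a valid alternative exists.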

In the end, we form the next generation from both parents and offspring. We rank parent and offspring individuals by their fitness values and select the best performing half of each group. In this way we preserve the fittest parent individuals and have a new population of the same size for the next generation.
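The replacement rule above (best half of the parents plus best half of the offspring) can be sketched as follows, assuming an even population size and run times as fitness values, so that lower is better. The function name and argument layout are ours.

```python
def replace_population(parents, parent_times, offspring, offspring_times):
    """Keep the fastest half of the parents and the fastest half of the offspring."""

    def best_half(individuals, times):
        order = sorted(range(len(individuals)), key=lambda i: times[i])
        keep = order[:len(individuals) // 2]
        return [individuals[i] for i in keep], [times[i] for i in keep]

    kept_parents, kept_parent_times = best_half(parents, parent_times)
    kept_offspring, kept_offspring_times = best_half(offspring, offspring_times)
    return kept_parents + kept_offspring, kept_parent_times + kept_offspring_times
```

Unlike ranking the merged pool, this rule always admits half of the offspring even when they are slower than every parent, which keeps some diversity in the population.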

4 Performance results

Since we are interested in testing how much faster GA tuning finds a solution nearly as good as BF tuning, we run our genetic evolution for as long as it takes BF to explore and tune every single kernel, i.e. about 10 hrs for ARTS0. To evaluate fitness values we run AMBER with each individual configuration on uniformly distributed noise with an injected single-pulse signal and obtain a set of execution times. Although the total runtime also includes time spent on test data generation, we do not take that time into account while evaluating the individual’s fitness function. Nevertheless, some configurations can result in a very slow run of the pipeline or even its breakdown, mostly due to inappropriate memory allocation. To avoid such configurations, we limit the AMBER total runtime to 3 min. This empirical time limit was chosen to cover half-minute fluctuations from test data generation and most typical pipeline executions, but also to penalize inefficient runs. For the tournament selection, the size of the tournament pool was set to be of to avoid multiple selections of only several dominant individuals.
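A run-time-limited fitness evaluation of this kind can be sketched with Python's standard `subprocess` module. The sketch below is an assumption-laden stand-in: it measures the wall-clock time of an arbitrary command rather than AMBER's own reported execution time, it does not subtract test data generation, and the infinite penalty value is our choice; the actual AMBER command line is not reproduced here.

```python
import subprocess
import sys
import time

PENALTY = float("inf")  # assumed penalty for failed or too-slow runs


def timed_run(command, time_limit):
    """Return the wall-clock run time of command in seconds, or PENALTY
    if it exceeds time_limit or exits with a non-zero status."""
    start = time.perf_counter()
    try:
        result = subprocess.run(command, timeout=time_limit,
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    except subprocess.TimeoutExpired:
        return PENALTY
    if result.returncode != 0:
        return PENALTY
    return time.perf_counter() - start


# Usage with a stand-in command (the Python interpreter itself) and the
# 3 min limit quoted in the text.
quick = timed_run([sys.executable, "-c", "pass"], time_limit=180.0)
```

Treating crashes and timeouts identically means that a configuration with inappropriate memory allocation simply loses every tournament it enters.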

In our tests we used individual configurations to balance between slow fitness function evaluation and sufficient number of generations. Unless being tested, crossover and mutation were applied with and , somewhat generally accepted average rates in the literature (see Patil and Pawar, 2015, for a review). We also recorded configurations with the best fitness value after each generation. We then tracked the best fitness value in the population over all generations along with the computational time spent per generation.

(a) Evolution with different mutation probability.
(b) Evolution with different crossover probability.
(c) Evolution with different population size.
(d) Different types of crossover & random search option.
Figure 3: Different tests on algorithm convergence with (a) changing mutation rate; (b) changing crossover rate; (c) changing population size; (d) changing type of crossover (one-point, coin-toss) and performing pure random search. The last two runs of the random search got overall bad or penalized fitness values that appear higher and are therefore not visible on the plot.

We did several 10 hr test runs based on different population sizes, probabilities of crossover and mutation, types of crossover operator, and pure random search. For each run and for every population generation, we plot the ratio of the best obtained execution time to the runtime found by BF tuning. The algorithm performance plots for every aforementioned parameter are shown in Fig. 3; each data point is based on a single test run, and since the algorithm includes a random component, individual points vary between runs. The gradual fitness improvement, however, does not change with different runs (in other words, the evolution does not diverge).

In the case of a changing mutation rate (Fig. 3(a)), more frequent mutation leads to a more complete exploration of the total parameter space. Still, even when individuals always mutate during their evolution (mutation probability $p_{\rm m} = 1$, best exploration), it may take much longer than the BF runtime to significantly improve the best configuration in the population, mostly due to the high complexity of the parameter space we have to explore (see Section 5) and prolonged fitness evaluations.

In the case of a changing crossover rate (Fig. 3(b)), more frequent crossover generally leads to better mixing between individuals and therefore more rapid population evolution. Depending on how good the best configuration gets after random initialization, higher crossover rates result in steeper evolutionary slopes. Extremely high rates (crossover probability $p_{\rm c} = 1$, all individuals get updated and then mutated) do not necessarily lead to the best fitness at the end of GA evolution, but greatly reduce its value after initialization.

The more individuals we initialize, the more time we spend on generating arrays of possible values for each gene based on boundary conditions (see Sections 2 and 3.2). This is illustrated in Fig. 3(c), where we test GA performance relative to the population size. As a result, GAs with larger populations undergo fewer iterations within the BF tuning time and are thus less likely to converge to good solutions. On the other hand, larger populations can better sample the total parameter space during initialization, which can still result in a good fit (see also appendix A).

Fig. 3(d) shows tests based on different types of crossover operator, one-point and coin-toss (see also Fig. 2). Although coin-toss crossover implies more gene exchanges than one-point crossover, this is not reflected in the overall convergence of the best fit. We also test a pure random search option where we do not use genetic operators but randomly initialize new populations until we reach the BF tuning time. In this case there is no interaction between individuals or evolution of their population. Again, initialization takes more time than evolution, therefore we end up with fewer runs before we reach the BF time limit. Also, we do not track the best individual configurations during the random search and thus can have penalties in fitness functions for certain random search runs. Nevertheless, we may sometimes get relatively good fits straight from the initialization, but those are rare due to the complexity of the total parameter space (see Section 5).

All plots show that the GA evolution can be generally described by a rapid drop of fitness value at the early evolution stage (subject of initialization and selection) and its much slower improvement at the later evolution stage (subject of crossover and mutation). Thus, we can already find a reasonably good configuration of after hours of GA evolution. We also see that within the BF tuning time GA almost always converges to a better solution for the whole pipeline than BF tuning for each kernel. Again, this is because the GA fitness function is guided by the overall performance of the whole pipeline, whereas BF optimizes each kernel separately. None of the GA parameters drastically changes the gradual fitness improvement and its proximity to the best BF solution. The later evolution of the search parameters happens very slowly and is also quite independent of the population size. As the algorithm converges to one of the local, good configurations after initialization or during the very first generations, it can still improve them later through crossover or even find the best, global configuration through mutation.

5 Discussion

The main downside of every fitting algorithm is that after finding a local, degenerate solution, it is very unlikely to improve and converge into a much better, global fit. Our GA performance tests show that the parameter space we are trying to fit is very complex and possesses many local configurations that give almost identically good but not necessarily the best performance. To check the variety of parameter configurations that GA converges to, we build up histograms for best individual’s genes evolved in populations of three different sizes: , , and (see appendix A). The diversity in explored parameter ranges as well as histogram shapes shows that we obtain multiple degenerate solutions at the end of each GA evolution. Therefore, it takes more time than for one single run of GA evolution to cover all good configurations and determine the best among them.

To test the algorithm convergence, we measure the coefficient of variation $c_{\rm v} = \sigma/\mu$ among different configurations at the end of every algorithm run; $\sigma$ is the standard deviation of the given parameter and $\mu$ is its mean value in a set of end configurations. We then average $c_{\rm v}$ over multiple algorithm runs. The gaps in parameter ranges caused by boundary conditions do not affect $c_{\rm v}$, as both $\sigma$ and $\mu$ get affected but balance each other in a ratio. Higher $c_{\rm v}$ shows more diversity in individual genes, whereas the algorithm convergence requires low $c_{\rm v}$. Fig. 4 shows averaged coefficients of variation among 20 configurations after 30 GA and random search executions. We see that GA shows strong parameter convergence compared to memoryless random search, and thus results in smaller $c_{\rm v}$. Most diversity happens in smoothing and S/N kernels since, with multiple downsampling factors, these kernels have more freedom to get their parameters changed. However, de-dispersion kernels represent the highest pipeline workload. As a result, both steps of de-dispersion contribute to a greater degree to the execution time (of based on an average runtime for configurations), and are thus most crucial for tuning. We do not treat the variation of the binary parameter localMem as it has a near-zero mean and is thus very sensitive to small diversities in various configurations.
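The convergence measure described above is the standard coefficient of variation, the ratio of the standard deviation to the mean. A minimal NumPy sketch follows; the function names, the dictionary layout of a configuration and the use of the population standard deviation (NumPy's default) are assumptions.

```python
import numpy as np


def coefficient_of_variation(values):
    """Standard deviation divided by the mean of one parameter's values."""
    values = np.asarray(values, dtype=float)
    return values.std() / values.mean()


def mean_cv(runs):
    """Average the coefficient of variation of every gene over several runs.

    runs is a list of algorithm runs; each run is a list of end
    configurations given as dictionaries mapping gene names to values.
    """
    genes = runs[0][0].keys()
    return {
        gene: float(np.mean([
            coefficient_of_variation([cfg[gene] for cfg in run])
            for run in runs
        ]))
        for gene in genes
    }
```

The division by the mean is also why a binary gene with a near-zero mean is excluded: a single configuration with a different value makes the ratio very large.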

Figure 4: Bar histogram of coefficients of variation $c_{\rm v}$ among individual configurations averaged over 30 GA and random searches. GA search has overall smaller $c_{\rm v}$ than random search as it utilizes genetic memory, which gives parameter convergence. nrThreads and nrItems get high $c_{\rm v}$ since they have more freedom to change due to multiple downsampling factors. The coefficient of variation of the binary localMem is not shown, as it approaches infinite values once there is only a small variation in different configurations.

To better see how well GA tunes the pipeline compared to BF search, we estimate the volume of the tuneable parameter space. Despite the fact we only have 8 different types of free parameters (Table 3), their number grows and thereby expands total parameter space as we consider multiple de-dispersion and downsampling steps. Each de-dispersion kernel has six free parameters, whereas two other kernels, signal smoothing and S/N evaluation, have free parameters for every downsampling factor. Given downsampling factors for Apertif and taking additional S/N evaluation for a non-downsampled signal into account, we have in total free parameters. As we initialize individual genes from arrays of possible values determined by boundary conditions, we can estimate how many possible values each individual gene can have. Since user-controlled parameter ranges are independent between different pipeline kernels, we get up to possible configurations. Sampling that many configurations for the whole pipeline with a 3 min runtime limit would require years, impossible even with a large parallel system.
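The arithmetic behind such an estimate is a product of the number of allowed values per gene, multiplied by the time per run. The sketch below shows the calculation only; the gene counts in the usage example are hypothetical placeholders and are not the values derived for Apertif.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def count_configurations(values_per_gene):
    """Upper bound on the number of configurations for independent genes."""
    return math.prod(values_per_gene)


def exhaustive_time_years(n_configs, seconds_per_run=180.0):
    """Years needed to run every configuration once, at the 3 min limit."""
    return n_configs * seconds_per_run / SECONDS_PER_YEAR


# Hypothetical example: 12 genes with 6 allowed values each.
n_example = count_configurations([6] * 12)
years_example = exhaustive_time_years(n_example)
```

Even this small placeholder space of about two billion configurations would take of the order of ten thousand years to sample serially at the run-time limit, and the count grows multiplicatively with every additional gene.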

It is also hard to predict a globally optimal configuration for a pipeline without knowing the landscape of such a high-dimensional parameter space. Nevertheless, even though the genes of different best individuals do not resemble each other, it is the combination of all genes that determines the individual fitness function, i.e. the performance of the whole pipeline. Thus, if the goal is a reasonably good configuration within a few hours, finding one local solution and evolving it is enough to reach a better overall performance than applying a much deeper and longer BF search to each kernel. Our tests show that for ARTS0 a combination of random search at the beginning and evolution later can in hours reach a configuration that is as good as, or even better than, what the BF approach obtains in hours.

This also raises the question of whether a completely random draw of configurations from the full parameter space (pure random search) would do just as well. The histogram of best fitness values based on GA evolution and pure random search is given in Fig. 5. On average the GA finds a better solution than a random search, although the latter can sometimes overtake it due to “lucky shots”. However, as in every random process, there is no certainty about how long we might have to wait before a reasonably good individual gets initialized. In GAs there is a constant fitness improvement that constrains the expected waiting time to hours for ARTS0 instead of . Furthermore, good random picks should in general be rare, as the total parameter space is very large and hard to fit without any evolution.

Figure 5: Histogram of best fitness values after 35 GA and random searches. Although random initialization may sometimes reach good fits, GAs generally evolve their individuals to a much better overall performance. In general, it takes more time to initialize new individuals within boundary conditions than to evolve already good solutions that are a priori within such conditions.
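The difference between the two strategies compared in Fig. 5 can be illustrated with a toy example. The sketch below is not the AMBER tuner: the allowed values, the stand-in fitness function `toy_runtime`, truncation selection, elitism and the mutation rate are all assumptions chosen for brevity. Only the use of single-point crossover and the drawing of genes from arrays of allowed values follow the text.

```python
import random

# Hypothetical arrays of allowed values per gene (not the AMBER ranges).
ALLOWED = [[1, 2, 4, 8, 16, 32], [1, 2, 4, 8], [0, 1], [1, 5, 10, 25, 50]]


def toy_runtime(config):
    """Stand-in fitness, lower is better; a real tuner would time the pipeline."""
    a, b, c, d = config
    return abs(a * b - 32) + 3 * c + abs(d - 10) / 5


def random_config(rng):
    return [rng.choice(values) for values in ALLOWED]


def random_search(n_evals, rng):
    """Memoryless search: draw n_evals configurations and keep the best."""
    return min((random_config(rng) for _ in range(n_evals)), key=toy_runtime)


def genetic_search(pop_size, generations, rng, p_mut=0.1):
    """Minimal GA; requires pop_size >= 4 so that two parents can be drawn."""
    population = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=toy_runtime)
        parents = population[: pop_size // 2]      # truncation selection (assumed)
        children = [population[0]]                 # elitism: keep the best (assumed)
        while len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, len(ALLOWED))   # single-point crossover
            child = mother[:cut] + father[cut:]
            for i, values in enumerate(ALLOWED):   # mutation: redraw a gene
                if rng.random() < p_mut:
                    child[i] = rng.choice(values)
            children.append(child)
        population = children
    return min(population, key=toy_runtime)


rng = random.Random(42)
print("random search:", toy_runtime(random_search(200, rng)))
print("genetic search:", toy_runtime(genetic_search(20, 10, rng)))
```

Both calls are given a comparable number of fitness evaluations. Because the best individual is always carried over, the GA result can never be worse than the best member of its own initial population, whereas the random search has no such memory. Which of the two wins on a given seed is not guaranteed, in line with the “lucky shots” seen in the histogram.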

6 Conclusions

Real-time single-pulse searches are computationally intensive. Multiple factors play a key role in how deep and fast searches can be made: the total field of view and sensitivity of the survey, the memory bandwidth and data rate of the beamformer, and the computational performance and versatility of the backend. We need sophisticated pipelines to speed up the data processing.

Before the actual processing, the transient search pipeline AMBER finds the optimal configuration of user-controlled parameters for every pipeline kernel so that each kernel can perform at its fastest. This is achieved via brute-force exploration of every kernel's parameter space and takes many hours of processing, depending on the survey setup. Moreover, this approach does not take any dependencies between kernels into account and therefore does not tune the pipeline as a whole. Such a tuning strategy does not allow the pipeline to be quickly retuned when it is modified or upgraded.

Our search strategy based on genetic algorithms shows that with GAs we can always obtain a nearly as good or even better configuration for the whole pipeline in less time, i.e. hours for ARTS0. The better the configuration obtained during the random initialization, the faster the GA converges to a good fit. Apart from that, strong selection together with frequent crossovers and occasional mutations will always handle a badly initialized population and still lead to a good fit in the end. Such a strategy can be easily ported to more sensitive pipelines and surveys.

Heuristic algorithms are a well-suited tool to quickly obtain a local solution that can be nearly as good as, or even better than, BF tuning of each kernel. For multidimensional parameter spaces in radio astronomy and other domains (bioinformatics, cryptography), heuristics are by far the easiest way to find a reasonably good solution in a short period of time and with limited computational resources.

Acknowledgements

The development and commissioning of ARTS is carried out by a large team of engineers and astronomers, including active participation from the authors and PI (Joeri van Leeuwen). The research leading to these results received funding from the Netherlands Research School for Astronomy under grant NOVA4-ARTS (KM) and from the Netherlands eScience Center under grant AA-ALERT, 027.015.G09 (AS).

References

  • Bailes et al. (2017) Bailes, M., Jameson, A., Flynn, C., Bateman, T., Barr, E. D., Bhandari, S., Bunton, J. D., Caleb, M., Campbell-Wilson, D., Farah, W., Gaensler, B., Green, A. J., Hunstead, R. W., Jankowski, F., Keane, E. F., Krishnan, V. V., Murphy, T., O’Neill, M., Osłowski, S., Parthasarathy, A., Ravi, V., Rosado, P., Temby, D., Oct. 2017. The UTMOST: A Hybrid Digital Signal Processor Transforms the Molonglo Observatory Synthesis Telescope. PASA 34, e045.
  • Bannister et al. (2017) Bannister, K. W., Shannon, R. M., Macquart, J.-P., Flynn, C., Edwards, P. G., O’Neill, M., Osłowski, S., Bailes, M., Zackay, B., Clarke, N., D’Addario, L. R., Dodson, R., Hall, P. J., Jameson, A., Jones, D., Navarro, R., Trinh, J. T., Allison, J., Anderson, C. S., Bell, M., Chippendale, A. P., Collier, J. D., Heald, G., Heywood, I., Hotan, A. W., Lee-Waddell, K., Madrid, J. P., Marvil, J., McConnell, D., Popping, A., Voronkov, M. A., Whiting, M. T., Allen, G. R., Bock, D. C.-J., Brodrick, D. P., Cooray, F., DeBoer, D. R., Diamond, P. J., Ekers, R., Gough, R. G., Hampson, G. A., Harvey-Smith, L., Hay, S. G., Hayman, D. B., Jackson, C. A., Johnston, S., Koribalski, B. S., McClure-Griffiths, N. M., Mirtschin, P., Ng, A., Norris, R. P., Pearce, S. E., Phillips, C. J., Roxby, D. N., Troup, E. R., Westmeier, T., May 2017. The Detection of an Extremely Bright Fast Radio Burst in a Phased Array Feed Survey. ApJ 841, L12.
  • Barsdell et al. (2012) Barsdell, B. R., Bailes, M., Barnes, D. G., Fluke, C. J., May 2012. Accelerating incoherent dedispersion. MNRAS 422, 379–392.
  • Bhandari et al. (2018) Bhandari, S., Keane, E. F., Barr, E. D., Jameson, A., Petroff, E., Johnston, S., Bailes, M., Bhat, N. D. R., Burgay, M., Burke-Spolaor, S., Caleb, M., Eatough, R. P., Flynn, C., Green, J. A., Jankowski, F., Kramer, M., Krishnan, V. V., Morello, V., Possenti, A., Stappers, B., Tiburzi, C., van Straten, W., Andreoni, I., Butterley, T., Chandra, P., Cooke, J., Corongiu, A., Coward, D. M., Dhillon, V. S., Dodson, R., Hardy, L. K., Howell, E. J., Jaroenjittichai, P., Klotz, A., Littlefair, S. P., Marsh, T. R., Mickaliger, M., Muxlow, T., Perrodin, D., Pritchard, T., Sawangwit, U., Terai, T., Tominaga, N., Torne, P., Totani, T., Trois, A., Turpin, D., Niino, Y., Wilson, R. W., Albert, A., André, M., Anghinolfi, M., Anton, G., Ardid, M., Aubert, J.-J., Avgitas, T., Baret, B., Barrios-Martí, J., Basa, S., Belhorma, B., Bertin, V., Biagi, S., Bormuth, R., Bourret, S., Bouwhuis, M. C., Brânzaş, H., Bruijn, R., Brunner, J., Busto, J., Capone, A., Caramete, L., Carr, J., Celli, S., Moursli, R. C. E., Chiarusi, T., Circella, M., Coelho, J. A. B., Coleiro, A., Coniglione, R., Costantini, H., Coyle, P., Creusot, A., Díaz, A. F., Deschamps, A., De Bonis, G., Distefano, C., Palma, I. D., Domi, A., Donzaud, C., Dornic, D., Drouhin, D., Eberl, T., Bojaddaini, I. E., Khayati, N. E., Elsässer, D., Enzenhöfer, A., Ettahiri, A., Fassi, F., Felis, I., Fusco, L. A., Gay, P., Giordano, V., Glotin, H., Gregoire, T., Gracia-Ruiz, R., Graf, K., Hallmann, S., van Haren, H., Heijboer, A. J., Hello, Y., Hernández-Rey, J. J., Hößl, J., Hofestädt, J., Hugon, C., Illuminati, G., James, C. W., de Jong, M., Jongen, M., Kadler, M., Kalekin, O., Katz, U., Kießling, D., Kouchner, A., Kreter, M., Kreykenbohm, I., Kulikovskiy, V., Lachaud, C., Lahmann, R., Lefèvre, D., Leonora, E., Loucatos, S., Marcelin, M., Margiotta, A., Marinelli, A., Martínez-Mora, J. A., Mele, R., Melis, K., Michael, T., Migliozzi, P., Moussa, A., Navas, S., Nezri, E., Organokov, M., Pǎvǎlaş, G. E., Pellegrino, C., Perrina, C., Piattelli, P., Popa, V., Pradier, T., Quinn, L., Racca, C., Riccobene, G., Sánchez-Losa, A., Saldaña, M., Salvadori, I., Samtleben, D. F. E., Sanguineti, M., Sapienza, P., Schüssler, F., Sieger, C., Spurio, M., Stolarczyk, T., Taiuti, M., Tayalati, Y., Trovato, A., Turpin, D., Tönnis, C., Vallage, B., Van Elewyck, V., Versari, F., Vivolo, D., Vizzocca, A., Wilms, J., Zornoza, J. D., Zúñiga, J., Apr. 2018. The SUrvey for Pulsars and Extragalactic Radio Bursts - II. New FRB discoveries and their follow-up. MNRAS 475, 1427–1446.
  • Braun (2015) Braun, R., 2015. SKA1 Level 0 Science Requirements. Document Number SKA-TEL.SCI-LVL-REQ Revision 2.
    URL https://www.skatelescope.org/wp-content/uploads/2014/03/SKA-TEL-SKO-0000007_SKA1_Level_0_Science_RequirementsRev02-part-1-signed.pdf
  • Caleb et al. (2017) Caleb, M., Flynn, C., Bailes, M., Barr, E. D., Bateman, T., Bhandari, S., Campbell-Wilson, D., Farah, W., Green, A. J., Hunstead, R. W., Jameson, A., Jankowski, F., Keane, E. F., Parthasarathy, A., Ravi, V., Rosado, P. A., van Straten, W., Venkatraman Krishnan, V., Jul. 2017. The first interferometric detections of fast radio bursts. MNRAS 468, 3746–3756.
  • Charbonneau (1995) Charbonneau, P., Dec. 1995. Genetic Algorithms in Astronomy and Astrophysics. ApJS 101, 309.
  • Chatterjee et al. (2017) Chatterjee, S., Law, C. J., Wharton, R. S., Burke-Spolaor, S., Hessels, J. W. T., Bower, G. C., Cordes, J. M., Tendulkar, S. P., Bassa, C. G., Demorest, P., Butler, B. J., Seymour, A., Scholz, P., Abruzzo, M. W., Bogdanov, S., Kaspi, V. M., Keimpema, A., Lazio, T. J. W., Marcote, B., McLaughlin, M. A., Paragi, Z., Ransom, S. M., Rupen, M., Spitler, L. G., van Langevelde, H. J., Jan. 2017. A direct localization of a fast radio burst and its host. Nature 541, 58–61.
  • Dewdney (2013) Dewdney, P., 2013. SKA1 System Baseline Design. Document Number SKA-TEL.SKO-DD-001 Revision 1.
    URL https://www.skatelescope.org/wp-content/uploads/2013/08/SKA-TEL-SKO-DD-001-1_BaselineDesign1.pdf
  • Goldberg (1989) Goldberg, D. E., 1989. Genetic Algorithms in Search, Optimization and Machine Learning, 1st Edition. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA.
  • Grauer-Gray et al. (2013) Grauer-Gray, S., Killian, W., Searles, R., Cavazos, J., 2013. Accelerating financial applications on the GPU. In: Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units. GPGPU-6. ACM, New York, NY, USA, pp. 127–136.
    URL http://doi.acm.org/10.1145/2458523.2458536
  • Holland (1975) Holland, J. H., 1975. Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, MI, second edition, 1992.
  • Keane et al. (2018) Keane, E. F., Barr, E. D., Jameson, A., Morello, V., Caleb, M., Bhandari, S., Petroff, E., Possenti, A., Burgay, M., Tiburzi, C., Bailes, M., Bhat, N. D. R., Burke-Spolaor, S., Eatough, R. P., Flynn, C., Jankowski, F., Johnston, S., Kramer, M., Levin, L., Ng, C., van Straten, W., Krishnan, V. V., Jan. 2018. The SUrvey for Pulsars and Extragalactic Radio Bursts - I. Survey description and overview. MNRAS 473, 116–135.
  • Lazio (1997) Lazio, T. J. W., Oct. 1997. Genetic Algorithms, Pulsar Planets, and Ionized Interstellar Microturbulence. Ph.D. thesis, Cornell University.
  • Levin et al. (2017) Levin, L., Armour, W., Baffa, C., Barr, E., Cooper, S., Eatough, R., Ensor, A., Giani, E., Karastergiou, A., Karuppusamy, R., Keith, M., Kramer, M., Lyon, R., Mackintosh, M., Mickaliger, M., van Nieuwpoort, R., Pearson, M., Prabu, T., Roy, J., Sinnen, O., Spitler, L., Spreeuw, H., Stappers, B. W., van Straten, W., Williams, C., Wang, H., Wiesner, K., Dec. 2017. Pulsar Searches with the SKA. ArXiv e-prints.
  • Liesenborgs et al. (2006) Liesenborgs, J., De Rijcke, S., Dejonghe, H., Apr. 2006. A genetic algorithm for the non-parametric inversion of strong lensing systems. MNRAS 367, 1209–1216.
  • Lorimer et al. (2007) Lorimer, D. R., Bailes, M., McLaughlin, M. A., Narkevic, D. J., Crawford, F., Nov. 2007. A Bright Millisecond Radio Burst of Extragalactic Origin. Science 318, 777.
  • Lorimer and Kramer (2004) Lorimer, D. R., Kramer, M., Dec. 2004. Handbook of Pulsar Astronomy. Cambridge University Press.
  • Maan and van Leeuwen (2017) Maan, Y., van Leeuwen, J., Sep. 2017. Real-time searches for fast transients with Apertif and LOFAR. ArXiv e-prints.
  • Magro et al. (2011) Magro, A., Karastergiou, A., Salvini, S., Mort, B., Dulwich, F., Zarb Adami, K., Nov. 2011. Real-time, fast radio transient searches with GPU de-dispersion. MNRAS 417, 2642–2650.
  • McLaughlin et al. (2006) McLaughlin, M. A., Lyne, A. G., Lorimer, D. R., Kramer, M., Faulkner, A. J., Manchester, R. N., Cordes, J. M., Camilo, F., Possenti, A., Stairs, I. H., Hobbs, G., D’Amico, N., Burgay, M., O’Brien, J. T., Feb. 2006. Transient radio bursts from rotating neutron stars. Nature 439, 817–820.
  • Metcalfe et al. (2000) Metcalfe, T. S., Nather, R. E., Winget, D. E., Dec. 2000. Genetic-Algorithm-based Asteroseismological Analysis of the DBV White Dwarf GD 358. ApJ 545, 974–981.
  • Mokiem et al. (2005) Mokiem, M. R., de Koter, A., Puls, J., Herrero, A., Najarro, F., Villamariz, M. R., Oct. 2005. Spectral analysis of early-type stars using a genetic algorithm based fitting method. A&A 441, 711–733.
  • Oostrum et al. (2017) Oostrum, L. C., van Leeuwen, J., Attema, J., van Cappellen, W., Connor, L., Hut, B., Maan, Y., Oosterloo, T. A., Petroff, E., van der Schuur, D., Sclocco, A., Verheijen, M. A. W., Sep. 2017. Detection of a bright burst from FRB 121102 with Apertif at the Westerbork Synthesis Radio Telescope. The Astronomer’s Telegram 10693.
  • Patil and Pawar (2015) Patil, V., Pawar, D., 2015. The optimal crossover or mutation rates in genetic algorithm: a review. JET 5 (3), 38–41.
    URL http://www.cibtech.org/J-ENGINEERING-TECHNOLOGY/PUBLICATIONS/2015/VOL-5-NO-3/05-JET-006-PATIL-MUTATION.pdf
  • Petiteau et al. (2013) Petiteau, A., Babak, S., Sesana, A., de Araújo, M., Mar. 2013. Resolving multiple supermassive black hole binaries with pulsar timing arrays. II. Genetic algorithm implementation. Phys. Rev. D 87 (6), 064036.
  • Petroff et al. (2016) Petroff, E., Barr, E. D., Jameson, A., Keane, E. F., Bailes, M., Kramer, M., Morello, V., Tabbara, D., van Straten, W., Sep. 2016. FRBCAT: The Fast Radio Burst Catalogue. PASA 33, e045.
  • Sclocco (2017) Sclocco, A., 2017. Accelerating radio astronomy with auto-tuning.
  • Sclocco et al. (2015) Sclocco, A., Bal, H. E., van Nieuwpoort, R. V., Aug. 2015. Finding pulsars in real-time. In: 2015 IEEE 11th International Conference on e-Science. pp. 98–107.
  • Sclocco et al. (2016) Sclocco, A., van Leeuwen, J., Bal, H. E., van Nieuwpoort, R. V., Jan. 2016. Real-time dedispersion for fast radio transient surveys, using auto tuning on many-core accelerators. Astronomy and Computing 14, 1–7.
  • Sclocco et al. (2012) Sclocco, A., Varbanescu, A. L., Mol, J. D., van Nieuwpoort, R. V., May 2012. Radio astronomy beam forming on many-core architectures. In: 2012 IEEE 26th International Parallel and Distributed Processing Symposium. pp. 1105–1116.
  • Soni (2014) Soni, N., 2014. Study of various mutation operators in genetic algorithms. IJCSIT 5 (3), 4519–4521.
    URL https://www.semanticscholar.org/paper/Study-of-Various-Mutation-Operators-in-Genetic-Soni/31a4010f427f6c485c577504e08e895a34e054d2
  • Spitler et al. (2016) Spitler, L. G., Scholz, P., Hessels, J. W. T., Bogdanov, S., Brazier, A., Camilo, F., Chatterjee, S., Cordes, J. M., Crawford, F., Deneva, J., Ferdman, R. D., Freire, P. C. C., Kaspi, V. M., Lazarus, P., Lynch, R., Madsen, E. C., McLaughlin, M. A., Patel, C., Ransom, S. M., Seymour, A., Stairs, I. H., Stappers, B. W., van Leeuwen, J., Zhu, W. W., Mar. 2016. A repeating fast radio burst. Nature 531, 202–205.
  • The CHIME/FRB Collaboration et al. (2018) The CHIME/FRB Collaboration, Amiri, M., Bandura, K., Berger, P., Bhardwaj, M., Boyce, M. M., Boyle, P. J., Brar, C., Burhanpurkar, M., Chawla, P., Chowdhury, J., Cliche, J. F., Cranmer, M. D., Cubranic, D., Deng, M., Denman, N., Dobbs, M., Fandino, M., Fonseca, E., Gaensler, B. M., Giri, U., Gilbert, A. J., Good, D. C., Guliani, S., Halpern, M., Hinshaw, G., Hofer, C., Josephy, A., Kaspi, V. M., Landecker, T. L., Lang, D., Liao, H., Masui, K. W., Mena-Parra, J., Naidu, A., Newburgh, L. B., Ng, C., Patel, C., Pen, U.-L., Pinsonneault-Marotte, T., Pleunis, Z., Rafiei Ravandi, M., Ransom, S. M., Renard, A., Scholz, P., Sigurdson, K., Siegel, S. R., Smith, K. M., Stairs, I. H., Tendulkar, S. P., Vanderlinde, K., Wiebe, D. V., Mar. 2018. The CHIME Fast Radio Burst Project: System Overview. ArXiv e-prints.
  • Umbarkar and Sheth (2015) Umbarkar, A. J., Sheth, P. D., Oct. 2015. Crossover operators in genetic algorithms: a review. ICTACT 6 (1).
    URL http://ictactjournals.in/ArticleDetails.aspx?id=2109
  • van Leeuwen (2014) van Leeuwen, J., 2014. ARTS – the Apertif Radio Transient System. In: Wozniak, P. R., Graham, M. J., Mahabal, A. A., Seaman, R. (Eds.), The Third Hot-wiring the Transient Universe Workshop. pp. 79–79.
  • Williams (2008) Williams, S. W., 2008. Auto-tuning performance on multicore computers. Ph.D. thesis, University of California, Berkeley, CA, USA, AAI3353349.

Appendix A Evolutionary histograms for the best individuals in populations of different sizes

Figures A.1, A.2 and A.3 show histograms of the gene (user-controlled parameter) values of the best individuals during their evolution in populations of A.1) 20 individuals; A.2) 30 individuals; A.3) 50 individuals. All tests were performed under , and single-point crossover. In this case we do not distinguish between different de-dispersion steps or downsampling factors. The diversity in the explored parameter ranges and their quantities relates to the diversity of degenerate end solutions. Parameter histograms for larger populations are sparser: as GAs with larger populations need more time to initialize individuals, less time is spent on the evolution of the best individual and the exploration of its better genes.

Figure A.1: Genetic evolution with 20 individuals.
Figure A.2: Genetic evolution with 30 individuals.
Figure A.3: Genetic evolution with 50 individuals.