Multi-Resolution 3D Mapping with Explicit Free Space Representation for Fast and Accurate Mobile Robot Motion Planning
Abstract
With the aim of bridging the gap between high-quality reconstruction and mobile robot motion planning, we propose an efficient system that leverages the concept of adaptive-resolution volumetric mapping, which naturally integrates with the hierarchical decomposition of space in an octree data structure. Instead of a Truncated Signed Distance Function (TSDF), we adopt mapping of occupancy probabilities in log-odds representation, which allows representing not only surfaces, but the entire free, i.e. observed, space, as opposed to unobserved space. We introduce a method for choosing the resolution on the fly in real-time by means of a multi-scale max-min pooling of the input depth image. The notion of explicit free space mapping, paired with the spatial hierarchy in the data structure as well as map resolution, allows for collision queries, as needed for robot motion planning, at unprecedented speed. We quantitatively evaluate mapping accuracy, memory, runtime performance, and planning performance, showing improvements over the state of the art, particularly in cases requiring high-resolution maps.
Abbreviations:
ATE   Absolute Trajectory Error
ICP   Iterative Closest Point
SLAM  Simultaneous Localisation and Mapping
ESDF  Euclidean Signed Distance Field
TSDF  Truncated Signed Distance Field
RMSE  Root Mean Squared Error
CNN   Convolutional Neural Network
AR    Augmented Reality
VR    Virtual Reality
SFC   Safe Flight Corridor
I Introduction
The past years have brought impressive advancements in the field of dense environment mapping, fueled by the advent of RGB-D cameras and ever more powerful processors, including GPUs. Applications of (near) real-time 3D dense mapping systems are vast, ranging from digital twins and Augmented/Virtual Reality to mobile robotics. Recent technological advances are inciting the use of mobile robots for many exploration and monitoring tasks. Their growing versatility and autonomy offer safe, cost-effective solutions in a wide range of applications, including aerial surveillance, infrastructure inspection, and search and rescue [1]. However, to fully exploit their potential, a key task is enabling lightweight platforms to operate autonomously in unknown, unstructured environments with limited onboard computational resources. Real-world deployment thus requires map representations that can both accommodate high-fidelity reconstructions of the environment and support fast online planning.
There are several efficient mapping methods for motion planning that rely on 3D volumetric representations. A common strategy is to use probabilistic occupancy maps with octree structures for quick access [3, 4]. More recently, signed-distance-based frameworks [2, 5, 6] have become popular as they allow fast map query operations in online settings. The main drawback of previous approaches is that their computational performance degrades drastically with the discretisation of the environment, since they operate on map data in the same way regardless of the occupancy status and geometry of the underlying space. As a result, these methods are only suitable for scenarios requiring coarse reconstructions or for navigating areas with relatively large obstacles. In contrast, our work addresses the challenge of trading off mapping accuracy against computational efficiency for planning in online, onboard robotic applications that require both.
In this paper, we introduce a volumetric adaptive-resolution dense mapping framework that supports multi-resolution queries and data integration using occupancy mapping [7, 8, 3]. Contrary to methods based on signed distance functions, we continuously maintain a high-resolution 3D octree representation of observed occupied and free space in real-time.
Our key insight is to recognise the lack of a concise method for defining the required resolution in the context of occupancy mapping, with few approaches tackling this problem by extending single-resolution approaches with ad-hoc heuristics [4]. As a result, we designed a novel integration algorithm that selects the resolution by constraining the induced sampling error in the observed occupancy, splitting an octree in a coarse-to-fine fashion until a desired accuracy is reached. Central to this idea is the introduction of a multi-scale max-min pooling of the input depth image, which enables real-time operation by providing a conservative indication of the measured depth variation in any given volume with only a few queries.
To further enhance performance, we retained the data structure of [6], which uses mipmapped voxel blocks, but carefully designed a new scale selection and data propagation scheme between levels. This allowed us to increase computational and memory efficiency without introducing reconstruction artefacts.
While our method is focused on RGB-D mapping, it is flexible with respect to different sensor modalities. This is shown by evaluating our system on a wide range of datasets, from synthetic to large-scale LIDAR, showing significant improvements in reconstruction accuracy, runtime performance, and planning performance against state-of-the-art approaches.
In summary, the main contributions of this paper are:

- A dense volumetric multi-resolution online mapping system that consistently represents free and occupied space down to fine resolutions where needed; we will release our reference implementation open-source upon acceptance of this work.
- A novel, fast octree-to-image allocation and integration method that seamlessly adapts the map to different scene scales and sensor modalities.
- A comprehensive evaluation with respect to the state of the art, revealing vast improvements in the trade-offs between mapping accuracy, speed, memory consumption, tracking accuracy, and planning performance.
II Related Work
A large body of literature addresses the problem of obtaining a dense 3D representation of the world. In this section, we review recent work, focusing on volumetric reconstruction methods suitable for real-time robotic applications running on constrained hardware. This leaves aside most batch integration methods, where a map is only available for planning at the end of the mapping run; note that most newer deep-learning-based methods fall into this category.
A major milestone in online RGB-D 3D reconstruction systems is the KinectFusion algorithm of [9], which enables dense volumetric modelling in real-time at sub-centimetre resolutions. However, it is limited to small, bounded environments, as the mapping is locked to a fixed volume with a predefined voxel resolution, and requires GPGPU processing to achieve real-time performance. To improve scalability, several extensions to the original algorithm have been proposed. One possibility is to use moving fixed-size sliding volumes [10, 11] to achieve mapping in a dynamically growing space. Another strategy is to exploit memory-efficient data structures, such as octree-based voxel grids [4, 12, 13] or hash tables [14, 15], for quicker spatial indexing. More recently, deep learning methods tackling this problem in an incremental fashion have also been introduced [16]. Despite this impressive progress, most research in 3D mapping has targeted the application of surface reconstruction, in which the objective is to produce a high-quality mesh or point cloud of a given scene. From a navigation perspective, the concept of observed free space is equally or more important than surface accuracy. Unfortunately, most of these methods model free space only in the vicinity of surface boundaries, making them incapable of generally distinguishing between free and unvisited areas in initially unknown environments. Our work builds on the explicit distinction between the two, as necessary for robotic planning, exploration, and collision avoidance.
In the context of navigation, Euclidean Signed Distance Field (ESDF)-based mapping methods are commonly used for motion planning tasks as they provide the distance information needed by trajectory optimisation strategies. Recently, significant work has been done on incrementally building ESDF maps for planning in 3D using aerial robots, including the voxblox [2] and FIESTA [5] frameworks, which construct ESDF maps from Truncated Signed Distance Field (TSDF) maps and occupancy maps, respectively. However, as these methods are designed for fast onboard collision checking, they rely on coarsely discretised environments with voxel grid resolutions on the order of centimetres. In contrast, our approach is also motivated by applications like close-up inspection [17], where more detailed scene reconstructions are required.
An alternative representation for planning is the occupancy map [7, 8]. In 3D, OctoMap [3] is a popular framework that uses hierarchical octrees to track occupancy probabilities as sensor data is received. Similar to the original work of [7], it uses an inverse sensor model formulation that efficiently approximates the posterior using an additive log-odds update equation, which resembles the TSDF update procedure [9] in its nature. However, while OctoMap works decently with LIDAR at low resolutions, its performance degrades significantly as map resolution increases, as well as in the presence of noisier sensors. Interestingly, a very recent contribution shows improvements over OctoMap in terms of memory and runtime by adaptively downsampling the point cloud and integrating free space at lower resolutions [18], highlighting the continued importance of this topic. This method uses a set of distance-based rules to decide the integration resolution (similar to [4]), leading in the fastest setups to non-conservative assignments of free space. Our work instead tackles this problem by rigorously assessing the probabilistic (inverse) measurement model, introducing a methodology for choosing a sampling resolution that ensures such errors do not happen, while also improving runtime performance.
Closest to our work, and with the aim of providing a unified framework for reconstruction and planning, [4] proposed an efficient pipeline with an octree-based implementation. Their approach is suitable for either TSDF-based or occupancy mapping based on the spline inverse sensor model of [19]. However, their use of multiple resolutions is very basic, and multiple assumptions are made which limit the applicability to planning scenarios. Subsequent work [6] extended this system to handle data integration with varying levels of detail and rendering at multiple resolution scales using TSDF maps, but limited to surface reconstruction. Our method draws some inspiration from this approach in terms of data structure and propagation; however, our focus is on volumetric occupancy-based representations for planning and space understanding, where the main goal is a probabilistic classification of all observed space into occupied, free, and unknown at high resolutions, while at the same time providing a high-quality (surface) reconstruction of the environment.
Iii MultiResolution Occupancy Mapping
Our library takes depth information and poses as input, incrementally computing a map which can be queried at any time for path planning, meshing, and rendering (see Figure 2). We also provide an Iterative Closest Point (ICP) module that can be switched on to obtain a full Simultaneous Localisation and Mapping (SLAM) system in the spirit of [9], as shown in Section IV-D.
Our data structure is an extension of [4] and is composed of an octree whose last levels consist of densely allocated voxel blocks. Similarly to [6], each voxel block stores a pyramid representation of itself, enabling fast occupancy updates at different resolutions. However, unlike [6], we only allocate data in each block down to a single, dynamically changing integration scale. More importantly, we augment this method by also allowing data to be stored at node level when voxel-block resolution is not needed. This two-tier memory structure combines flexibility in choosing resolution at the higher (octree) levels with efficient access at the lower (voxel-block) level. A visualisation of this structure's allocation can be seen in Figure 1 for a practical use case.
By design, relevant occupancy information for a given point is maintained only at the lowest allocated voxel which contains the point. Similar to [3], upper non-leaf nodes store a max pooling of the children's occupancy, enabling fast conservative queries of any given size. In addition to the max occupancy, we keep a Boolean state accounting for the presence of unknown data in the children, which allows us to disambiguate whether a node is partially or fully unobserved. In our system this pooling, or data up-propagation, is computed in each integration step, making it available for online planning at any given time.
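The occupancy pooling described above can be sketched as follows; the `Node` layout with an eight-slot child array is a hypothetical minimal stand-in for the actual data structure:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    # Hypothetical node layout: pooled occupancy in log-odds, plus a
    # flag marking whether any descendant is still unobserved.
    max_occupancy: float = 0.0
    has_unknown: bool = True
    children: List[Optional["Node"]] = field(default_factory=lambda: [None] * 8)

def up_propagate(node: Node) -> None:
    """Pool the children into the parent: the max log-odds occupancy
    enables conservative queries of any size, and the 'unknown' flag
    distinguishes a partially from a fully observed node."""
    observed = [c for c in node.children if c is not None]
    if not observed:
        return  # leaf: keep its own integrated values
    for c in observed:
        up_propagate(c)
    node.max_occupancy = max(c.max_occupancy for c in observed)
    # Unallocated children count as unobserved space.
    node.has_unknown = (len(observed) < 8) or any(c.has_unknown for c in observed)
```

A planner can then clear a whole subtree as free in one query whenever `max_occupancy` is below its occupancy threshold and `has_unknown` is false.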
III-A Notation
We denote dimensional vectors with lowercase, bold letters, e.g. $\mathbf{r}$. Also, we denote the coordinate frame in which vectors are expressed with left subscripts, e.g. ${}_{C}\mathbf{r}$. We employ a World frame $W$ and the Camera frame $C$. Euclidean transformations from coordinate frame $B$ to $A$ are denoted as $\mathbf{T}_{AB}$. We denote matrices with uppercase bold letters, e.g. $\mathbf{M}$.
III-B Occupancy Map Fusion
Given a depth image $D$ and camera pose $\mathbf{T}_{WC}$ at time step $t$, the probability that a 3D point ${}_W\mathbf{r}$ in space is occupied is assumed to be a function of the distance to the camera and the corresponding depth measurement along the ray from the camera centre to ${}_W\mathbf{r}$. Given the depth measurement from the projection of the point into the camera:

$d_b = D\!\left(\pi\!\left(\mathbf{T}_{CW}\, {}_W\mathbf{r}\right)\right)$ (1)

from one depth image we would like to represent the occupancy probabilities in 3D space:

$P_o\!\left({}_W\mathbf{r} \mid d_b\right) = f\!\left(d, d_b\right)$ (2)

which corresponds to the inverse sensor model, a function of the depth $d$ along the ray. For brevity, it is referred to as $P_o$ in the following descriptions. An alternative way of representing the occupancy probability is using log-odds, which allows Bayesian updates to be additive:

$l = \log\!\left(\dfrac{P_o}{1 - P_o}\right)$ (3)

$L_t = L_{t-1} + l_t$ (4)

In contrast to previous work, we do not accumulate the log-odds in a single sum, but instead use a weighted mean $\bar{l}_t$, with weight $w_t$, similar to how it is done in [9]. This allows reducing artefacts during the interpolation and propagation stage caused by neighbouring voxels with significant differences in the number of observations. Moreover, by clamping the weight below a threshold $w_{\max}$, the influence of outliers and dynamic objects is mitigated. Thus the mean log-odds is updated as:

$\bar{l}_t = \dfrac{w_{t-1}\,\bar{l}_{t-1} + l_t}{w_{t-1} + 1}$ (5)

$w_t = \min\!\left(w_{t-1} + 1,\; w_{\max}\right)$ (6)

while the accumulated log-odds can be recovered as:

$L_t = w_t\,\bar{l}_t$ (7)
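The weighted-mean fusion of (5)-(7) can be illustrated with a small sketch; the function and argument names are hypothetical, not the library's API:

```python
def fuse_logodds(mean_prev, weight_prev, l_new, w_max):
    """One measurement update of the weighted-mean log-odds state:
    the per-voxel value is a running mean of per-measurement log-odds,
    with the weight clamped at w_max so that old observations do not
    dominate and outliers/dynamic objects stay bounded in influence."""
    mean = (weight_prev * mean_prev + l_new) / (weight_prev + 1.0)
    weight = min(weight_prev + 1.0, w_max)
    return mean, weight

# The accumulated log-odds of (7) is then weight * mean (exact until
# the weight saturates at w_max, a running average afterwards).
```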
III-C Inverse Sensor Model
For fast computations, our inverse sensor model is a piecewise linear function based on [19], illustrated in Fig. 3(a); it operates directly in log-odds space. Similarly to [19], we use a model where the measured surface position corresponds to a log-odds value of zero.
We model depth uncertainty as a function of measured distance (i.e. along the corresponding ray), assuming it to grow linearly or quadratically depending on the sensor type (see Fig. 3(b)): quadratically for RGB-D and linearly for LIDAR or synthetic (perfect) depth cameras. We further relax the quadratic relation between surface thickness and distance assumed in [19] to a linear model bounded by a minimum and maximum surface thickness, to avoid over-growing distant objects.
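The piecewise-linear shape of such a model can be sketched in code; the breakpoints, parameter names (`sigma`, `tau`) and saturation values below are illustrative assumptions, not the paper's exact model:

```python
def inverse_sensor_model(r, z, sigma, tau, l_min=-5.0, l_max=2.0):
    """Hypothetical piecewise-linear log-odds along a ray.
    r: distance along the ray, z: measured depth, sigma: depth
    uncertainty band, tau: assumed surface thickness. Space well in
    front of the surface gets l_min (free), the value ramps linearly
    and crosses zero exactly at the measured surface, saturates at
    l_max within the surface-thickness band behind it, and carries
    no information (None) beyond that band."""
    if r < z - sigma:
        return l_min                               # confidently free
    if r < z:
        return l_min * (z - r) / sigma             # ramp up to 0 at the surface
    if r < z + tau:
        return min(l_max, l_max * (r - z) / tau)   # occupied band behind surface
    return None                                    # unobserved past the surface
```

Note the continuity at the breakpoints: the ramp equals `l_min` at `z - sigma` and `0` at `z`, matching the zero-crossing convention at the measured surface.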
III-D Adaptive-Resolution Volume Allocation
The data structure is interpreted as a non-uniform partition of a continuous occupancy 3D scalar field [4]. Consequently, we do not assign any probabilistic meaning to the size of the voxel storing a particular value and instead aim at choosing a partition of the tree which can represent the underlying continuous function up to a certain accuracy.
Choosing this partition on the fly in a volumetric space is not a trivial task. Methods like OctoMap [3] use a raycasting scheme to densely allocate the observed space, simplifying the resolution in a later stage called tree pruning. Newer methods like OFusion [4] and UFOMap [18] mitigate the need for dense allocation by using a set of heuristics to choose the needed resolution during the allocation step. The main problem with these approaches is illustrated in Figure 4. Because the resolution is chosen as part of the raycasting, and mainly as a function of distance, these heuristics tend to ignore changes in occupancy caused by depth changes between pixels, which often require high-resolution allocation to be captured accurately, leading to parts of space being erroneously labelled as free. This is particularly important at occlusions caused by thin objects and at frustum boundaries.
To solve this problem we take a radically different approach, discarding the raycasting and using a so-called 'map-to-camera' allocation and updating process, thereby following an 'as coarse as possible, as fine as required' mapping scheme. Given a new depth image, we start from the root of the octree, analysing each node in order to decide whether the variation of log-odds inside the voxel volume is bounded below a given threshold $\epsilon$:

$\left|\, l(\mathbf{r}) - l(\mathbf{r}_c) \,\right| \leq \epsilon$ (8)

for every point $\mathbf{r}$ in the voxel, where $\mathbf{r}_c$ denotes the voxel centre. This criterion has been used for 3D data compression on an octree [20].
If it is met, we update the node at the given scale. Otherwise, the node is split into its eight children, and the process is recursively repeated until (8) is fulfilled or the voxel-block level is reached.
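The recursive splitting can be sketched as follows; this is a minimal stand-in in which `variation(cube)` plays the role of the pooling-based occupancy-variation bound of (8), and cubes are given as (centre, half-size) tuples:

```python
def split(cube):
    """Split an axis-aligned cube (cx, cy, cz, half_size) into its
    eight octants, each with half the half-size."""
    cx, cy, cz, h = cube
    q = h / 2.0
    return [(cx + dx * q, cy + dy * q, cz + dz * q, q)
            for dx in (-1, 1) for dy in (-1, 1) for dz in (-1, 1)]

def allocate(cube, variation, eps, max_depth, depth=0):
    """'As coarse as possible, as fine as required': keep the cube at
    the current scale if the occupancy variation inside it is below
    eps, otherwise split into eight children and recurse until the
    bound holds or the (voxel-block) depth limit is reached."""
    if variation(cube) <= eps or depth == max_depth:
        return [cube]  # integrate at this scale
    return [leaf for child in split(cube)
            for leaf in allocate(child, variation, eps, max_depth, depth + 1)]
```

Running this with a variation function that is large only near a surface yields fine cubes around the surface and coarse cubes in free space, while still partitioning the full volume.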
In order to make this algorithm feasible in a real-time setting, an efficient method to evaluate (8) for a voxel of any given position and size has to be provided. We observe that, due to the particular structure of the inverse sensor model, the occupancy span inside a given voxel can be connected to the span of depth values inside the voxel's projection into the depth image, turning the problem from a 3D query into a 2D one. This is illustrated in Fig. 5 for three different cases.
More importantly, to quickly query the depth variation in a particular image region, we precompute a max-min pooling image vector in the preprocessing stage. This pooling consists of a set of scales which aggregate the information of the original depth image over squares of different sizes centred around each pixel. The square areas for a single pixel at different scales are illustrated in Fig. 5(c). Each pooling image pixel summarises the minimum and maximum depth measurements within its square, as well as a validity state and an image-crossing state to handle image boundaries and invalid data. Once this structure is computed, a conservative evaluation of (8) can be made by simply computing a bounding box of the voxel in the image and querying the max-min pooling at the level with the maximum square size that is still fully contained in the bounding box, reducing the number of queries to no more than four in the majority of cases.
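A naive construction of such a pooling stack might look as follows; this is a readability-first sketch (a real-time version would build each scale incrementally from the previous one and also track the validity and image-crossing states, omitted here):

```python
def minmax_pool(depth, num_scales):
    """Hypothetical max-min pooling stack for a depth image given as a
    list of rows. Entry s stores, for every pixel, the (min, max)
    depth inside the square window of half-width 2**s centred on that
    pixel and clipped to the image bounds."""
    h, w = len(depth), len(depth[0])
    pools = []
    for s in range(num_scales):
        r = 2 ** s
        lo = [[0.0] * w for _ in range(h)]
        hi = [[0.0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                # gather the clipped square window around (y, x)
                vals = [depth[yy][xx]
                        for yy in range(max(0, y - r), min(h, y + r + 1))
                        for xx in range(max(0, x - r), min(w, x + r + 1))]
                lo[y][x], hi[y][x] = min(vals), max(vals)
        pools.append((lo, hi))
    return pools
```

With this stack, a conservative depth span for any image region is obtained from a handful of lookups at the coarsest scale fitting the region, instead of scanning every pixel.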
III-E Multi-Resolution Probabilistic Occupancy Fusion
While data at node level may be scattered at different resolutions, each voxel block is evaluated at a common resolution scale, further improving performance, reducing aliasing effects, and simplifying the interpolation required by operations like raycasting and meshing. Similar to [6], measurements are integrated at a mipmapped scale in the voxel block based on the current distance of the block's centre from the camera. Additionally, blocks that are known to only contain frontiers between free and unknown space are integrated no finer than a fixed coarse scale.
As the distance changes, the desired integration scale may change as well. In contrast to [6], we apply a scale-change hysteresis, requiring the camera to move a certain amount closer to or further away from the voxel block before changing the scale again, which avoids constant scale changes for blocks located at a scale boundary.
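The hysteresis can be sketched as a simple rule; the distance-band scale selection and all parameter names are assumptions for illustration:

```python
def desired_scale(distance, band, hysteresis, current):
    """Scale selection with hysteresis: the nominal scale follows the
    block-to-camera distance in bands of width `band`, but the block
    only switches scale once the camera has moved `hysteresis` past
    the band boundary, so blocks sitting exactly on a boundary do not
    flip back and forth between scales."""
    nominal = int(distance // band)
    if nominal > current and distance > (current + 1) * band + hysteresis:
        return current + 1  # moved clearly far enough: coarsen
    if nominal < current and distance < current * band - hysteresis:
        return current - 1  # moved clearly close enough: refine
    return current          # inside the dead band: keep the scale
```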
We adapt the propagation strategies previously applied in [6] to our occupancy map representation, to keep the hierarchy consistent between scale changes. As in [6], when up-propagating information to a coarser mipmapped scale, we replace the parent with the mean of all observed children. However, when down-propagating to a finer integration scale, we allocate the data first and assign the parent's value to all its children. In either case, we do not change scale immediately. Instead, we wait for the block to be fully projected into the image plane multiple times and observed at the desired scale. During this time we update the data at both the old and the new scale. Once the scale-changing condition is fulfilled, we switch to the new integration scale and delete the previous buffer. This logic reduces artefacts by smoothing values during initialisation and prevents blocks that are occluded, or just about to exit the camera frustum, from changing scale.
Once all depth information for an image is integrated into the map, the mean values of all updated blocks are up-propagated through the mipmapped scales of each block to enable trilinear interpolation between neighbours of different scale [6]. Additionally, the maximum occupancy of all updated blocks and nodes is propagated through all levels up to the root of the tree to support fast hierarchical free-space queries. Finally, pruning is applied to merge node children whose weighted log-odds occupancies are close enough according to a lower threshold. This extra step simplifies the tree when occupancies converge to similar values.
III-F Multi-Resolution Raycasting and Meshing Modules
Fast raycasting is required to render the surface in real-time and/or to enable tracking as part of a full dense SLAM system. With the up-propagated maximum occupancy and observed state stored in our data structure, we implement a multi-resolution raycasting strategy to quickly move a ray through large volumes of free space [20].
A meshing module is also provided which computes an adaptive-resolution dual mesh following [21]. This approach has been adapted to support our two-tier data structure.
IV Evaluation
This section reports our experimental results. All tests were performed on a laptop with an Intel Core i7-8750H CPU operating at 2.20 GHz and 8 GB of memory, running Ubuntu 18.04. We used GCC 7.4.0 with OpenMP acceleration for software compilation. For computing the TSDF with voxblox [2], the fast method was used throughout the comparisons; both OctoMap [3] and voxblox were run in ROS. The parameter values used in the experiments are shown in Table I unless otherwise specified.
Table I: Parameter values (in m where applicable) used for the Cow and Lady, FR3 Long Office, and ICL LR2 dataset configurations.
Three widely known datasets were used to quantitatively evaluate our system, namely TUM RGB-D [22], Cow and Lady [2] and ICL-NUIM [23].
IV-A Reconstruction Accuracy
To evaluate how our method performs in terms of surface reconstruction, we use the ICL-NUIM Living Room 2 dataset [23]. Table II shows the reconstruction Root Mean Squared Error (RMSE) relative to the ground truth mesh as computed by the SurfReg tool provided with the dataset, using given poses and no extra ICP alignment. Our method outperforms previous approaches at both evaluated resolutions.
Additionally, we investigate the potential degradation induced by downsampling the input image. The effect on the reconstruction metric is minor in all cases, motivating the use of this step to improve both running time and memory consumption.
To show how our method adaptively selects the sampling resolution, multiple reconstructions are presented in Figure 6, with colour encoding the integration scale.
Table II: Reconstruction RMSE (m) on ICL-NUIM LR2

Pipeline     Image scale   1 cm     2 cm
Ours         full          0.0146   0.0193
Ours         downsampled   0.0250   0.0341
SE OFusion   full          0.0166   0.0259
SE OFusion   downsampled   0.0478   0.0465
SE TSDF      full          0.0142   0.0200
SE TSDF      downsampled   0.0429   0.0367
IV-B Runtime Performance
Running times for our system and competing methods can be seen in Fig. 8. Our method achieves the highest relative performance at 1 cm resolution for both image configurations, beating all other methods on datasets using real sensors. While at 2 cm most methods fall in the same range, as the resolution coarsens to 8 cm the relative cost of computing the pooling image becomes significant, making supereight OFusion significantly faster. This is expected, as the main goal of the pooling is to boost performance in the high-resolution cases the algorithm targets. It should also be noted that the speed achieved by OFusion is in many cases driven by non-conservative assumptions that render the output unsuitable for navigation, as shown in Section IV-E.
Other approaches like voxblox present reasonable results, but have the drawback of requiring an extra step of computing an ESDF before planning is possible, which becomes prohibitively expensive at high resolutions. This limits their usability in cases requiring a high-resolution navigable map in real-time. Finally, OctoMap presents significantly higher integration times than newer methods, since it is targeted at low-resolution maps built from LIDAR.
IV-C Multi-Resolution, Memory and Efficiency
To assess the memory efficiency of our multi-resolution approach, we compare our memory usage to supereight OFusion [4] and supereight Multires-TSDF [6]. In particular, OFusion uses an octree, similarly to our system, for storing both free and occupied space, while Multires-TSDF also utilises an octree but sparsely allocates a narrow-band TSDF. As a conservative measurement of memory usage, we use the main process' Resident Set Size (RSS) as reported to the Linux kernel to compute RAM usage.
Results can be seen in Table III. For the intended use case of high-resolution reconstruction and planning, our system outperforms competing methods in real-life scenarios, mainly through gains made by the adaptive sampling resolution at voxel-block level. As the resolution is lowered, or in small datasets like the ICL, where the relative size of a voxel block grows in a way that brings the allocation close to dense, the cost of storing this extra multi-resolution information, together with our conservative allocation of frustum boundaries, takes a toll on memory usage compared with simpler methods like OFusion. This extra cost is justified by superior planning performance.
Table III: Memory usage (MB)

Pipeline          Cow & Lady       FR3 Long Office   ICL LR2
                  1 cm     8 cm    1 cm     8 cm     1 cm    8 cm
Ours              1100     65      493      62       397     105
SE OFusion        1734     57      1009     56       192     54
SE MultiresTSDF   2922     65      1728     68       406     56
IV-D Tracking Performance
In this section we evaluate our mapping system integrated into a dense SLAM pipeline similar to [9], which is included as an additional component of the presented library.
Table IV shows trajectory accuracy results using the Absolute Trajectory Error (ATE) metric against the TSDF and OFusion [4] methods on the FR1 Desk and FR3 Long Office sequences of the TUM RGB-D dataset. The same ICP tracking approach presented in [9] is used for all pipelines. For point cloud extraction, our system relies on the efficient multi-resolution raycasting method described in Section III-F. Results indicate that maps produced by our approach are suitable for accurate ICP tracking, with the ATE being similar to that of previous methods.
Table IV: ATE (m)

              FR1 Desk          FR3 Long Office
Pipeline      1 cm     2 cm     1 cm     2 cm
Ours          0.093    0.098    0.209    0.239
SE TSDF       0.099    0.103    0.314    FAIL
SE OFusion    0.100    0.086    0.165    0.172
IV-E Planning
Finally, we evaluate our mapping system as an input for path planning. We show that kinodynamically feasible and collision-free quadrotor trajectories can be planned in real-time with map resolutions up to 1 cm. To guarantee feasibility, we compute a Safe Flight Corridor (SFC) from the start to the end position and optimise Bernstein-polynomial motion primitives within each segment. As a corridor primitive we use cylinders connected by spheres with a minimum radius. We use the Open Motion Planning Library's (OMPL [24]) informed rapidly-exploring random tree* (Informed RRT* [25]) planner to create the SFC connecting the start and end positions. OMPL confirms the safety of each corridor segment by verifying that none of the cylinder and sphere volumes is occupied. This is challenging using a regular volumetric grid, where the number of checks grows cubically with the map resolution. With our multi-resolution maximum occupancy queries, we utilise a 'coarse-to-fine' collision checking approach, recursively increasing the resolution only in parts of the corridor where needed, thus requiring significantly fewer checks in most cases.
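The coarse-to-fine check over the up-propagated maxima can be sketched as follows, using a hypothetical node layout with an axis-aligned bounding box, the pooled maximum occupancy, and an 'unknown' flag (the query region is approximated by its bounding box here, rather than exact cylinder/sphere volumes):

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# ((min_x, min_y, min_z), (max_x, max_y, max_z))
AABB = Tuple[Tuple[float, float, float], Tuple[float, float, float]]

def overlaps(a: AABB, b: AABB) -> bool:
    return all(a[0][i] < b[1][i] and b[0][i] < a[1][i] for i in range(3))

@dataclass
class Node:
    aabb: AABB
    max_occupancy: float          # up-propagated max log-odds of the subtree
    has_unknown: bool             # any unobserved descendant
    children: List["Node"] = field(default_factory=list)

def region_free(node: Node, query: AABB, occ_threshold: float = 0.0) -> bool:
    """Coarse-to-fine collision check: a node whose pooled maximum
    occupancy is below the threshold and which contains no unknown
    space clears its whole subtree in one test; only nodes that
    overlap the query and cannot be cleared at the current resolution
    are descended into."""
    if not overlaps(node.aabb, query):
        return True   # irrelevant to the query region
    if node.max_occupancy < occ_threshold and not node.has_unknown:
        return True   # entire subtree known free
    if not node.children:
        return False  # finest level reached and still not clearly free
    return all(region_free(c, query, occ_threshold) for c in node.children)
```

Treating unknown space as not free keeps the check conservative, matching the requirement that corridor volumes lie entirely in observed free space.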
We investigate the use of our method for several path planning problems, illustrated in Figs. 9 and 10, and compare against the SE OFusion library, which also achieves excellent running times and provides some multi-resolution output. While our method is able to find suitable trajectories in every iteration, notably in cluttered datasets like the Cow and Lady or large-scale ones like the Oxford Newer College, OFusion fails in several cases. This happens because OFusion lacks a proper mechanism for leveraging information from different scales, instead relying on a simple rule that gives precedence to data in high-resolution voxel blocks over values stored at node level. With the latter normally storing free-space observations and the former accumulating surface measurements, the consequence is an algorithm that is fast but highly sensitive to noisy measurements and outliers. In our system we solve this issue not only by propagating data down from nodes to voxel blocks, but also by carefully considering the variation of occupancy inside the node's whole volume (not only the centre value), as described in Section III-D, to avoid the opposite effect of biasing towards free space.
As an exercise to further confirm these aspects and highlight the benefits of our library, we modified the OFusion library to perform dense allocation, i.e. everything is integrated at the single finest voxel-block level. The results can be seen in Table V, which shows the minimum solving time required for the planner to find an SFC. While the dense allocation approach is able to remove OFusion's noise and find suitable trajectories, the required planning time is significantly worse. This is easily explained by the lack of multi-resolution sampling capabilities, a key feature of our system.
Table V: Minimum planner solving time (s)

Pipeline      Traj 1   Traj 2             Traj 3   Traj 4
Ours          0.01     0.1                0.06     0.4
SE OFusion    4        FAIL (too noisy)   3        FAIL (too noisy)
SE Dense      5        15                 3        6
V Conclusion
We have introduced a multi-resolution 3D mapping framework that uses an underlying two-tier octree data structure to encode log-odds occupancy probabilities. Thanks to hierarchical explicit free space encoding, the approach supports the fast collision checking crucial to robotic path planning and collision avoidance, while simultaneously providing high-resolution reconstructions.
Our framework was evaluated extensively on synthetic and real-world RGB-D datasets: we showed that surface accuracy is competitive with state-of-the-art TSDF-based frameworks, while timings enable real-time or near-real-time operation even at resolutions as fine as one centimetre. We finally showed how our maps can be used for real-time 3D trajectory planning in different scenarios, including large-scale LIDAR ones, dramatically improving planning time without the extra ESDF-conversion step that many state-of-the-art approaches rely on, thus providing an unprecedented seamless integration between mapping and planning.
References
 F. Nex and F. Remondino, “UAV for 3D mapping applications: A review,” Applied Geomatics, vol. 6, no. 1, pp. 1–15, 2014.
 H. Oleynikova, Z. Taylor, M. Fehr, R. Siegwart, and J. Nieto, “Voxblox: Incremental 3D Euclidean Signed Distance Fields for onboard MAV planning,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2017, pp. 1366–1373.
 A. Hornung, K. M. Wurm, M. Bennewitz, C. Stachniss, and W. Burgard, “OctoMap: an efficient probabilistic 3D mapping framework based on octrees,” Autonomous Robots, vol. 34, no. 3, pp. 189–206, 2013.
 E. Vespa, N. Nikolov, M. Grimm, L. Nardi, P. H. J. Kelly, and S. Leutenegger, “Efficient OctreeBased Volumetric SLAM Supporting SignedDistance and Occupancy Mapping,” IEEE Robotics and Automation Letters, vol. 3, no. 2, pp. 1144–1151, 2018.
 L. Han, F. Gao, B. Zhou, and S. Shen, “FIESTA: Fast Incremental Euclidean Distance Fields for Online Motion Planning of Aerial Robots,” arXiv:1903.02144, 2019.
 E. Vespa, N. Funk, P. H. J. Kelly, and S. Leutenegger, “AdaptiveResolution OctreeBased Volumetric SLAM,” in International Conference on 3D Vision, 2019, pp. 654–662.
 A. Elfes, “Occupancy grids: A probabilistic framework for robot perception and navigation,” Ph.D. dissertation, Carnegie Mellon University, 1989.
 S. Thrun, “Learning Occupancy Grid Maps with Forward Sensor Models,” Autonomous Robots, vol. 15, no. 2, pp. 111–127, 2003.
 R. A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A. J. Davison, P. Kohi, J. Shotton, S. Hodges, and A. Fitzgibbon, “KinectFusion: Realtime dense surface mapping and tracking,” in IEEE International Symposium on Mixed and Augmented Reality, 2011, pp. 127–136.
 T. Whelan, J. McDonald, M. Kaess, M. Fallon, H. Johannsson, and J. J. Leonard, “Kintinuous: Spatially Extended KinectFusion,” in RSS Workshop on RGBD: Advanced Reasoning with Depth Cameras, 2012.
 V. Usenko, L. von Stumberg, A. Pangercic, and D. Cremers, “Realtime trajectory replanning for MAVs using uniform Bsplines and a 3D circular buffer,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2017, pp. 215–222.
 M. Zeng, F. Zhao, J. Zheng, and X. Liu, “Octreebased fusion for realtime 3D reconstruction,” Graphical Models, vol. 75, no. 3, pp. 126 – 136, 2013.
 F. Steinbrücker, J. Sturm, and D. Cremers, “Volumetric 3D mapping in realtime on a CPU,” in IEEE International Conference on Robotics and Automation, 2014, pp. 2021–2028.
 M. Nießner, M. Zollhöfer, S. Izadi, and M. Stamminger, “RealTime 3D Reconstruction at Scale Using Voxel Hashing,” ACM Transactions on Graphics, vol. 32, no. 6, 2013.
 M. Klingensmith, I. Dryanovski, S. Srinivasa, and J. Xiao, “Chisel: Real time large scale 3D reconstruction onboard a mobile device using spatially hashed signed distance fields,” in Robotics: Science and Systems, vol. 4, 2015, p. 1.
 S. Weder, J. Schönberger, M. Pollefeys, and M. R. Oswald, “RoutedFusion: Learning realtime depth map fusion,” arXiv:2001.04388, 2020.
 S. Stent, R. Gherardi, B. Stenger, and R. Cipolla, “Detecting change for multiview, longterm surface inspection.” in BMVC, 2015, pp. 127–1.
 D. Duberg and P. Jensfelt, “UFOMap: An efficient probabilistic 3D mapping framework that embraces the unknown,” IEEE Robotics and Automation Letters, vol. 5, no. 4, pp. 6411–6418, 2020.
 C. Loop, Q. Cai, S. OrtsEscolano, and P. A. Chou, “A closedform bayesian fusion equation using occupancy probabilities,” in 2016 Fourth International Conference on 3D Vision (3DV). IEEE, 2016, pp. 380–388.
 A. Knoll, I. Wald, S. Parker, and C. Hansen, “Interactive isosurface ray tracing of large octree volumes,” in 2006 IEEE Symposium on Interactive Ray Tracing. IEEE, 2006, pp. 115–124.
 I. Wald, “A simple, general, and GPU friendly method for computing dual mesh and isosurfaces of adaptive mesh refinement (AMR) data,” arXiv:2004.08475, 2020.
 J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers, “A benchmark for the evaluation of RGBD SLAM systems,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2012, pp. 573–580.
 A. Handa, T. Whelan, J. McDonald, and A. J. Davison, “A benchmark for RGBD visual odometry, 3D reconstruction and SLAM,” in IEEE International Conference on Robotics and Automation, 2014, pp. 1524–1531.
 I. A. Şucan, M. Moll, and L. E. Kavraki, “The Open Motion Planning Library,” IEEE Robotics & Automation Magazine, vol. 19, no. 4, pp. 72–82, December 2012, http://ompl.kavrakilab.org.
 J. Gammell, S. Srinivasa, and T. Barfoot, “Informed RRT*: Optimal incremental path planning focused through an admissible ellipsoidal heuristic,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2014.