A Framework for Visually Realistic Multi-robot Simulation in Natural Environment
This paper presents a generalized framework for the simulation of multiple robots and drones in highly realistic models of natural environments. The proposed simulation architecture uses Unreal Engine 4 (UE4) to generate both optical and depth sensor outputs from any position and orientation within the environment, and provides several key domain-specific simulation capabilities. The components and functionalities of the system are discussed in detail. The simulation engine also allows users to test and validate a wide range of computer vision algorithms involving different drone configurations under many types of environmental effects, such as wind gusts. The paper demonstrates the effectiveness of the system with experimental results for a test scenario in which one drone tracks the simulated motion of another in a complex natural environment.
Robot simulation, Drone simulation, Natural environment models, Natural feature tracking, Unreal Engine 4 (UE4).
Graphical models of realistic natural environments are extensively used in games, notably simulation games and those that use immersive environments. These virtual environments provide a high degree of interactive experience and realism in simulations. Modern game engines provide tools for prototyping realistic, complex and detailed virtual environments. Recently, this capability of game engines has been harnessed by the computer vision community to develop frameworks for scientific applications, in which vision-based algorithms for detection, tracking and navigation can be tested and evaluated with various types of sensor inputs and environmental conditions. This paper focuses on the development of a comprehensive standalone framework for multi-robot simulation (specifically, multi-drone simulation) in complex natural environments. It proposes suitable configurations of tools and simulation architectures, and also looks at key performance issues.
Several robot simulation engines exist which simulate different robots and vehicles e.g. multicopters, rovers, fixed wing UAV, etc. Each engine has its advantages. The engines use large simulation environments consisting of models, sceneries, etc. generated by other simulation packages and frameworks. Following are some examples of such engines with dependencies on other simulation packages:
Standalone robot simulation engines that use a flight simulator for models, sceneries and functions for visualization and simulation. Examples of such packages are: (i) ArduPilot, which communicates with X-Plane and FlightGear, and (ii) PX4, which communicates with jMAVSim. Flight simulators are usually much larger projects than robot simulation projects. They are more focused on user experience and interaction, but they also have visualization and dynamic simulation capabilities, which are useful characteristics for drone projects.
Standalone simulation packages that use physics engines, graphical interfaces and simulation capabilities provided by other simulation tools, for example PX4 with Gazebo. Robot simulation environments are dedicated simulation environments. They focus on providing proper tools for modeling and simulating robots but are less focused on visualization.
Standalone robot simulation environments: these environments include the robot and flying vehicle models. An example of this kind of environment is the Modular Open Robots Simulation Engine (MORSE). These environments are suitable for testing and evaluating ideas, but they do not have roots in real robot projects, specifically in drone projects.
Standalone game engine environments: the robot is simulated inside a game engine, for example a benchmark for tracking based on UE4. As with robot simulation environments, the drones inside game engines do not have roots in real drone projects. Additionally, drones simulated in game engines do not share the dynamics of real drones. For example, they do not have to deal with wind gusts and vibrations.
In this paper, we propose a novel configuration that uses game engines for the simulation environment, the primary motivation being the enhanced capabilities of a game engine such as UE4 in providing highly realistic environments and various modes of visualization. One of the primary advantages of this type of configuration is that a game engine such as UE4 can provide realtime videos of camera output based on the position and attitude information of the robot. This paper also gives an overview of DroneSimLab, which we developed and which has constantly evolved with the analysis of various requirements and concepts related to the simulation architecture presented later in this work. The design and implementation aspects of the key components of this simulation engine are presented in detail.
This paper is organized as follows: in section 2 we give an overview of the DroneSimLab project. In section 3 we describe previous related work with game engines. Section 4 gives detailed information about the framework simulation architecture and design goals. Section 5 focuses on the modifications needed to meet the simulation design goals. Section 6 describes experimental results and performance. Section 7 provides information on future research directions as well as references to online demos of this research.
2 The DroneSimLab Project
We developed DroneSimLab as an open source project to foster collaborative development of drone simulation packages that use the power and capabilities of UE4 as discussed in the previous sections. Some of the main functionalities that the current implementation provides are:
Multi-robot - can handle more than one robot and create visual interaction.
Software In The Loop (SITL) driven - can simulate two drone models: ArduPilot and PX4.
Based on a game engine - uses UE4 as an optical and depth sensor.
Realtime - depends on the hardware but can run at 30 fps.
Natural environments - can simulate trees, wind, grass, etc. (these come with the game engine's assets).
3 Related Works
Some image processing and image-based algorithms have already been integrated into game engine/image simulation environments, although using engines dedicated to games (like UE4, CryEngine, and Unity) in simulations is much less common. We believe the reason is that these engines are more focused on game experience than on real-world scientific applications involving simulations with associated mathematical and physical models and computer vision algorithms.
One example of such an approach is the autonomous landing of a Vertical Takeoff and Landing (VTOL) Unmanned Aerial Vehicle (UAV) on a moving platform using image-based visual servoing, which was developed using Gazebo simulation. Simultaneous Localization and Mapping (SLAM) has also been tested and developed for indoor scenarios using Gazebo simulation. Some environments are combined to create a more powerful engine. For example, MORSE has been combined with the Blender Game Engine (BGE) and JSBSim (an open source flight dynamics model).
Lately, game engines have been increasingly used for simulations. Successful attempts have been made to evaluate the stability of structures using UE4, creating photo-realistic scenes of stacks of blocks and applying deep learning methods. A series of towers made from wooden cubes was created in a simulated environment using UE4. Some of the towers were stable structures, and some collapsed when the dynamic simulation was run. A network was trained to predict the outcome of the experiments. When tested on real environments, the network performed on par with human subjects in predicting whether a tower would fall. The most important aspect of this research is that the network could be trained on 180,000 scenarios, which would not be feasible in a real-life environment.
A more recent project, UnrealCV, connected UE4 with OpenCV. It extends UE4 with a set of commands to interact with the virtual world. Another work proposed a new aerial video dataset and benchmark for low-altitude UAV target tracking, as well as a photorealistic UAV simulator that can be coupled with tracking methods. Skinner et al. proposed a high-fidelity simulation for evaluating robotic vision performance, which allows robotic vision experiments to be repeated under identical conditions. Similarly, we provide a sandbox for high-fidelity simulations, not only for algorithms but also for full SITL simulations.
Recently, Microsoft AI & Research released AirSim, an open source simulator for autonomous vehicles based on Unreal Engine, whose architecture is similar to the one we propose. Its released implementation uses its own physics engine and control libraries.
4 Simulation Architecture
In this paper, we propose a simulation architecture designed to meet the following primary goals: (i) ability to generate realtime camera outputs for any arbitrary position and orientation in a natural environment, (ii) ability to integrate software and hardware in the loop simulations, (iii) ability to combine multiple simulations, and (iv) ability to reproduce results. These aspects are elaborated below.
4.1 Domain Specific Simulation Engine
We focused on three simulation engines for the framework.
The game engine provides video, depth data and additional visual environmental effects like wind and dust.
The physical model engine, usually supplied by the robot development framework.
Supplemental simulation objects like communication channel models, computation power restrictions and additional simulation filters (for example a lens distortion filter).
Some engines can create an environment for only a single robot. For instance, in the case of SITL, the simulation engine interacts with only one vehicle and produces sensory information for only one robot. The game engine, on the other hand, can provide visual information for multiple robots, as described in Figure 1. This is especially necessary if a visual interaction exists: for example, one robot can block the field of view of another robot, and this aspect should be implemented in the simulation. By embedding SITL into our framework we ensure that the simulation is highly correlated with the real robot architecture (since it is used for the robot development). Hardware In The Loop (HIL) can later be used for further validation.
4.2 Simulated sensor architecture
We identified three types of simulated sensors that can be used.
Single domain sensor - lives in only one engine. One example is a simulated RGB camera from UE4. Another example is the gyro sensor, which is simulated only in the SITL software, e.g. Gazebo, jMAVSim, JSBSim, etc.
Multi domain sensor - lives in more than one engine. An example of such a sensor can be seen in Figure 2. In this example the simulated distance sensor gets information from various sources, such as an external Digital Elevation Model (DEM).
Complex sensor - lives in both the physical domain and the simulated domain. An example of such a sensor is a camera in front of a screen. The display provides the visual information and the camera is used just as in the real system, which enables monitoring of real system performance and hardware issues. This concept is an extension of the HIL mode which combines hardware testing and software testing.
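The multi domain sensor idea can be illustrated with a minimal sketch. It assumes the distance sensor combines the vehicle position from the physics (SITL) domain with a ground height from an external DEM; the names `Pose`, `make_distance_sensor` and `dem_with_hill`, the 40 m range limit, and the toy terrain are all hypothetical and are not taken from DroneSimLab.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Pose:
    """Vehicle position as reported by the physics (SITL) domain, in metres."""
    x: float
    y: float
    altitude_msl: float  # height above mean sea level


def make_distance_sensor(
    dem_lookup: Callable[[float, float], float],
    max_range: float = 40.0,
) -> Callable[[Pose], Optional[float]]:
    """Build a downward distance sensor fed by two domains.

    dem_lookup(x, y) returns the ground elevation from the external DEM;
    the pose comes from the physics engine. max_range is an assumed limit.
    """
    def read(pose: Pose) -> Optional[float]:
        ground = dem_lookup(pose.x, pose.y)
        distance = pose.altitude_msl - ground
        # Out-of-range or below-ground readings are reported as "no return".
        if distance < 0.0 or distance > max_range:
            return None
        return distance

    return read


def dem_with_hill(x: float, y: float) -> float:
    """Toy DEM: flat ground at 100 m with a 5 m hill for 10 <= x <= 20."""
    return 105.0 if 10.0 <= x <= 20.0 else 100.0


sensor = make_distance_sensor(dem_with_hill)
print(sensor(Pose(0.0, 0.0, 112.0)))   # over flat ground
print(sensor(Pose(15.0, 0.0, 112.0)))  # over the hill
```

Keeping the DEM behind a plain callable means the same sensor code could be pointed at a different terrain source without touching the physics side.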
4.3 Containers as vehicles
We used a container approach to combine simulations that were not originally intended to work alongside each other. The container approach enables us to use different operating systems and libraries on the same machine. We can also control the network configuration in each container. This approach differs from other frameworks like AirSim in that it keeps the original firmware developed by the robot simulation framework. Our architecture can therefore benefit from new features and the continuing development of those environments.
The ability to reproduce results under differing or constant conditions is vital in system development as well as in research, and becomes more and more difficult as the complexity of the system increases. To realize this concept, we use several existing software tools: the Docker engine to manage system configuration, and Git version control to handle the software development. Since the experiments run in a simulation, other reproducibility aspects such as high fidelity are built in. In section 6, we demonstrate the testing of algorithms in complex natural environments by controlling simulation parameters. That simulation shows that outdoor natural environments can be problematic for testing visual algorithms, since we do not have full control of the environment. It seems that true reproducibility in an outdoor natural environment may be achieved only in simulation.
4.5 Build system & configuration management
The simulation environment uses these software tools:
Version Control - All files of this project are managed by Git version control on GitHub servers. The only exceptions are the UE4 projects, which are managed locally due to their large file sizes. The UE4 source code is still managed by Git in a dedicated GitHub repository. To send realtime ground truth position, changes were made to both the ArduPilot project and PX4; these are managed in separate forks because they are not compatible with the design and purpose of the original projects. Changes that were compatible (e.g. a turbulence model) were returned to the community as pull requests and then pulled back into our local fork.
Containers - Created with Docker engine.
ArduPilot Fork  (Drone Project)
ROS - Supporting firmware for the PX4 project.
PX4 Fork  (Drone Project)
UE4PyServer plugin 
5 Engine Modifications
Game engines are not dedicated research tools, but conveniently for our usage scenario they supply mechanisms such as plugins to extend the capabilities of the engine. The plugin we used for UE4 is called UE4PyServer and was developed for the purpose of this research. The main concepts behind the plugin development were:
Realtime - For this simulation, we took advantage of the realtime capabilities of the game engine. Realtime (RT) simulation is important when you want to run many tests in a short period. RT simulation is also necessary when human interaction is involved, because users expect realtime or near-realtime behavior. To maintain RT behavior, the UE4 plugin was developed with minimal processing on the UE4 side. The primary purpose of the plugin is to communicate with other parts of the simulation, e.g. receiving 6 DOF information and sending video data.
Multi-Robot support - UE4 enables capture of the viewable screen to a file or a buffer, but this provides only one camera feed. To allow multiple cameras in the simulation, we used the render-to-texture technique with the SceneCapture2D object. Game engines usually use this method to render surfaces such as security cameras, billboards, and mirrors. We used it to simulate robot cameras and depth sensors, using the depth map provided by the SceneCapture2D object.
Synchronization - We wanted the sampling to be synchronized for all the visual objects in the simulation. It is an important concept and might be critical for some applications, for example, simulating stereo camera. For this purpose, we used coroutines which are a light version of synchronized pseudo-threads.
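The synchronization idea can be sketched with Python generators used as coroutines. This is an illustration of cooperative scheduling under our own assumptions, not the plugin's actual code: the names `camera` and `run_synchronized` are hypothetical, and a real camera would return image data rather than a label.

```python
from typing import Generator, List, Optional, Tuple

Sample = Tuple[str, int]  # (camera name, tick at which the sample was taken)


def camera(name: str) -> Generator[Optional[Sample], int, None]:
    """Coroutine for one simulated camera.

    It waits for a tick number, then hands back one sample stamped with it.
    """
    sample: Optional[Sample] = None
    while True:
        tick = yield sample      # return the latest sample, wait for next tick
        sample = (name, tick)    # "capture" a frame for this tick


def run_synchronized(names: List[str], ticks: int) -> List[List[Sample]]:
    """Advance every camera exactly once per tick from a single scheduler."""
    cameras = [camera(n) for n in names]
    for cam in cameras:
        next(cam)  # prime each coroutine up to its first yield
    frames = []
    for tick in range(ticks):
        # All cameras are sampled inside the same tick, so a stereo pair
        # never mixes images from different simulation steps.
        frames.append([cam.send(tick) for cam in cameras])
    return frames


for frame in run_synchronized(["left", "right"], 3):
    print(frame)
```

Because a single scheduler drives every coroutine, no locks are needed and the sampling order within a tick is deterministic.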
5.2 Building realistic environment inside game engine for computer vision
There are some special considerations when building virtual environments in game engines for computer vision purposes.
Level of Detail (LOD) - Game engines use multiple LODs [21, Chapter 3] to maintain graphics performance, especially frame rate. This may create unnatural texture changes, which can be destructive for computer vision algorithms. Since we control both the environment and the simulation in this research, we can create a scene with only one LOD.
Repeating patterns - In Figure 3 we can see the meshes used to build the realistic scene. To reduce the effect of repeating patterns, each element is positioned in a different orientation and slightly different scaling. Also, the elements are positioned with some overlap with other objects which reduces the repeating effect.
Culling adjustments - the area rendered in the scene also known as frustum should be large enough for all the objects in the scene to be rendered, so we will avoid popping effects due to movements of the cameras or the objects themselves.
Dynamic shadows adjustments - moving objects in the scene like trees and robots should always cast dynamic shadows to imitate real scenarios.
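The placement strategy described for repeating patterns can be sketched as follows. This is a minimal, engine-independent illustration: the function name `scatter_instances`, the square area, and the 15% scale jitter are assumptions chosen for the example, and the output would still have to be fed to the game engine's own placement tools.

```python
import random
from typing import Dict, List


def scatter_instances(count: int, area: float, seed: int = 0,
                      scale_jitter: float = 0.15) -> List[Dict[str, float]]:
    """Place `count` copies of one mesh with varied yaw and scale.

    Positions are drawn independently, so instances may overlap, which
    also helps hide the repetition of a single mesh.
    """
    rng = random.Random(seed)  # fixed seed keeps the scene reproducible
    instances = []
    for _ in range(count):
        instances.append({
            "x": rng.uniform(0.0, area),
            "y": rng.uniform(0.0, area),
            "yaw_deg": rng.uniform(0.0, 360.0),
            "scale": 1.0 + rng.uniform(-scale_jitter, scale_jitter),
        })
    return instances


for inst in scatter_instances(count=3, area=100.0, seed=42):
    print(inst)
```

Seeding the generator ties in with the reproducibility goal: the same seed rebuilds the same scene layout for every experiment.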
The SITL engine needs to send 6 DOF information to the game engine at a high rate (at least 30 fps) to maintain realtime constraints. For that purpose, some modifications to the engine are needed, so that the SITL engine sends ground truth information directly to the game engine, and also to logging mechanisms for later analysis, as described in Figure 1.
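A minimal sketch of such a ground truth message is given below. The wire format (one float64 timestamp followed by six float32 values), the use of UDP, and the function names are assumptions made for illustration; the text does not specify the transport or message layout used between the SITL forks and UE4PyServer.

```python
import socket
import struct
from typing import Tuple

Vec3 = Tuple[float, float, float]

# Hypothetical layout: little-endian float64 timestamp, then
# x, y, z (metres) and roll, pitch, yaw (radians) as float32.
POSE_FORMAT = "<d6f"
POSE_SIZE = struct.calcsize(POSE_FORMAT)  # 32 bytes


def pack_pose(timestamp: float, position: Vec3, attitude: Vec3) -> bytes:
    """Serialize one 6 DOF ground truth sample."""
    return struct.pack(POSE_FORMAT, timestamp, *position, *attitude)


def unpack_pose(payload: bytes) -> Tuple[float, Vec3, Vec3]:
    """Inverse of pack_pose, as the game engine side might decode it."""
    t, x, y, z, roll, pitch, yaw = struct.unpack(POSE_FORMAT, payload)
    return t, (x, y, z), (roll, pitch, yaw)


def send_pose(sock: socket.socket, address: Tuple[str, int],
              timestamp: float, position: Vec3, attitude: Vec3) -> None:
    """Send one sample as a single datagram (UDP is an assumption here)."""
    sock.sendto(pack_pose(timestamp, position, attitude), address)


message = pack_pose(12.5, (1.5, -2.0, 10.25), (0.0, 0.125, -0.5))
print(len(message), unpack_pose(message))
```

At 30 messages per second a fixed 32-byte payload is negligible traffic, so the same message could also be duplicated to a logger for later analysis.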
6 Experimental Results and Performance Analysis
6.1 Plugin tests in natural environment
Running visual algorithms in a natural environment can be very challenging. Relative to artificial environments, natural scenes can be highly dynamic due to atmospheric conditions such as wind, and usually lack distinct features such as straight lines, circles, and corners. Using UE4PyServer (which was developed as part of the simulation framework) and UE4, we developed a tracking simulation (a live video is linked in the references) to demonstrate the uniqueness of natural environments. The simulation is based on the Lucas-Kanade optical flow tracker implemented in the OpenCV library, which we use to track an ordered grid of points (no feature extraction). The maneuver is simple: a forward-facing camera moves diagonally backward and then returns to its original position, as described in Figure 4. Ideally, we would expect the tracked points to return to the same coordinates when the simulation cycles back to the starting frame. Since this is a complex 3D scene, not all the points return to the same location because tracking is sometimes lost, but in Figure 5 we can see that running the experiment twice produces similar results. The results are similar but not identical, since some randomness remains in the scene: the movement of leaves can cause slight differences. In Figure 6 we conducted two experiments with the same setup, but in the second test we added wind to the scene by adding a Wind Direction Force object to UE4. We can see that the results are now very different. We repeated the experiment under various conditions and calculated the following MSE grade to quantify the tracking quality:
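$$E = \frac{1}{N} \sum_{p \in P} \lVert p_{e} - p_{s} \rVert^{2}$$
where $E$ is the tracking error, $P$ is the set of tracked points, $p_{s}$ and $p_{e}$ are the start and end coordinates of a tracked point $p$, and $N$ is the number of tracked points. Summarized results are in the following table: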
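Table 1: Tracking error under different environmental conditions.
| Altitude | Low wind | Strong wind |
|---|---|---|
| low-alt | 96.36 (98.54%) | 219.99 (97.40%) |
| | 81.15 (98.54%) | 225.48 (98.05%) |
| | 87.69 (98.54%) | 234.41 (97.89%) |
| high-alt | 55.72 (97.56%) | 243.59 (98.54%) |
| | 53.82 (98.21%) | 757.61 (97.73%) |
| | 48.94 (97.40%) | 382.78 (98.38%) |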
In Table 1, we can see the behavior of the tracking algorithm under different environmental conditions. As expected, high altitude (near the tree tops) combined with strong wind is the most challenging scenario. As seen in the first column, the tracking error under low wind conditions is larger at low altitudes than at high altitudes, due to the higher density of objects such as leaves and branches that occlude the camera view at low altitudes. On the other hand, when the wind speed increases, the trend is reversed, because leaves and branches tend to move more than objects closer to the ground such as tree trunks, rocks, etc.
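The error measure can be sketched in a few lines, assuming it is the mean squared distance between the start and end coordinates of each point that is still tracked when the camera returns to its starting pose. The helper names `make_grid` and `tracking_mse`, the grid spacing, and the use of `None` to mark a lost point are hypothetical; the tracker itself (Lucas-Kanade in OpenCV) is deliberately left out so the example has no dependencies.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def make_grid(width: int, height: int, step: int) -> List[Point]:
    """Ordered grid of points to track (no feature extraction)."""
    return [(float(x), float(y))
            for y in range(step, height, step)
            for x in range(step, width, step)]


def tracking_mse(start: List[Point],
                 end: List[Optional[Point]]) -> Tuple[float, int]:
    """Mean squared start-to-end distance over the points still tracked.

    end[i] is None when the tracker lost point i. Returns (error, N),
    where N is the number of tracked points.
    """
    kept = [(s, e) for s, e in zip(start, end) if e is not None]
    if not kept:
        raise ValueError("no tracked points left")
    total = sum((ex - sx) ** 2 + (ey - sy) ** 2
                for (sx, sy), (ex, ey) in kept)
    return total / len(kept), len(kept)


grid = make_grid(width=40, height=30, step=10)
# Pretend every point drifted by (3, 4) pixels and the first one was lost.
drifted: List[Optional[Point]] = [(x + 3.0, y + 4.0) for x, y in grid]
drifted[0] = None
print(tracking_mse(grid, drifted))
```

A perfect round trip would give an error of zero, so any residual value reflects drift accumulated over the maneuver.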
We developed a tool for profiling and monitoring the framework in addition to the existing tools in the UE4 editor. This is a high-level profiling tool that summarizes system utilization. An example test case is presented in Figures 4 and 7: Figure 4 shows the scene architecture and Figure 7 the corresponding profiling graph. As expected, the peaks in GPU utilization are due to IO-intensive camera movements. When the camera is moving, the GPU is fully utilized (it reaches 100%), which reduces the frame rate; at the reduced frame rate the CPU is less occupied because it processes images at a lower rate.
We created a setup in DroneSimLab for an experiment in which one drone tracks another drone. The drones are ArduPilot drones simulated using their internal SITL engine. We simulated the wind, including wind gusts, in UE4 as well as in the SITL engine. The two drones fly into the forest and then return to their original position. One of the drones uses an HSV tracker to track the other drone (Figure 9). We repeated this experiment four times, and the results are presented in Figure 8.
In all four scenarios, we can observe the loss of tracking when the drones enter the forest (black dots between frames 300 and 500 in all four scenarios). When the drones enter the woods, the shadows from the forest canopy affect the color and brightness components of the drone, as can be seen in the demo video listed in the references. The tracking threshold is deliberately not updated dynamically, in order to demonstrate this behavior. Another interesting phenomenon is the high frequencies observed in the graph. These result from the continuous maneuvering and changes in the 3D orientation of the drone as it compensates for the high wind forces, which in turn cause variations in the estimate of the drone's center position. In the last two experiments, we can see especially large amplitudes at the beginning and at the end as a result of the takeoff and landing process, which presented different viewing angles of the drone body and led to different estimates of the drone center.
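The loss of tracking under the canopy can be reproduced with a dependency-free sketch of a fixed-threshold HSV tracker. The image representation (rows of RGB tuples), the threshold values, the colors, and the function name `hsv_centroid` are assumptions made for illustration; the tracker used in the experiment is built on OpenCV and is not reproduced here.

```python
import colorsys
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0..255


def hsv_centroid(image: List[List[Pixel]],
                 hue_range: Tuple[float, float],
                 min_saturation: float,
                 min_value: float) -> Optional[Tuple[float, float]]:
    """Centre (x, y) of all pixels inside a fixed HSV threshold.

    Returns None when no pixel passes, i.e. when tracking is lost.
    """
    h_lo, h_hi = hue_range
    sum_x = sum_y = count = 0
    for y, row in enumerate(image):
        for x, (r, g, b) in enumerate(row):
            h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            if h_lo <= h <= h_hi and s >= min_saturation and v >= min_value:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None
    return (sum_x / count, sum_y / count)


def make_scene(drone_color: Pixel) -> List[List[Pixel]]:
    """5x5 dark green background with a two-pixel 'drone' on row 1."""
    background = (20, 80, 20)
    scene = [[background for _ in range(5)] for _ in range(5)]
    scene[1][2] = drone_color
    scene[1][3] = drone_color
    return scene


THRESHOLD = dict(hue_range=(0.05, 0.12), min_saturation=0.5, min_value=0.5)

sunlit = make_scene((255, 128, 0))  # bright orange drone
shaded = make_scene((76, 38, 0))    # same hue, about 30% brightness
print(hsv_centroid(sunlit, **THRESHOLD))
print(hsv_centroid(shaded, **THRESHOLD))
```

In the shaded scene the hue is unchanged but the value channel falls below the fixed threshold, which is the failure mode the experiment demonstrates; an adaptive threshold would trade this for a higher risk of false detections.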
The above results have clearly demonstrated the usefulness of a game engine in not only producing realistic natural environments and their camera outputs, but also providing the ability to add and modify realistic environmental effects such as changes in wind parameters and illumination conditions. These features allow us to generate ground truth data for various test conditions and to evaluate machine vision algorithms.
The creation of visually realistic environments is a very powerful tool for computer vision research, as can be seen in section 6 and in the corresponding demo video listed in the references. The DroneSimLab project aims to be a tool that adds game engine capabilities to existing robot simulation environments. The current work mainly focuses on UE4, but adding another game engine may increase the number of modifiable parameters in our system. For instance, training deep learning algorithms on multiple worlds, each created by a different game engine, may generalize more accurately to the real-world domain. This paper has presented a new framework for simulating multi-robot (specifically, multi-drone) motion in such environments, where environmental effects can be easily incorporated and complex computer vision tasks evaluated. The simulation architecture, along with the key functionalities of the simulation engine, has been discussed in detail.
- [1] Ardupilot fork. https://github.com/orig74/ardupilot.
- [2] Drone tracking drone in dronesimlab. https://youtu.be/Mj9xZECG40Q.
- [3] Dronesimlab. https://github.com/orig74/DroneSimLab.
- [4] flightgear. http://www.flightgear.org.
- [5] gazebo. http://gazebosim.org.
- [6] Github. https://github.com.
- [7] jmavsim. https://pixhawk.org/dev/hil/jmavsim.
- [8] Px4 firmware fork. https://github.com/orig74/Firmware.
- [9] Scene capture 2d. https://docs.unrealengine.com/latest/INT/Resources/ContentExamples/Reflections/1_7.
- [10] Ue4pyserver. https://github.com/orig74/UE4PyServer.
- [11] Unreal engine 4 with python & opencv. https://youtu.be/q8kAooRaf7g.
- [12] X-plane. http://www.x-plane.com/.
- [13] Ilya Afanasyev, Artur Sagitov, and Evgeni Magid. Ros-based slam for a gazebo-simulated mobile robot in image-based 3d model of indoor environment. In International Conference on Advanced Concepts for Intelligent Vision Systems, pages 273–283. Springer, 2015.
- [14] Jean-Yves Bouguet. Pyramidal implementation of the affine lucas kanade feature tracker description of the algorithm. Intel Corporation, 5(1-10):4, 2001.
- [15] G. Bradski. The opencv library. Dr. Dobb’s Journal of Software Tools, 2000.
- [16] Arnaud Degroote, Pierrick Koch, and Simon Lacroix. Integrating Realistic Simulation Engines within the MORSE Framework. In Workshop on Rapid and Repeatable Robot Simulation (R4 SIM), at Robotics: Science and Systems, Roma, Italy, July 2015.
- [17] Gilberto Echeverria, Nicolas Lassabe, Arnaud Degroote, and Séverin Lemaignan. Modular open robots simulation engine: Morse. In Robotics and Automation (ICRA), 2011 IEEE International Conference on, pages 46–51. IEEE, 2011.
- [18] Epic Games. Unreal engine 4. http://www.unrealengine.com.
- [19] Adam Lerer, Sam Gross, and Rob Fergus. Learning physical intuition of block towers by example. CoRR, abs/1603.01312, 2016.
- [20] Johannes Meyer, Alexander Sendobry, Stefan Kohlbrecher, Uwe Klingauf, and Oskar Von Stryk. Comprehensive simulation of quadrotor uavs using ros and gazebo. In International Conference on Simulation, Modeling, and Programming for Autonomous Robots, pages 400–411. Springer, 2012.
- [21] Thomas Mooney. Unreal Development Kit Game Design Cookbook. Packt Publishing Ltd, 2012.
- [22] Matthias Mueller, Neil Smith, and Bernard Ghanem. A benchmark and simulator for uav tracking. In European Conference on Computer Vision, pages 445–461. Springer, 2016.
- [23] Weichao Qiu and Alan Yuille. Unrealcv: Connecting computer vision to unreal engine. arXiv preprint arXiv:1609.01326, 2016.
- [24] Adrian Rosebrock. Ball tracking with opencv. http://www.pyimagesearch.com/2015/09/14/ball-tracking-with-opencv/.
- [25] Pedro Serra, Rita Cunha, Tarek Hamel, David Cabecinhas, and Carlos Silvestre. Landing on a moving target using image-based visual servo control. In 53rd IEEE Conference on Decision and Control, pages 2179–2184. IEEE, 2014.
- [26] Shital Shah, Debadeepta Dey, Chris Lovett, and Ashish Kapoor. Aerial Informatics and Robotics platform. Technical Report MSR-TR-2017-9, Microsoft Research, 2017.
- [27] John Skinner, Sourav Garg, Niko Sünderhauf, Peter Corke, Ben Upcroft, and Michael Milford. High-fidelity simulation for evaluating robotic vision performance. In Intelligent Robots and Systems (IROS), 2016 IEEE/RSJ International Conference on, pages 2737–2744. IEEE, 2016.