Explainable Deep Reinforcement Learning for UAV Autonomous Navigation


Modern deep reinforcement learning plays an important role in solving a wide range of complex decision-making tasks. However, because they rely on deep neural networks, the trained models lack transparency, which causes distrust among users and makes them hard to deploy in critical fields such as self-driving cars and unmanned aerial vehicles. In this paper, an explainable deep reinforcement learning method is proposed for the multirotor obstacle avoidance and navigation problem. Both visual and textual explanations are provided to make the trained agent more transparent and comprehensible to humans. Our model can provide real-time decision explanations for non-expert users. In addition, global explanation results are provided for experts to diagnose the learned policy. Our method is validated in a simulation environment. The simulation results show that the proposed method produces useful explanations that increase the user’s trust in the network and also help improve network performance.

Explainable, Deep reinforcement learning, UAV obstacle avoidance.

I Introduction

Unmanned Aerial Vehicles (UAVs) have been widely used in many applications, such as goods delivery, emergency surveying and mapping. Autonomous navigation in large, unknown, complex environments is an essential capability for these UAVs to operate more intelligently and safely.

In general, there are two main solutions for UAV obstacle avoidance. The first relies on a state estimator using VIO or SLAM and then generates safe trajectories with an optimization method [14, 32]. It is a cascaded process that includes mapping, localization, planning and control. Although this kind of method can generate nearly optimal trajectories for optimization objectives such as safety and smoothness, it requires a lot of computation and memory to store the map and to run the optimization algorithm at every step. In addition, these techniques suffer from high drift and noise, which degrade the quality of both the localization and the map used for planning. The other solution is a reactive control method, which generates control commands directly from the perception information [18, 5]. This method is efficient; however, it is generally non-optimal because it lacks global information.

UAV navigation is a sequential decision-making problem. Some researchers have modelled this problem as a Markov decision process (MDP) and solved it using reinforcement learning (RL) methods. For example, Ross et al. [20] built an imitation learning (IL)-based controller using a small set of human demonstrations and achieved good performance in natural forest environments. Imanberdiyev et al. [11] developed a high-level control method for autonomous navigation of UAVs using a novel model-based reinforcement learning method, TEXPLORE. He et al. [9] combined a bio-inspired monocular vision perception method with a deep reinforcement learning (DRL) reactive local planner to address the UAV navigation problem. They also proposed a learning-from-demonstration method to speed up the training process [8]. Wang et al. [28] formulated the navigation problem as a partially observable Markov decision process (POMDP) and solved it with a novel online DRL algorithm. They also investigated the sparse-reward situation using a learn with help (LwH) method [29]. Compared with optimization-based methods, RL methods yield an end-to-end policy that can process raw sensor data such as images directly. There is no need to run an optimization at every step, which makes them computationally efficient. Also, once training converges, the learned policy provides an action at every state.

Fig. 1: Network architecture of our control policy. The inputs are the raw depth image and UAV states such as the current speed and the relative position to the goal. The features in the depth image are extracted using a CNN. A global average pooling layer is then used to get the intensity of each visual feature, which is fed to the fully connected network together with the state features. The outputs are 3 control commands: forward, climb and steering speed.

Although DRL methods can achieve excellent performance, a major problem is that deep learning models turn out to be uninterpretable “black boxes”, which creates serious challenges for Artificial Intelligence (AI) systems based on neural networks [7]. This problem falls within the so-called eXplainable AI (XAI) field. Arrieta et al. give a review of XAI [2].

Compared with the burst of XAI research in supervised learning, explainability for RL is hardly explored [10]. Juozapaitis et al. [12] explain the RL agent using reward decomposition. This approach decomposes the reward into sums of semantically meaningful reward types so that actions can be compared in terms of trade-offs among the types. Reward decomposition is also used in strategic tasks such as StarCraft II [19]. Lee [13] proposed a method to derive a secondary comprehensible agent from an NN-based RL agent, whose decisions are based on simple rules. Beyret et al. [3] proposed an explainable RL method for robotic manipulation. They presented a hierarchical DRL system that includes a low-level agent handling actions and a high-level agent learning the dynamics and the environment. The high-level agent is used to provide interpretations for the human operator. Madumal et al. [16] use causal models to derive causal explanations of the behaviour of model-free reinforcement learning agents. A structural causal model is learned during the reinforcement learning phase, and the explanations of behaviour are generated from a counterfactual analysis of the causal model. They also introduced a distal explanation model that can analyse counterfactuals and opportunity chains using decision trees and causal models [17].

Explainability is critical for DRL-based UAV navigation systems. On the one hand, it is useful for non-expert users to know why the controller turns right rather than left when facing an obstacle. On the other hand, it helps the network and controller designer understand the decision-making process and make adjustments to improve network performance.

This work proposes an explainable deep reinforcement learning method for UAV navigation and obstacle avoidance in complex unknown environments. First, a navigation policy is trained using a DRL method in a high-fidelity simulation environment. Then, the trained network is explained using a post-hoc explanation method based on feature attribution. Compared with transparent-model methods, post-hoc methods explain an RL policy after its training, which preserves the model performance. Both real-time visual and textual explanations are provided to help non-expert users understand the trained model. Moreover, trajectory-level explanations can be used by experts to analyze and improve the network.

Our main contributions can be summarised as follows:

  • An autonomous navigation policy for UAV learned using DRL method.

  • A novel CNN attention visualization method based on fair feature attribution.

  • Real-time textual model decision explanation for non-expert users.

II Preliminaries

II-A MDP and DRL

In this work, the navigation and obstacle avoidance problem is formulated as an MDP. An MDP is defined by a tuple $(S, A, R, T, \gamma)$, which consists of a set of states $S$, a set of actions $A$, a reward function $R(s, a)$, a transition function $T(s, a, s') = P(s' \mid s, a)$, and a discount factor $\gamma$. In each state $s \in S$, the agent takes an action $a \in A$. After executing the action $a$ in the environment, the agent receives a reward $R(s, a)$ and reaches a new state $s'$, determined from the probability distribution $P(s' \mid s, a)$. The goal of DRL is to find a policy $\pi$ mapping states to actions that maximizes the expected discounted total reward over the agent’s lifetime. This concept is formalized by the action value function: $Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t) \mid s_0 = s, a_0 = a\right]$, where $\mathbb{E}_{\pi}$ is the expectation over the distribution of the admissible trajectories $(s_0, a_0, s_1, a_1, \dots)$ obtained by following the policy $\pi$ starting from $s_0 = s$ and $a_0 = a$.
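As a small illustration of the discounted total reward that the action value function averages over, the following sketch computes the discounted return of one finite episode. The function name `discounted_return` and the example rewards are hypothetical; the default discount factor 0.99 is the value listed in the TD3 hyperparameter table.

```python
def discounted_return(rewards, gamma=0.99):
    """Return sum_t gamma**t * rewards[t] for one finite episode."""
    total = 0.0
    # Accumulate backwards so that each step applies G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Hypothetical per-step rewards of a short episode.
episode_rewards = [1.0, 0.0, -0.5, 2.0]
example_return = discounted_return(episode_rewards, gamma=0.5)
print(example_return)
```

The action value of a state-action pair is the expectation of this quantity over trajectories that start with that pair and then follow the policy.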

II-B Reinforcement Learning for UAV Navigation

Here, we treat the UAV navigation problem as a sequential decision process and formulate it as an MDP. Suppose the UAV takes off from a departure position in a 3-D environment, denoted as $(x_0, y_0, z_0)$ in the Earth-fixed coordinate frame, and aims to fly to a destination denoted as $(x_d, y_d, z_d)$. The observation, or state, at time $t$ consists of both the raw depth image and the UAV state features: $s_t = [s^{\mathrm{depth}}_t, s^{\mathrm{state}}_t]$. The state feature consists of the relative position to the goal and the current velocity: $s^{\mathrm{state}}_t = [d_{xy}, d_z, \phi, v_{xy}, v_z, \dot{\psi}]$, where $d_{xy}$ and $d_z$ denote the distance between the UAV’s current position and the destination in the x-y plane and along the z axis, $\phi$ is the relative angle between the UAV’s current first-person view direction and the destination, $v_{xy}$ and $v_z$ are the UAV’s current speeds, and $\dot{\psi}$ is the steering angular speed. The action generated by the policy network consists of two linear velocities and one angular velocity. These actions are passed to the low-level controller as velocity setpoint commands to achieve the navigation. The network architecture is shown in Fig. 1.
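The state features described above can be assembled from the UAV pose and the goal position roughly as follows. This is a minimal sketch, not the authors’ code: the function name `state_features`, the argument layout, the signed z difference, the use of radians, and the counter-clockwise-positive angle convention are all assumptions made for illustration.

```python
import math


def state_features(position, yaw, speed_xy, speed_z, yaw_rate, goal):
    """Build [d_xy, d_z, angle_error, speed_xy, speed_z, yaw_rate].

    position, goal: (x, y, z) tuples in the same Earth-fixed frame (assumed).
    yaw: current heading in radians (assumed convention).
    """
    dx = goal[0] - position[0]
    dy = goal[1] - position[1]
    dz = goal[2] - position[2]
    d_xy = math.hypot(dx, dy)          # horizontal distance to the goal
    bearing = math.atan2(dy, dx)       # direction from the UAV to the goal
    # Wrap the heading error into [-pi, pi).
    angle_error = (bearing - yaw + math.pi) % (2.0 * math.pi) - math.pi
    return [d_xy, dz, angle_error, speed_xy, speed_z, yaw_rate]


# Hypothetical example: UAV at the origin facing along +x, goal at (3, 4, 2).
features = state_features((0.0, 0.0, 0.0), 0.0, 1.0, 0.0, 0.0, (3.0, 4.0, 2.0))
print(features)
```

In practice these values would also be normalized before being concatenated with the pooled CNN features; the normalization ranges are not given in the text and are therefore left out here.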

II-C Feature Attribution

Formally, suppose we have a function $F: \mathbb{R}^n \rightarrow \mathbb{R}$ that represents a deep neural network and an input $x = (x_1, \dots, x_n) \in \mathbb{R}^n$. An attribution of the prediction at input $x$ relative to a baseline input $x'$ is a vector $A_F(x, x') = (a_1, \dots, a_n) \in \mathbb{R}^n$, where $a_i$ is the contribution of $x_i$ to the prediction $F(x)$. There are two different types of feature attribution algorithms: Shapley-value-based algorithms and gradient-based algorithms. There is a fundamental difference between these two algorithm types.

Shapley value is a classic method to distribute the total gains of a collaborative game to a coalition of cooperating players. It is a fair way to attribute the total gain to the players based on their contributions. For ML models, we formulate a game for the prediction at each instance. We consider the “total gains” to be the prediction value for that instance, and the “players” to be the model features of that instance. The collaborative game is all of the model features cooperating to form a prediction value. A Shapley-value-based explanation method tries to approximate Shapley values of a given prediction by examining the effect of removing a feature under all possible combinations of presence or absence of the other features. Shapley values are the only additive feature attribution method that satisfies the desirable properties of local accuracy, missingness, and consistency. However, exact Shapley value computation is exponential in the number of features.

A gradient-based explanation method tries to explain a given prediction by using the gradient of the output with respect to the input features. However, the problem with gradients is that they break sensitivity, a property that all attribution methods should satisfy. For example, consider a one-variable, one-ReLU network, $f(x) = 1 - \mathrm{ReLU}(1 - x)$. Suppose the baseline is $x = 0$ and the input is $x = 2$. The output changes from 0 to 1, but the gradient is zero at $x = 2$ because $f$ becomes flat after $x = 1$, so the gradient method gives an attribution of 0 to $x$. This phenomenon has been reported in [24]. To address this problem, Sundararajan et al. [27] proposed the Integrated Gradients (IG) algorithm. However, this algorithm requires computing the gradients of the model output at several different inputs (typically 50) between the baseline value and the current feature value.
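The one-ReLU example can be checked numerically. The sketch below compares the plain gradient attribution with a Riemann-sum approximation of Integrated Gradients, using 50 interpolation steps as mentioned above. The helper names and the midpoint rule are illustrative choices, and the gradient is written by hand rather than obtained from an autodiff library.

```python
def f(x):
    """One-variable, one-ReLU network: f(x) = 1 - ReLU(1 - x)."""
    return 1.0 - max(0.0, 1.0 - x)


def grad_f(x):
    """Derivative of f: 1 on the sloped part (x < 1), 0 on the flat part."""
    return 1.0 if x < 1.0 else 0.0


def integrated_gradients(x, baseline, steps=50):
    """Midpoint Riemann approximation of the IG path integral."""
    total = 0.0
    for k in range(steps):
        alpha = (k + 0.5) / steps
        total += grad_f(baseline + alpha * (x - baseline))
    return (x - baseline) * total / steps


x, baseline = 2.0, 0.0
plain_attribution = grad_f(x) * (x - baseline)   # gradient evaluated at the input only
ig_attribution = integrated_gradients(x, baseline)
print(plain_attribution, ig_attribution, f(x) - f(baseline))
```

The plain gradient assigns no attribution even though the output changed, whereas the path-integrated version recovers the full output difference, at the cost of one gradient evaluation per interpolation step.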

II-D SHAP and DeepSHAP

SHAP (SHapley Additive exPlanations), proposed by Lundberg and Lee [15], assigns each feature an importance value for a particular prediction. For a simple linear regression problem, the predictions can be written as:

$$\hat{y}_i = b_0 + b_1 x_{1i} + \dots + b_d x_{di}$$

where $\hat{y}_i$ is the i-th predicted response, $x_{1i}, \dots, x_{di}$ are the features of the current observation, and $b_0, \dots, b_d$ are the estimated regression coefficients. If the features are independent, the contribution of the k-th feature to the predicted response can be unambiguously expressed as $b_k x_{ki}$ for $k = 1, \dots, d$.

SHAP is a generalization of this concept to more complex neural network models. We define the following:

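  • $M$ is the entire set of features, and $S \subseteq M$ denotes a subset.

  • $S \cup \{i\}$ is the union of the subset $S$ and feature $i$.

  • $E[f(X) \mid X_S = x_S]$ is the conditional expectation of model $f$ when a subset $S$ of features is fixed at the local point $x$.

Then, the SHAP value is defined to measure the contribution of the i-th feature as

$$\phi_i = \sum_{S \subseteq M \setminus \{i\}} \frac{|S|!\,(|M| - |S| - 1)!}{|M|!} \left( E[f(X) \mid X_{S \cup \{i\}} = x_{S \cup \{i\}}] - E[f(X) \mid X_S = x_S] \right)$$
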
SHAP values have been proved to satisfy desirable properties such as fairness and consistency when attributing importance scores to each feature. However, computing SHAP values is expensive. In our case, we use DeepSHAP, a model-specific method that improves computational performance through a connection between Shapley values and DeepLIFT [23].
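To make the cost of exact Shapley values concrete, the following sketch enumerates all feature subsets for a tiny model. It is only an illustration: features outside a subset are replaced by a fixed baseline value, which is a common simplification of the conditional expectation used in the SHAP definition, and `toy_model` is a hypothetical function rather than the navigation network.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating all subsets (exponential in len(x))."""
    n = len(x)

    def value(subset):
        # Features in the subset take their actual value, the rest the baseline.
        z = [x[j] if j in subset else baseline[j] for j in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical model: one additive term plus one interaction term.
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi, sum(phi), toy_model(x) - toy_model(baseline))
```

The attributions sum to the difference between the prediction and the baseline prediction (local accuracy), and the interaction term is split evenly between the two features involved. With n features the loop evaluates the model on the order of 2^n times per feature, which is why an approximation such as DeepSHAP is needed for a network with image inputs.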

DeepSHAP [4] is a framework for layer-wise propagation of Shapley values that builds upon DeepLIFT [23]. If we define including an input as setting it to its actual value instead of its reference value, DeepLIFT can be thought of as a fast approximation of the Shapley values. If our model is fully linear, we can get exact SHAP values by summing the attributions along all possible paths between an input $x_i$ and the model’s output $y$. However, in our network, for example in the fully connected part, non-linear activation functions such as ReLU, tanh or sigmoid are applied after the linear part. To deal with the non-linear part, DeepLIFT provides the Rescale rule and the RevealCancel rule. Passing back nonlinear attributions linearly is an approximation, but it has two main benefits: 1) fast computation using only one backward pass and 2) a guarantee of local accuracy.
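The Rescale rule and the local accuracy guarantee can be illustrated on a single ReLU neuron. This is a simplified sketch under stated assumptions: the weights, bias and inputs are hypothetical, only one neuron is handled, and a zero multiplier is used when the pre-activation does not change, which is a shortcut chosen here for brevity.

```python
def relu(v):
    return max(0.0, v)


def rescale_contributions(weights, bias, x, x_ref):
    """Per-input contributions for y = ReLU(w . x + b) relative to a reference input."""
    pre = sum(w * xi for w, xi in zip(weights, x)) + bias
    pre_ref = sum(w * ri for w, ri in zip(weights, x_ref)) + bias
    delta_in = pre - pre_ref
    delta_out = relu(pre) - relu(pre_ref)
    # Rescale rule: the nonlinearity passes contributions back with the
    # ratio of output change to input change instead of the local gradient.
    multiplier = delta_out / delta_in if abs(delta_in) > 1e-12 else 0.0
    return [w * (xi - ri) * multiplier for w, xi, ri in zip(weights, x, x_ref)]


weights, bias = [1.0, -2.0], -0.5          # hypothetical neuron parameters
x, x_ref = [2.0, 0.5], [0.0, 0.0]          # actual input and reference input
contributions = rescale_contributions(weights, bias, x, x_ref)
output_change = relu(1.0 * 2.0 - 2.0 * 0.5 - 0.5) - relu(-0.5)
print(contributions, sum(contributions), output_change)
```

The contributions add up to the change in the neuron output with respect to the reference, which is the local accuracy property mentioned above, and they are obtained in a single pass over the inputs.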

Fig. 2: SHAP-RAM method. Different from CAM and Grad-CAM, in our problem the network output is an action rather than a class score. As in CAM, we use global average pooling to get the intensity of each CNN perception feature. The SHAP value is then used directly as the weight of the saliency map.

III Proposed Method

In this section, we introduce our model explanation method. The trained policy network consists of a CNN perception part and an FC control part. A novel visual explanation method is proposed to localize the region the CNN attends to. In addition, a textual explanation method based on feature attribution is provided for real-time action explanation.

III-A Visual explanation combines both CAM and SHAP values

Understanding the inner workings of a CNN has always been difficult, even though CNNs achieve excellent predictive performance. In our problem, a CNN is used to extract visual features from the raw depth image. CNN visualization can therefore provide a better explanation of the RL policy.

In [30], a deconvolutional network (Deconvnet) approach was proposed to visualize the activated pattern in each hidden unit. This method can visualize features individually but is limited because it is hard to summarize all hidden patterns into one pattern. Simonyan et al. [25] visualize partial derivatives of predicted class scores w.r.t. pixel intensities, while Guided Backpropagation [26] modifies the ‘raw’ gradients in a way that results in qualitative improvements. These methods can provide fine-grained visualizations.

In [31], the authors proposed the Class Activation Map (CAM), which uses a global average pooling (GAP) layer to summarize the activation of the last CNN layer. However, it is only applicable to a particular CNN architecture in which global-average-pooled convolutional feature maps are fed directly into the softmax. Grad-CAM provides a new way of combining feature maps using the gradient signal that does not require any modification of the network architecture [21], so it can be applied to off-the-shelf CNN architectures. Grad-CAM uses the gradient information flowing into the last convolutional layer of the CNN to assign importance values to each neuron for a particular decision of interest. Both CAM and Grad-CAM are mainly used for classification problems.

To visualize the CNN perception part of our network, we propose a method that combines CAM and SHAP values. Because our problem is a regression problem, we call this method SHAP-RAM (SHAP value-based regression activation map). As in CAM, a global average pooling (GAP) layer is used to summarize the visual features in our CNN perception network. The output of the GAP layer is defined as the CNN feature. Different from CAM and Grad-CAM, our method uses the SHAP value of each CNN feature to determine the importance of that feature, which is generated from the corresponding activation map. A coarse localization map highlighting the important regions in the image is generated as a weighted sum of the activation maps of the last CNN layer, with the SHAP values as the weights.
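The weighted sum that produces the SHAP-RAM heatmap can be sketched with NumPy as follows. This is an illustration under assumptions rather than the authors’ implementation: the array shapes, the function name `shap_ram`, and the toy inputs are hypothetical, the SHAP values are assumed to have been computed beforehand for one action output, and upsampling to the input image size and colour mapping are omitted.

```python
import numpy as np


def shap_ram(activation_maps, shap_values):
    """Combine last-layer activation maps into one coarse heatmap.

    activation_maps: array of shape (K, H, W), one map per CNN feature.
    shap_values: array of shape (K,), SHAP value of each GAP feature
                 for a single action output.
    """
    activation_maps = np.asarray(activation_maps, dtype=float)
    shap_values = np.asarray(shap_values, dtype=float)
    # Global average pooling: one scalar intensity per activation map.
    gap_features = activation_maps.mean(axis=(1, 2))
    # Weighted sum over the K maps, with SHAP values as the weights.
    heatmap = np.tensordot(shap_values, activation_maps, axes=1)
    return gap_features, heatmap


# Toy example with K = 2 maps of size 2 x 2.
maps = np.stack([np.ones((2, 2)), np.arange(4.0).reshape(2, 2)])
gap_features, heatmap = shap_ram(maps, [0.5, -1.0])
print(gap_features)
print(heatmap)
```

Because the weights are signed SHAP values, regions of the heatmap can be positive or negative, indicating whether the corresponding image area pushes the action above or below the reference action.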

Compared with CAM, our method can be used in any network architecture that contains a GAP layer. Compared with Grad-CAM, SHAP values rather than gradients are used as the weights of the forward activation maps, which provides a fairer attribution across the activation maps.

III-B Real-time textual explanation for DRL based UAV navigation

Our model has 3 continuous action outputs: horizontal velocity, vertical velocity and steering angular velocity. To get the textual explanation, the range of each action is divided into 3 parts around the reference action, as shown in Fig. 3. If the action is similar to the reference action, we interpret it as maintaining the current state. If the output action is either larger or smaller than the reference action, a specific text is used to describe it, such as ‘slow down’ or ‘speed up’. The final textual output is the combination of these three textual descriptions; for example, the action can be described as ‘slow down, maintain the altitude and turn right’.

Fig. 3: Action description. Each action is divided into 3 parts. When the prediction falls into the central part, we say it maintains the current state. Otherwise, there is a textual description of the action. The final description is the combination of these three individual descriptions.

Finally, both visual and textual explanations are used to explain the network policy output. Because of the fast computing speed, a real-time explanation can be achieved for every action.
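The three-part division of each action around the reference action can be sketched as a simple thresholding routine. The tolerance value, the phrase list beyond the examples quoted in the text, and the convention that a positive steering rate means a right turn are assumptions for illustration only.

```python
def describe(value, reference, tolerance, lower_text, keep_text, higher_text):
    """Map one continuous action to one of three phrases around its reference."""
    if value > reference + tolerance:
        return higher_text
    if value < reference - tolerance:
        return lower_text
    return keep_text


def explain_action(action, reference, tolerance=0.1):
    """action, reference: (horizontal velocity, vertical velocity, steering rate)."""
    v_xy, v_z, yaw_rate = action
    r_xy, r_z, r_yaw = reference
    parts = [
        describe(v_xy, r_xy, tolerance, "slow down", "maintain the speed", "speed up"),
        describe(v_z, r_z, tolerance, "descend", "maintain the altitude", "climb"),
        # Assumed sign convention: positive steering rate turns right.
        describe(yaw_rate, r_yaw, tolerance, "turn left", "maintain the heading", "turn right"),
    ]
    return ", ".join(parts[:-1]) + " and " + parts[-1]


# Hypothetical policy output compared with a hypothetical reference action.
text = explain_action((0.5, 0.02, 0.4), (1.0, 0.0, 0.0))
print(text)
```

In the full explanation, each phrase would additionally be paired with the features whose SHAP values contribute most to the corresponding action output.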

IV Model Training

IV-A Training Environment and Setting

The navigation network is trained from scratch in the AirSim [22] simulator built on Unreal Engine, which provides high-fidelity depth images and a low-level controller to stabilize the UAV. A customized environment was created using Unreal Engine, as shown in Fig. 4. The environment is a square with sides of 200 meters. Some stones are randomly placed as obstacles. At the beginning of each episode, the quadrotor takes off from the centre of the environment. The goal is set randomly on a circle with a radius of 70 meters centred on the take-off point. The episode terminates when the quadrotor reaches the goal position within an acceptance radius of 2 meters or crashes into an obstacle. The controller runs at 10 Hz to generate velocity commands for the low-level controller provided by AirSim.
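The episode setup described above can be sketched independently of the simulator. The code below only covers goal sampling on the 70 m circle and the termination test with the 2 m acceptance radius; it works in the horizontal plane, takes the collision flag as an input, and does not call the AirSim API. The function names are hypothetical.

```python
import math
import random

GOAL_RADIUS = 70.0     # metres, radius of the circle on which goals are placed
ACCEPT_RADIUS = 2.0    # metres, distance at which the goal counts as reached


def sample_goal(rng, start=(0.0, 0.0)):
    """Pick a goal uniformly on the circle centred on the take-off point."""
    angle = rng.uniform(0.0, 2.0 * math.pi)
    return (start[0] + GOAL_RADIUS * math.cos(angle),
            start[1] + GOAL_RADIUS * math.sin(angle))


def episode_done(position, goal, collided):
    """Terminate when the goal is reached or a collision is reported."""
    reached = math.dist(position, goal) <= ACCEPT_RADIUS
    return reached or collided


goal = sample_goal(random.Random(0))
print(goal, episode_done((0.0, 0.0), goal, collided=False))
```

With the controller running at 10 Hz, such a termination check would be evaluated once per control step after the new position and collision state have been read from the simulator.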

An off-policy, model-free reinforcement learning algorithm, Twin Delayed DDPG (TD3) [6], is used for model training. As the successor of DDPG, TD3 addresses the Q-value overestimation problem of DDPG by introducing three critical tricks: clipped double Q-learning, delayed policy updates and target policy smoothing [1]. This DRL algorithm is widely used for continuous control problems. The hyperparameters of TD3 are summarized in Table I in the Appendix.
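Two of the three TD3 tricks can be written down in a few lines. The sketch below shows the clipped double-Q target and target policy smoothing with NumPy. The noise standard deviation, the noise clip and the action bounds are illustrative defaults and are not taken from this paper; the discount factor 0.99 matches Table I. Delayed policy updates are a scheduling choice in the training loop and are not shown.

```python
import numpy as np


def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Clipped double Q-learning: bootstrap from the smaller of the two target critics."""
    return reward + gamma * (1.0 - done) * np.minimum(q1_next, q2_next)


def smoothed_target_action(target_action, rng, noise_std=0.2, noise_clip=0.5,
                           low=-1.0, high=1.0):
    """Target policy smoothing: add clipped Gaussian noise to the target action."""
    noise = rng.normal(0.0, noise_std, size=np.shape(target_action))
    noise = np.clip(noise, -noise_clip, noise_clip)
    return np.clip(np.asarray(target_action) + noise, low, high)


rng = np.random.default_rng(0)
next_action = smoothed_target_action([0.9, 0.0, -0.9], rng)
target = td3_target(reward=1.0, done=0.0, q1_next=2.0, q2_next=3.0, gamma=0.5)
print(next_action, target)
```

Taking the minimum of the two critic estimates biases the target downwards, which counteracts the overestimation that a single critic tends to accumulate.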

Fig. 4: Training environment

IV-B Reward Function Design

The reward function is critical for a DRL problem. In principle, the reward function for navigation can be simple: reward reaching the goal as soon as possible and punish collisions. However, because the state space in the navigation task is very large, it is better to introduce a continuous reward signal to guide the exploration and speed up the training process. After extensive testing, we use a hand-designed reward function that consists of a continuous goal-approaching reward and several penalty terms. The goal-approaching reward is computed from the Euclidean distance between the current position and the goal position at time $t$. The penalty at the current step is composed of three terms, which penalize obstacle proximity, action, and position error.

The obstacle penalty prevents the quadrotor from getting close to the obstacles. It depends on the safety distance, the minimum distance allowed to the obstacles, and the minimum distance to the obstacle at time $t$. In our training process, the safety distance is 5 meters and the minimum allowed distance is 1 meter, which means a penalty is given when the quadrotor gets within 5 meters of an obstacle. When the minimum distance to the obstacle is less than 1 meter, the quadrotor is considered to have crashed and the episode terminates. To stabilize the training process, the continuous reward part is constrained to the range from -1 to 1.
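The structure of such a shaped reward can be sketched as follows. Only the 5 m safety distance, the 1 m crash distance and the clipping to the range from -1 to 1 are taken from the text. The distance-decrease form of the goal-approaching term and the linear ramp of the obstacle penalty are assumptions made for illustration, and the action and position-error penalties are omitted because their form is not recoverable here.

```python
D_SAFE = 5.0   # metres, penalty starts below this obstacle distance
D_MIN = 1.0    # metres, closer than this counts as a crash


def obstacle_penalty(d_obstacle):
    """Assumed linear ramp: 0 at D_SAFE, -1 at D_MIN."""
    if d_obstacle >= D_SAFE:
        return 0.0
    closeness = (D_SAFE - max(d_obstacle, D_MIN)) / (D_SAFE - D_MIN)
    return -closeness


def step_reward(prev_goal_dist, goal_dist, d_obstacle):
    """Assumed goal-approaching term (distance decrease) plus obstacle penalty."""
    progress = prev_goal_dist - goal_dist
    raw = progress + obstacle_penalty(d_obstacle)
    # The continuous reward is constrained to [-1, 1] to stabilize training.
    return max(-1.0, min(1.0, raw))


def crashed(d_obstacle):
    return d_obstacle < D_MIN


print(step_reward(10.0, 9.5, 3.0), crashed(0.5))
```

A dense term of this kind gives the agent a learning signal at every step, whereas a reward given only on reaching the goal would be encountered rarely during early exploration.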

IV-C Training Result

The policy network is trained for 100k time steps (around 1000 episodes). To speed up the training process, the AirSim simulation clock speed is set to 10. The total training process took about 7 hours on an Intel i7-8700 processor and an Nvidia GeForce GTX1060 GPU. The episode reward and success rate are plotted in Fig. 5. The trained policy achieves a success rate of over 80%, which means that in most episodes the network can guide the UAV to the goal position without colliding with any obstacle.

(a) Episode reward
(b) Success rate
Fig. 5: Mean episode reward and success rate versus training steps. The mean reward and success rate are obtained by evaluating each learned policy over 5 randomly generated navigation tasks.
Fig. 6: Reference depth image.

V Model explanation

After training, we obtain a policy with good performance. To preserve this performance, we apply a post-hoc, real-time explanation to the trained policy. The DeepSHAP method is used to obtain the feature importance, and our explanations are generated from these SHAP values.

V-A Defining the Reference

A feature attribution method computes the contribution of each feature relative to a reference input, or baseline input. The choice of the reference input is critical for obtaining insightful results [23]. In practice, choosing a good reference relies on domain-specific knowledge. For instance, in object recognition networks, the reference is often a black image.

In our case, we choose a depth image without any obstacles as the reference image input. For the state feature input, we use the values corresponding to a UAV that has just taken off from the start point and has no velocity. The reference image is shown in Fig. 6. Based on this reference input, we obtain the reference action from the policy network.

V-B Trajectory Analysis

We choose one of the trajectories from the evaluation process to gain some insight into the policy. Fig. 7 shows the depth image at different time steps. Fig. 8 and Fig. 9 show the control commands and state features over the trajectory. From the state features in Fig. 9, we can see that the UAV always flies towards the goal position and that the distance to the goal decreases over the trajectory. At the end of the trajectory, the UAV reaches the goal position.

Fig. 7: Depth image and SHAP-RAM at 10 different time steps in the trajectory
Fig. 8: Policy output
Fig. 9: State feature

V-C Action explanation

Action explanations can be generated for every time step. Here, 3 specific time steps are chosen to demonstrate our visual and textual explanations of actions. As shown in Fig. 10, at the first time step (Fig. 10a), the action is slow down, keep altitude and turn right. The explanation shows that both slow down and turn right are caused by the angular error to the goal. This makes sense because the heading at this time step does not point towards the goal position, so the UAV needs to turn right. At the second time step (Fig. 10b), the action is slow down, climb and turn right. The explanation shows that this is caused by the CNN features. From the heatmap generated using SHAP-RAM, we can see that the CNN detected the left edge of the stone, which is the obstacle. At the third time step (Fig. 10c), the action is slow down, climb and turn left. This is also caused by the CNN features.

To find out the meaning of the CNN features, we also plotted the activation maps of the last CNN layer at two of these time steps, as shown in Fig. 11. For the time step where the action is slow down, climb and turn right, the activation map shows that CNN feature 8 responds to the left and right edges of the obstacle and contributes most to the slow down action. CNN feature 7 responds to the obstacle and some ground and contributes to the climb. CNN feature 4 shows the right-side edge of the obstacle with some free-space background, which leads to the turn right action.

(a) Action explanation at the first time step
(b) Action explanation at the second time step
(c) Action explanation at the third time step
Fig. 10: Action explanation at 3 different time steps.
(a) Last CNN layer activation map at the first plotted time step
(b) Last CNN layer activation map at the second plotted time step
Fig. 11: Last CNN layer activation map.

V-D Model analysis

After the action explanation, we summarize the feature attributions over 20 trajectories, 2858 time steps in total. Fig. 12 shows the SHAP summary plot, which orders the features by their importance to each action. The CNN features contribute most to two of the three actions. Apart from the CNN features, the current horizontal velocity and the distance to the goal are the most important features for one of the action outputs, and three state features contribute more to the vertical velocity command. The angle error is the most important feature for one of the action outputs; the per-action ranking is given in Fig. 12.

Fig. 12: Feature analysis over the trajectory.
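A global ranking of this kind is commonly obtained by averaging the absolute SHAP values of each feature over all samples. The following sketch shows that computation for one action output; the function name, the feature names and the numbers are hypothetical, and the exact aggregation used for Fig. 12 is not stated in the text.

```python
import numpy as np


def rank_features(shap_values, feature_names):
    """Order features by mean absolute SHAP value over all samples.

    shap_values: array of shape (num_samples, num_features) for one action output.
    """
    importance = np.abs(np.asarray(shap_values, dtype=float)).mean(axis=0)
    order = np.argsort(importance)[::-1]      # most important feature first
    return [(feature_names[i], float(importance[i])) for i in order]


# Hypothetical SHAP values for two samples and three features.
example_shap = [[0.1, -0.5, 0.0],
                [-0.3, 0.7, 0.1]]
ranking = rank_features(example_shap, ["distance", "angle_error", "speed"])
print(ranking)
```

Repeating the ranking for each of the three action outputs gives one ordered list per action, which corresponds to reading the summary plot action by action.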

With the feature values and their SHAP values, we can investigate the relationship between the feature intensity and its importance, as shown in Fig. 13. The plot shows that the SHAP value of a feature depends systematically on the feature value. For example, the angle error shows a positive correlation with its SHAP value, whereas the angular speed shows a negative correlation.

Fig. 13: Feature dependence plot using 2858 samples from 20 trajectories. The x-axis is the feature value and the y-axis is its SHAP value. The feature values are normalized to the range 0 to 1, so an angle error of 0.5 corresponds to the midpoint of its range. Each of the three rows shows the SHAP values of two state features with respect to one of the three action outputs.
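The sign of the relationship visible in a dependence plot can be summarized with a correlation coefficient between the feature values and their SHAP values. The sketch below uses the Pearson correlation from NumPy on hypothetical data; it is an illustrative summary statistic and not a computation reported in the paper.

```python
import numpy as np


def value_shap_correlation(feature_values, shap_values):
    """Pearson correlation between a feature's values and its SHAP values."""
    return float(np.corrcoef(feature_values, shap_values)[0, 1])


# Hypothetical normalized feature values and two hypothetical SHAP trends.
values = [0.0, 0.25, 0.5, 0.75, 1.0]
rising = [-0.2, -0.1, 0.0, 0.1, 0.2]     # SHAP value grows with the feature value
falling = [0.2, 0.1, 0.0, -0.1, -0.2]    # SHAP value shrinks with the feature value
print(value_shap_correlation(values, rising), value_shap_correlation(values, falling))
```

A coefficient near +1 or -1 indicates an almost linear dependence, while values near 0 suggest that the feature’s effect depends on other inputs or is non-monotonic.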

VI Conclusion

In this paper, the UAV autonomous navigation problem is solved with a DRL technique. Different from other works, this paper focuses mainly on model explainability rather than treating the trained model as a black box. Based on feature attribution, both visual and textual explanations are generated to open the black box. To get a better visual explanation of the CNN perception part, a new saliency map generation method is proposed that combines CAM and SHAP values. Our method can provide real-time textual action explanations for non-expert users, which is important for the application of DRL-based models in the real world.

Because this paper focuses mainly on the explanation part, the trained model is not perfect, and some explanations still do not make sense. In the future, the model will be fine-tuned and improved based on the explanations. Finally, the trained model and explanation method will be verified on a UAV platform in a real, complex outdoor environment.

Appendix A Hyperparameters of TD3

The hyperparameters are shown in Table I.

Hyperparameter                          Value
mini-batch size                         128
replay buffer size                      50000
discount factor                         0.99
learning rate                           0.0003
random exploration steps                2000
square deviation of exploration noise   0.3
TABLE I: Hyperparameters of TD3


References

  1. J. Achiam (2018) Spinning Up in Deep Reinforcement Learning.
  2. A. B. Arrieta, N. Díaz-Rodríguez, J. Del Ser, A. Bennetot, S. Tabik, A. Barbado, S. García, S. Gil-López, D. Molina and R. Benjamins (2020) Explainable artificial intelligence (xai): concepts, taxonomies, opportunities and challenges toward responsible ai. Information Fusion 58, pp. 82–115.
  3. B. Beyret, A. Shafti and A. A. Faisal (2019) Dot-to-dot: explainable hierarchical reinforcement learning for robotic manipulation. arXiv preprint arXiv:1904.06703.
  4. H. Chen, S. Lundberg and S. Lee (2019) Explaining models by propagating shapley values of local components. arXiv preprint arXiv:1911.11888.
  5. H. D. Escobar-Alvarez, N. Johnson, T. Hebble, K. Klingebiel, S. A. Quintero, J. Regenstein and N. A. Browning (2018) R-advance: rapid adaptive prediction for vision-based autonomous navigation, control, and evasion. Journal of Field Robotics 35 (1), pp. 91–100.
  6. S. Fujimoto, H. Van Hoof and D. Meger (2018) Addressing function approximation error in actor-critic methods. arXiv preprint arXiv:1802.09477.
  7. R. Goebel, A. Chander, K. Holzinger, F. Lecue, Z. Akata, S. Stumpf, P. Kieseberg and A. Holzinger (2018) Explainable ai: the new 42?. In International cross-domain conference for machine learning and knowledge extraction, pp. 295–303.
  8. L. He, N. Aouf, J. F. Whidborne and B. Song (2020) Deep reinforcement learning based local planner for uav obstacle avoidance using demonstration data. arXiv preprint arXiv:2008.02521.
  9. L. He, N. Aouf, J. F. Whidborne and B. Song (2020) Integrated moment-based lgmd and deep reinforcement learning for uav obstacle avoidance. pp. 7491–7497.
  10. A. Heuillet, F. Couthouis and N. D. Rodríguez (2020) Explainability in deep reinforcement learning. arXiv preprint arXiv:2008.06693.
  11. N. Imanberdiyev, C. Fu, E. Kayacan and I. Chen (2016) Autonomous navigation of uav by using real-time model-based reinforcement learning. In 2016 14th International Conference on Control, Automation, Robotics and Vision (ICARCV), pp. 1–6.
  12. Z. Juozapaitis, A. Koul, A. Fern, M. Erwig and F. Doshi-Velez (2019) Explainable reinforcement learning via reward decomposition. In IJCAI/ECAI Workshop on Explainable Artificial Intelligence.
  13. J. H. Lee (2019) Complementary reinforcement learning towards explainable agents. arXiv preprint arXiv:1901.00188.
  14. S. Liu, M. Watterson, K. Mohta, K. Sun, S. Bhattacharya, C. J. Taylor and V. Kumar (2017) Planning dynamically feasible trajectories for quadrotors using safe flight corridors in 3-d complex environments. IEEE Robotics and Automation Letters 2 (3), pp. 1688–1695.
  15. S. M. Lundberg and S. Lee (2017) A unified approach to interpreting model predictions. In Advances in neural information processing systems, pp. 4765–4774.
  16. P. Madumal, T. Miller, L. Sonenberg and F. Vetere (2019) Explainable reinforcement learning through a causal lens. arXiv preprint arXiv:1905.10958.
  17. P. Madumal, T. Miller, L. Sonenberg and F. Vetere (2020) Distal explanations for explainable reinforcement learning agents. arXiv preprint arXiv:2001.10284.
  18. S. Paschall and J. Rose (2017) Fast, lightweight autonomy through an unknown cluttered environment. In 2017 IEEE Aerospace Conference, pp. 1–8.
  19. R. Pocius, L. Neal and A. Fern (2019) Strategic tasks for explainable reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 10007–10008.
  20. S. Ross, N. Melik-Barkhudarov, K. S. Shankar, A. Wendel, D. Dey, J. A. Bagnell and M. Hebert (2013) Learning monocular reactive uav control in cluttered natural environments. In 2013 IEEE international conference on robotics and automation, pp. 1765–1772.
  21. R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh and D. Batra (2017) Grad-cam: visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pp. 618–626.
  22. S. Shah, D. Dey, C. Lovett and A. Kapoor (2017) AirSim: high-fidelity visual and physical simulation for autonomous vehicles. In Field and Service Robotics. arXiv:1705.05065.
  23. A. Shrikumar, P. Greenside and A. Kundaje (2017) Learning important features through propagating activation differences. arXiv preprint arXiv:1704.02685.
  24. A. Shrikumar, P. Greenside, A. Shcherbina and A. Kundaje (2016) Not just a black box: learning important features through propagating activation differences. arXiv preprint arXiv:1605.01713.
  25. K. Simonyan, A. Vedaldi and A. Zisserman (2013) Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034.
  26. J. T. Springenberg, A. Dosovitskiy, T. Brox and M. Riedmiller (2014) Striving for simplicity: the all convolutional net. arXiv preprint arXiv:1412.6806.
  27. M. Sundararajan, A. Taly and Q. Yan (2017) Axiomatic attribution for deep networks. arXiv preprint arXiv:1703.01365.
  28. C. Wang, J. Wang, Y. Shen and X. Zhang (2019) Autonomous navigation of uavs in large-scale complex environments: a deep reinforcement learning approach. IEEE Transactions on Vehicular Technology 68 (3), pp. 2124–2136.
  29. C. Wang, J. Wang, J. Wang and X. Zhang (2020) Deep reinforcement learning-based autonomous uav navigation with sparse rewards. IEEE Internet of Things Journal.
  30. M. D. Zeiler and R. Fergus (2014) Visualizing and understanding convolutional networks. In European conference on computer vision, pp. 818–833.
  31. B. Zhou, A. Khosla, A. Lapedriza, A. Oliva and A. Torralba (2016) Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  32. B. Zhou, F. Gao, L. Wang, C. Liu and S. Shen (2019) Robust and efficient quadrotor trajectory generation for fast autonomous flight. IEEE Robotics and Automation Letters 4 (4), pp. 3529–3536.