A Novel Learning-based Global Path Planning Algorithm for Planetary Rovers
Abstract
Autonomous path planning algorithms are important for planetary exploration rovers, since relying on commands from Earth heavily reduces the efficiency with which they execute exploration missions. This paper proposes a novel learning-based algorithm for the global path planning problem of planetary exploration rovers. Specifically, a novel deep convolutional neural network with double branches (DBCNN) is designed and trained, which can plan paths directly from orbital images of planetary surfaces without environment mapping. Moreover, the planning procedure requires no prior knowledge about planetary surface terrains. Finally, experimental results demonstrate that DBCNN achieves better global path planning performance and faster convergence during training than the existing Value Iteration Network (VIN).
I Introduction
During planetary exploration missions, rovers are required to explore diverse targets of interest after successful landing. Since the surfaces of planets (e.g. Mars and the Moon) are covered with dangerous areas (e.g. rocks, steep slopes, and craters) and the power supplied to rovers is limited, it is important for planetary rovers to find collision-free and energy-efficient paths to their destinations [1]. Moreover, the uncertain planetary environments and the unavoidable communication delays between Earth and other planets make it impractical to provide real-time decision and control for rovers from Earth. This means that the design of autonomous path planning algorithms is indispensable for planetary rovers.
The planetary path planning problem can be classified into two types, namely global path planning and local path planning. For global path planning, the whole trajectories from the rovers' start positions to their targets must be determined from planetary surface images captured by orbiting satellites. It can be fulfilled offline since global environments are fully observable. For local path planning, the partial trajectories from the rovers' current positions to the limits of their sight need to be planned from their observations of local environments. It is commonly executed online since only local environments are observable. This paper concentrates on the global path planning problem for planetary rovers.
Typically, the initial stage of implementing global path planning algorithms is mapping the real-world environment [2]. More precisely, observations of global environments are commonly transformed into configuration space (C-space), visibility graphs, Voronoi diagrams or grid maps [3, 4]. Then, global path planning algorithms can be applied. In [5, 6], classical shortest path search methods such as the Dijkstra algorithm and the Floyd algorithm were first employed to deal with the global path planning problem. However, since global path planning with multiple obstacles is non-deterministic polynomial-time hard (NP-hard) [2], it is time-consuming to find the shortest path through traversal search. Therefore, heuristic and evolutionary algorithms were adopted to address global path planning efficiently. In [7, 8], heuristic search algorithms such as A* and D* were successfully applied to achieve efficient path planning for mobile robots. Then, inspired by natural and biological intelligence, evolutionary algorithms such as the genetic algorithm [9], particle swarm optimization [10], ant colony algorithms [11], and neural network algorithms [12] were extended to global path planning problems for planetary rovers. It is noteworthy that these algorithms cannot work without environment mapping, for which human prior knowledge about planetary environments is necessary.
In order to achieve autonomous path planning directly from orbital images, an algorithm first has to represent and learn deep features of orbital images, such as the shape and location of obstacles. Then, according to these deep features, the optimal path can be determined. In recent years, Deep Convolutional Neural Networks (DCNNs) have received wide attention in the computer vision field for their superior feature representation and learning capability [13]. Inspired by the state-of-the-art performance of DCNNs in visual feature representation and learning, learning to plan directly from original images has been researched. Since global path planning is a sequential decision making process, one proven technique is formulating it as a Markov Decision Process (MDP) and finding the optimal path planning policy through value function estimation. In [14], a novel DCNN architecture, the Value Iteration Network (VIN), was proposed to effectively estimate value functions in an MDP. With it, the goal of planning paths directly from Martian orbital images was achieved. Based on the work on VIN, the Memory Augmented Control Network and Neural Map were proposed to find the optimal path for rovers in partially observable environments in [15] and [16] respectively. Further, in order to plan paths for rovers in dynamic environments, the Value Propagation Network [17] was designed. However, all these networks contain the value iteration module of VIN, which has low training and planning efficiency since it requires multiple iterations inside the network for value function estimation.
Therefore, in this paper, we design a novel DCNN architecture with double branches and a non-iterative structure (DBCNN) for value function estimation, which can achieve global path planning with higher efficiency and precision. The main contributions of this paper are summarized as follows:

1) A novel DCNN architecture with double branches (DBCNN) is designed to achieve autonomous global path planning directly from planetary orbital images.
2) We present the global path planning algorithm based on DBCNN and illustrate its merits over traditional global path planning methods.
3) Compared with the state-of-the-art architecture (VIN), DBCNN achieves better performance and faster convergence on planetary global path planning tasks.
4) Experimental analysis demonstrates that DBCNN has a more efficient learning structure and that its training time is largely reduced compared with VIN.
The rest of the paper is organized as follows. Section II provides the preliminaries of this paper. Section III describes the proposed DBCNN for global path planning of planetary rovers. Experimental results and analysis are presented in Section IV, followed by discussion and conclusions in Section V.
II Preliminaries
II-A Markov Decision Process
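A standard MDP for sequential decision making is composed of an action space $\mathcal{A}$, a state space $\mathcal{S}$, a reward function $R$, a transition probability distribution $P$ and a discount factor $\gamma$, where the policy is denoted by $\pi$. At time step $t$, the agent observes its state $s_t$ from the environment and then chooses its action $a_t$ satisfying $a_t \sim \pi(\cdot \mid s_t)$ (or $a_t = \pi(s_t)$ if the policy is deterministic). After that, its state transits into $s_{t+1}$ and the agent receives reward $r_t = R(s_t, a_t)$ from the environment, where $s_{t+1}$ satisfies $s_{t+1} \sim P(\cdot \mid s_t, a_t)$ (or $s_{t+1} = f(s_t, a_t)$ if the state transition process is deterministic). The whole process is shown in Fig. 2.
Furthermore, with rewards discounted by $\gamma$, the optimal policy $\pi^{*}$ is defined as
$$\pi^{*} = \arg\max_{\pi} \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t} \,\middle|\, \pi\right] \tag{1}$$
To measure the expected accumulative reward of $s_t$ and $(s_t, a_t)$, the state value function $V^{\pi}$ and the action value function $Q^{\pi}$ are defined as
$$V^{\pi}(s) = \mathbb{E}\left[\sum_{k=0}^{\infty} \gamma^{k} r_{t+k} \,\middle|\, s_t = s, \pi\right] \tag{2}$$
$$Q^{\pi}(s, a) = \mathbb{E}\left[\sum_{k=0}^{\infty} \gamma^{k} r_{t+k} \,\middle|\, s_t = s, a_t = a, \pi\right] \tag{3}$$
In terms of the value functions of the optimal policy, $V^{*}$ and $Q^{*}$, the optimal policy satisfies
$$\pi^{*}(s) = \arg\max_{a \in \mathcal{A}} Q^{*}(s, a), \qquad V^{*}(s) = \max_{a \in \mathcal{A}} Q^{*}(s, a) \tag{4}$$
However, since both the state value function and the action value function are unknown, it is impossible to determine $\pi^{*}$ through Eq. (4) directly. Therefore, the value functions of the MDP have to be estimated precisely so that the optimal policy can be found.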
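As a small worked illustration of the discounted accumulative reward that the value functions measure, the following sketch sums a finite reward sequence with a discount factor. The reward values and the discount factor used in the example are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Return sum_t gamma**t * rewards[t] for a finite reward sequence."""
    total = 0.0
    # Accumulate from the last reward backwards: G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical rewards: two ordinary steps followed by reaching the target.
long_path = discounted_return([-0.01, -0.01, 1.0], gamma=0.99)
# A path that reaches the target immediately collects more discounted reward.
short_path = discounted_return([1.0], gamma=0.99)
```

Under these assumed rewards the shorter path has the larger return, which is why maximizing the discounted accumulative reward favors short paths to the target.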
II-B Value Function Estimation
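Value iteration is a typical method for value function estimation and thus for solving MDP problems [14]. Denote the estimated state value function at step $k$ by $V_k(s)$, and the estimated action value function for each state at step $k$ by $Q_k(s, a)$. $\pi_k$ is utilized to represent the deterministic policy at step $k$. Then, with $Q_k(s, a) = R(s, a) + \gamma \sum_{s'} P(s' \mid s, a) V_k(s')$, the value iteration process can be expressed as
$$V_{k+1}(s) = \max_{a \in \mathcal{A}} Q_k(s, a) \tag{5}$$
$$\pi_{k+1}(s) = \arg\max_{a \in \mathcal{A}} Q_k(s, a) \tag{6}$$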
However, since it is difficult to represent the reward, transition and value functions explicitly (especially when the dimension of the state space is high), VIN was designed to approximate this process with a Value Iteration Module. As illustrated in Fig. 3, the value function layer is stacked with the reward layer and then filtered by a convolutional layer and a max-pooling layer recurrently. Furthermore, through VIN, global information including orbital images and the target position can be conveyed to each state in the final value function layer. Experiments demonstrate that this architecture performs well in learning-to-plan tasks. However, it takes considerable time to train such a recurrent convolutional neural network, especially when the number of iterations ($K$ in Fig. 3) becomes large. Therefore, replacing the Value Iteration Module with a more efficient architecture without losing its excellent global path planning performance becomes the focus of this paper.
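To make the value iteration recursion concrete, the following is a minimal tabular sketch on a small grid map. It is not VIN and not the authors' implementation: the grid encoding, the reward values `r_goal` and `r_step`, the deterministic eight-direction moves and the treatment of the target as a terminal state are all assumptions made for illustration.

```python
from typing import List, Tuple

# Eight moves written as (d_row, d_col): east, south, west, north, southeast,
# northeast, southwest, northwest. Treating "south" as row + 1 is an assumption.
MOVES = [(0, 1), (1, 0), (0, -1), (-1, 0), (1, 1), (-1, 1), (1, -1), (-1, -1)]


def value_iteration(grid: List[List[int]], goal: Tuple[int, int],
                    gamma: float = 0.99, n_iters: int = 50,
                    r_goal: float = 1.0, r_step: float = -0.01) -> List[List[float]]:
    """Tabular value iteration on a grid where 1 marks an obstacle.

    r_goal and r_step are hypothetical reward values chosen for illustration.
    """
    rows, cols = len(grid), len(grid[0])
    values = [[0.0] * cols for _ in range(rows)]
    for _ in range(n_iters):
        updated = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                if grid[r][c] == 1 or (r, c) == goal:
                    continue  # obstacles and the terminal goal keep value 0
                q_values = []
                for dr, dc in MOVES:
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                        reward = r_goal if (nr, nc) == goal else r_step
                        # Q_k(s, a) = R(s, a) + gamma * V_k(s')
                        q_values.append(reward + gamma * values[nr][nc])
                if q_values:
                    updated[r][c] = max(q_values)  # V_{k+1}(s) = max_a Q_k(s, a)
        values = updated
    return values


# A 3 x 3 map with one obstacle in the centre and the target in a corner.
demo_values = value_iteration([[0, 0, 0], [0, 1, 0], [0, 0, 0]], goal=(2, 2))
```

Every sweep visits all states, and many sweeps are needed before values far from the target settle, which gives some intuition for why an architecture that repeats such an update many times inside the network is costly to train.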
II-C Methods for Value Function Estimation
Generally, there exist two learning-based methods for value function estimation: reinforcement learning [18] and imitation learning [19]. In reinforcement learning, no prior knowledge is required and the agent can find the optimal policy in a complex environment by trial and error [20]. However, the training process of reinforcement learning is computationally inefficient. In imitation learning, when the expert dataset $\mathcal{D}$ is given, the training process becomes supervised learning with higher data efficiency and fitting accuracy.
Considering that the expert dataset $\mathcal{D} = \{(s_i, a_i^{*})\}_{i=1}^{N}$ for global path planning is available ($a_i^{*}$ is the optimal action at state $s_i$ and $N$ is the number of samples), in this paper, the imitation learning method is applied to find the optimal navigation policy.
III Model Description
III-A Global Path Planning Model
In this subsection, we formulate the global path planning problem of planetary rovers as an MDP $\mathcal{M}$, defined as follows.
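III-A1 State Space
The state space of $\mathcal{M}$ is denoted as $\mathcal{S}$, where each state $s_t \in \mathcal{S}$ consists of $I_t$, $g_t$ and $p_t$. More precisely, $I_t \in \mathbb{R}^{H \times W \times C}$ is the planetary orbital image at time step $t$ with height $H$, width $W$, and $C$ channels, $g_t$ is the target position for the planetary rover at time step $t$, and $p_t$ is the rover's location at time step $t$.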
III-A2 Action Space
The action space of $\mathcal{M}$ is denoted as $\mathcal{A} = \{0, 1, \dots, 7\}$, representing the eight potential moving directions of the planetary rover (0: east, 1: south, 2: west, 3: north, 4: southeast, 5: northeast, 6: southwest, 7: northwest).
III-A3 State Transition Function
Since the state transition process in this MDP is deterministic, it is defined as $s_{t+1} = f(s_t, a_t)$. After taking action $a_t$, the state $s_t$ transits into $s_{t+1}$. Notably, for a given exploration mission, the planetary orbital image $I_t$ and the target position $g_t$ in state $s_t$ are constant during each path planning step, while the rover's position $p_t$ in state $s_t$ changes at each step.
III-A4 Reward Function
If the rover reaches the target point precisely at time step $t$ after taking action $a_t$, it obtains a positive reward ($r_t > 0$). Otherwise, it gets a negative reward ($r_t < 0$). Therefore, the optimal path from the start position to the target position has the maximal accumulative reward.
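The state, action, transition and reward definitions above can be sketched as a single deterministic step function. The action indices follow the text; the image coordinate convention (row grows southward, column grows eastward) and the reward magnitudes `R_GOAL` and `R_STEP` are assumptions, since the text only fixes their signs.

```python
from typing import Tuple

Position = Tuple[int, int]  # (row, column) in the orbital image

# Action index -> (d_row, d_col), following the direction order in the text.
ACTION_OFFSETS = {
    0: (0, 1),    # east
    1: (1, 0),    # south
    2: (0, -1),   # west
    3: (-1, 0),   # north
    4: (1, 1),    # southeast
    5: (-1, 1),   # northeast
    6: (1, -1),   # southwest
    7: (-1, -1),  # northwest
}

R_GOAL = 1.0    # hypothetical positive reward for reaching the target
R_STEP = -0.01  # hypothetical negative reward for every other step


def step(position: Position, target: Position, action: int) -> Tuple[Position, float]:
    """Deterministic transition: only the rover position changes.

    The orbital image and the target stay constant within a mission, so they
    are not returned here.
    """
    d_row, d_col = ACTION_OFFSETS[action]
    next_position = (position[0] + d_row, position[1] + d_col)
    reward = R_GOAL if next_position == target else R_STEP
    return next_position, reward
```

For example, `step((2, 2), (3, 3), 4)` moves southeast onto the target and returns the positive reward, while any other action from `(2, 2)` returns the negative one.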
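III-A5 Problem Formulation
Denote the DCNN designed for value function estimation as $\hat{Q}(s, a; \theta)$, where $\theta$ represents the parameters of this DCNN and $\hat{Q}(s, a; \theta)$ is the estimated value of $Q(s, a)$. Then, the policy for global path planning is derived as
$$\pi(s) = \arg\max_{a \in \mathcal{A}} \hat{Q}(s, a; \theta) \tag{7}$$
Given the expert dataset for global path planning, we can view this DCNN as a classifier with 8 classes and define the training loss in cross-entropy form with an $L_2$ norm penalty [21] as follows
$$L(\theta) = -\frac{1}{N} \sum_{i=1}^{N} y_i^{\top} \log \operatorname{softmax}\big(\hat{Q}(s_i, \cdot\,; \theta)\big) + \lambda \lVert \theta \rVert_2^2 \tag{8}$$
where $N$ is the number of training samples, $y_i$ is the one-hot vector form [22] of the expert action $a_i^{*}$ and $\lambda$ is the hyperparameter adjusting the effect of the $L_2$ norm on the loss function.
By minimizing the loss function $L(\theta)$, the optimal parameter $\theta^{*}$ of the DCNN can be determined as follows
$$\theta^{*} = \arg\min_{\theta} L(\theta) \tag{9}$$
Therefore, the global path planning problem is formulated as designing and training a DCNN for value function estimation that best fits the given expert dataset.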
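The classification view of the training loss can be illustrated with a short sketch that evaluates a cross-entropy loss with a norm penalty on a batch of network outputs. Taking the penalty to be the squared L2 norm of the parameters, applying a softmax to the eight output scores, and the names `q_values`, `expert_actions` and `lam` are assumptions for illustration rather than details confirmed by the paper.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vector of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_l2(q_values: Sequence[Sequence[float]],
                     expert_actions: Sequence[int],
                     params: Sequence[float],
                     lam: float = 1e-4) -> float:
    """Mean cross-entropy against expert actions plus lam * ||params||_2^2.

    With a one-hot target, the cross-entropy of a sample reduces to the
    negative log-probability assigned to the expert action.
    """
    data_term = 0.0
    for scores, expert_action in zip(q_values, expert_actions):
        data_term -= math.log(softmax(scores)[expert_action])
    penalty = lam * sum(w * w for w in params)
    return data_term / len(q_values) + penalty
```

With eight equal scores the data term equals log 8, the loss of a classifier that has not yet learned any preference among the eight directions.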
III-B Proposed DBCNN for Value Function Estimation
In this subsection, we propose a novel deep neural network architecture for value function estimation, DBCNN, which is composed of reprocessing layers, branch one for global feature representation, and branch two for local feature representation.
III-B1 Reprocessing Layers
The reprocessing layers comprise two convolutional layers (Conv00, Conv01), each of which is followed by one max-pooling layer (Pool00, Pool01). The aim of the reprocessing layers is to filter out noise and compress the original orbital image $I$ into a feature map $F$. After that, global path planning is performed on the smaller feature map $F$ instead of the full-size image $I$, which improves efficiency.
III-B2 Branch One
Branch one consists of one convolutional layer (Conv10), three residual convolutional layers (Res11, Res12, Res13), four max-pooling layers (Pool10, Pool11, Pool12, Pool13) and two fully connected layers (Fc1, Fc2). Notably, the residual convolutional layer (Fig. 5) is a kind of convolutional layer proposed in [23], which not only increases the training accuracy of convolutional neural networks with deep feature representations, but also makes them generalize well to testing data. Considering that DBCNN is required to represent deep features of orbital images and achieve high precision in unknown environments (testing images), residual convolutional layers are embedded in DBCNN. We denote the deep feature extracted from the feature map $F$ by this branch as $u_g \in \mathbb{R}^{n_g}$ ($n_g$ is the dimension of feature vector $u_g$). $u_g$ can be viewed as a global guidance to the planetary rover, which represents global features related to all pixels in the orbital image $I$ and the target position $g$.
III-B3 Branch Two
Branch two is composed of two convolutional layers (Conv20, Conv21) and four residual convolutional layers (Res21, Res22, Res23, Res24). We denote the deep feature extracted from the feature map $F$ by this branch as $u_l \in \mathbb{R}^{n_l}$ ($n_l$ is the dimension of feature vector $u_l$). Since convolutional layers are locally connected instead of fully connected, $u_l$ can only capture local features and estimate the local value function, acting as a local guidance to planetary rovers.
The diagram of DBCNN is illustrated in Fig. 4, where Conv, Pool, Res, Fc and S are short for convolutional layer, max-pooling layer, residual convolutional layer, fully connected layer and softmax layer respectively. Compared with VIN, not only is the depth of DBCNN reduced significantly, but both global and local information of the image is also kept and represented effectively. One typical parameter setting of DBCNN is shown in TABLE I.
TABLE I: One typical parameter setting of DBCNN

| Module | Layer | Setting |
|---|---|---|
| Reprocessing layers | Conv00 | kernels with stride 1 |
| | Pool00 | kernels with stride 2 |
| | Conv01 | kernels with stride 1 |
| | Pool01 | kernels with stride 2 |
| Branch one | Conv10 | kernels with stride 1 |
| | Pool10 | kernels with stride 2 |
| | Res11 | kernels with stride 1 |
| | Pool11 | kernels with stride 2 |
| | Res12 | kernels with stride 1 |
| | Pool12 | kernels with stride 1 |
| | Res13 | kernels with stride 1 |
| | Pool13 | kernels with stride 1 |
| | Fc1 | 192 nodes |
| | Fc2 | 10 nodes |
| Branch two | Conv20 | kernels with stride 1 |
| | Res21 | kernels with stride 1 |
| | Res22 | kernels with stride 1 |
| | Res23 | kernels with stride 1 |
| | Res24 | kernels with stride 1 |
| | Conv21 | kernels with stride 1 |
| Output layers | Fc3 | 8 nodes |
| | S1 | 8 nodes |
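The effect of the strides listed in TABLE I on the spatial resolution can be traced with a few lines of arithmetic. This sketch assumes that every convolutional, residual and pooling layer pads its input so that only the stride changes the spatial size; the padding scheme is not stated in the text, so the resulting numbers are indicative only.

```python
from typing import List, Tuple

Layer = Tuple[str, int]  # (layer name, stride)

REPROCESSING: List[Layer] = [("Conv00", 1), ("Pool00", 2), ("Conv01", 1), ("Pool01", 2)]
BRANCH_ONE: List[Layer] = [("Conv10", 1), ("Pool10", 2), ("Res11", 1), ("Pool11", 2),
                           ("Res12", 1), ("Pool12", 1), ("Res13", 1), ("Pool13", 1)]
BRANCH_TWO: List[Layer] = [("Conv20", 1), ("Res21", 1), ("Res22", 1),
                           ("Res23", 1), ("Res24", 1), ("Conv21", 1)]


def spatial_size(size: int, layers: List[Layer]) -> List[Tuple[str, int]]:
    """Track one spatial dimension through the layers, assuming 'same' padding."""
    trace = []
    for name, stride in layers:
        size = -(-size // stride)  # ceiling division by the stride
        trace.append((name, size))
    return trace


# A 128 x 128 Martian image: side length after the reprocessing layers.
feature_side = spatial_size(128, REPROCESSING)[-1][1]
# Branch one keeps shrinking the map before its fully connected layers,
# whereas branch two keeps the resolution of the feature map.
global_side = spatial_size(feature_side, BRANCH_ONE)[-1][1]
local_side = spatial_size(feature_side, BRANCH_TWO)[-1][1]
```

Under this assumption, branch two produces one local feature per position of the feature map, while branch one condenses the whole map before its fully connected layers.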
III-C Learning-based Global Path Planning Algorithm
In this subsection, we describe the whole learning-based global path planning algorithm based on DBCNN, which works as follows.
III-C1 Training Phase
Since the expert dataset for global path planning is available, the training phase is offline. For each training step, we randomly choose one batch of data (line 3) and calculate the loss according to Eq. (8) (line 4). Then, we calculate the stochastic gradient and update the parameters $\theta$ through gradient descent with the given learning rate (line 5). A training epoch ends when all batches of data have been used for training once (line 2). After the number of training epochs reaches the maximum, the training phase stops (line 1).
III-C2 Planning Phase
During the planning phase, the satellite first captures the initial state $s_0$, including the current orbital image $I$, the start position of the planetary rover $p_0$, and the target position $g$ (line 1). Taking $s_t$ as input, DBCNN outputs the estimated value function $\hat{Q}(s_t, a; \theta^{*})$ (line 3). Hence, the moving direction $a_t$ for the planetary rover can be determined according to Eq. (7) (line 4). After that, the position of the planetary rover changes to $p_{t+1}$ and the state is updated to $s_{t+1}$ (line 5). By repeating this planning step until the rover reaches the target position (line 2), the global path for the planetary rover is planned (as shown in the right part of Fig. 4).
III-C3 Analysis of this Algorithm
As shown in Fig. 4, given the initial orbital image $I$ and the target position $g$ of the rover, DBCNN can output the estimated Q values of all positions through one forward calculation, since we can take the whole local feature map (output of layer Conv21) as the partial input of layer Fc3 directly. That is, the time and resource cost for calculating the Q value set of all positions is approximately equal to the time and resource cost for calculating a single Q value. Therefore, the planning loop (lines 3-5) during the online planning phase only requires computation at the initial step. Most significantly, when multiple rovers distributed in different places share the same destination, traditional search algorithms (e.g. A*) have to plan a path for each rover one by one. By contrast, DBCNN is capable of planning paths for them simultaneously through one forward calculation, which enhances efficiency significantly.
IV Experiments and Analysis
IV-A Experimental Settings
We evaluate the planetary global path planning performance of the proposed DBCNN on the following two datasets.
1) Grid maps with obstacles. This dataset is composed of 10000 grid maps of size 64 × 64 with random obstacles, where 0 represents a free grid cell and 1 represents an obstacle. Each input consists of one grid map, one target map, and the positions of the rover. Since grid maps can be viewed as simplified planetary orbital images, this dataset has been widely used to evaluate global path planning algorithms.
2) Martian surface images from HiRISE [24]. This dataset is generated from high-resolution Martian images captured by real orbital detectors and consists of 10000 images of size 128 × 128. Each input consists of one gray image of the Martian surface, one edge image generated by the Canny algorithm [25] for edge augmentation, one target image, and the input positions of planetary rovers. We choose Martian surface images because they exhibit features typical of explorable planets, such as craters of various sizes (as shown in Fig. 6(b)). The global path planning algorithm based on DBCNN can also be extended to other planetary scenarios.
For each dataset, the outputs for each input are the optimal moving directions at all given positions, and we randomly choose 6/7 of the data for training and the remaining 1/7 for testing.
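A random 6/7 to 1/7 split like the one described above could be produced as in the sketch below. The function name, the fixed seed and the rounding of the training-set size are assumptions; the paper does not state how the split was implemented.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(samples: Sequence[T], train_fraction: float = 6 / 7,
                  seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle sample indices and cut them into training and testing parts."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    n_train = round(len(samples) * train_fraction)
    train = [samples[i] for i in indices[:n_train]]
    test = [samples[i] for i in indices[n_train:]]
    return train, test


# With 10000 samples this gives 8571 training and 1429 testing samples.
train_ids, test_ids = split_dataset(list(range(10000)))
```

Splitting by whole maps or images, rather than by individual positions, keeps every testing image unseen during training.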
IV-B Compared Baseline Architectures
We compare DBCNN with three CNN baselines as follows.
1) VIN. This is the state-of-the-art deep neural network structure for path planning with full observations. The parameter settings are the same as those in [14] and the iteration number K in VIN is set to 80.
2) ResNet. This is a classical residual network, which keeps branch two of DBCNN while removing branch one. By comparing DBCNN with ResNet, we can evaluate whether branch one of DBCNN enhances the global path planning accuracy for rovers.
3) DCNN. This is a common CNN composed of convolutional layers, max-pooling layers and fully connected layers, which is also modified from branch two of DBCNN. However, compared with ResNet, it replaces the residual layers with basic convolutional layers.
The metrics we employ to evaluate their performance on global path planning tasks are global path planning accuracy and global path planning success rate, where accuracy is defined as the percentage of optimal moving directions they predict and success rate is defined as the percentage of safe paths they plan.
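The two metrics can be computed as in the following sketch. The text does not spell out what counts as a safe path, so `is_safe` assumes that a path is safe when it stays on free, in-bounds cells of a grid map and ends at the target; that criterion and all function names are assumptions.

```python
from typing import List, Sequence, Tuple

Position = Tuple[int, int]


def accuracy(predicted: Sequence[int], optimal: Sequence[int]) -> float:
    """Fraction of positions whose predicted direction equals the expert one."""
    matches = sum(1 for p, o in zip(predicted, optimal) if p == o)
    return matches / len(optimal)


def is_safe(path: List[Position], grid: List[List[int]], target: Position) -> bool:
    """Assumed criterion: every cell is free and in bounds, and the path ends at the target."""
    rows, cols = len(grid), len(grid[0])
    on_free_cells = all(
        0 <= r < rows and 0 <= c < cols and grid[r][c] == 0 for r, c in path
    )
    return on_free_cells and path[-1] == target


def success_rate(paths: List[List[Position]], grid: List[List[int]],
                 target: Position) -> float:
    """Fraction of planned paths that satisfy the safety criterion."""
    return sum(1 for path in paths if is_safe(path, grid, target)) / len(paths)
```

Accuracy is a per-step measure while success rate is a per-path measure, so a planner can score well on the first and still fail often on the second if its few wrong steps lead into obstacles.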
IV-C Results and Discussions
The training performance of all architectures on the two datasets is shown in Fig. 7 and the final experimental results are reported in TABLE II.
IV-C1 Training Results Analysis
As illustrated in Fig. 7, both the training accuracy and the training loss of DBCNN converge faster than those of the other baseline CNNs. After 100 training epochs, DBCNN achieves both the highest Acc1 and the highest SR1 on all datasets, outperforming the other baseline CNNs significantly (as shown in TABLE II). Moreover, compared with the state-of-the-art architecture, VIN, the training time of DBCNN is largely reduced (25.0 s versus 56.4 s per epoch on grid maps and 53.4 s versus 151.6 s on Martian images), which means that DBCNN has a more efficient structure than VIN. Notably, the training time of ResNet and DCNN is shorter than that of DBCNN because they keep only part of the structure of DBCNN. Therefore, it can be concluded that DBCNN is a more accurate and efficient architecture for planning paths directly from planetary orbital images.
IV-C2 Testing Results Analysis
As reported in TABLE II, DBCNN also keeps its superior global path planning performance on testing data. Remarkably, the planetary orbital images in the testing data are totally different from those in the training data, which demonstrates that DBCNN is capable of planning paths from unknown planetary orbital images after training. Since planetary rovers are commonly required to explore unknown environments, the algorithms also need to plan paths in unknown planetary environments. Hence, DBCNN is more effective than the other baselines for planning paths from planetary orbital images in practice.
Fig. 8 presents some successful paths planned by DBCNN from Martian orbital images. It can be seen that, under the guidance of DBCNN, the paths for the rover precisely avoid craters of varying sizes. Furthermore, the trajectories are nearly optimal. It is noteworthy that no prior knowledge of craters is provided, so DBCNN has to learn and understand these deep features of original Martian images through training. Therefore, DBCNN performs remarkably well on this task.
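TABLE II: Experimental results of all architectures on the two datasets

| Dataset | Metrics | DBCNN | VIN | ResNet | DCNN |
|---|---|---|---|---|---|
| Grid maps | Acc1 | 93.8% | 80.3% | 83.6% | 80.3% |
| | Acc2 | 88.5% | 80.8% | 73.1% | 76.4% |
| | SR1 | 94.7% | 47.5% | 40.0% | 39.9% |
| | SR2 | 80.2% | 49.5% | 33.8% | 37.5% |
| | ET | 25.0s | 56.4s | 19.9s | 19.2s |
| Martian images | Acc1 | 96.5% | 93.1% | 87.4% | 13.0% |
| | Acc2 | 96.5% | 93.0% | 86.1% | 12.7% |
| | SR1 | 96.3% | 83.7% | 69.0% | 1.1% |
| | SR2 | 92.3% | 83.8% | 67.5% | 1.3% |
| | ET | 53.4s | 151.6s | 41.0s | 40.8s |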

Acc1: global path planning accuracy on training data.
Acc2: global path planning accuracy on testing data.
SR1: global path planning success rate on training data.
SR2: global path planning success rate on testing data.
ET: the time cost for each training epoch.
IV-C3 Model Ablation Analysis of DBCNN
To evaluate whether DBCNN could keep its performance after ablating some of its components, we compare DBCNN with ResNet and DCNN, since ResNet ablates branch one of DBCNN and DCNN further replaces the residual layers of ResNet. According to the results in TABLE II, both ResNet and DCNN perform poorly on planetary global path planning tasks. Specifically, without branch one, DBCNN loses path planning accuracy on testing data (Acc2 drops from 88.5% to 73.1% on grid maps and from 96.5% to 86.1% on Martian images). Moreover, without residual layers, training becomes difficult, leaving the network almost unable to plan paths from original planetary images. Therefore, it can be concluded that the double-branch structure of DBCNN indeed contributes to its final performance on global path planning, and the residual layers enhance the training efficiency of DBCNN.
Furthermore, to explain why DBCNN works well, we visualize the value function estimation results of DBCNN, VIN and ResNet (we ignore DCNN due to its poor performance). Since the final layer of these architectures outputs the estimated Q value ($\hat{Q}(s, a)$) for each input and the rover's moving direction for the next step, the state value function can be derived as in Eq. (5) and Eq. (6), which is illustrated in Fig. 9. It can be seen that the state value functions estimated by DBCNN are more consistent with the original Martian orbital images than those of VIN and ResNet. In the state value function estimated by DBCNN, risky areas are clearly darker (smaller value) and the lighter locations (larger value) are around target points. By contrast, ResNet, without global deep features, cannot estimate the value function as precisely as DBCNN. VIN also evidently fails to recognize risky areas of Martian images. Since the paths for the planetary rover planned by these architectures follow the locations with higher value according to Eq. (5) and Eq. (6), the accuracy and success rate of global path planning are determined by the precision of value function estimation. Therefore, from Fig. 9, we can see that DBCNN indeed works better on planetary path planning tasks than the other baseline architectures.
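The derivation of a state value map from per-position Q values, and its rendering as a grayscale image in which darker means smaller value, can be sketched as follows. The nested-list layout `q_table[row][col][action]` and the linear scaling to the range 0 to 255 are assumptions for illustration; the paper does not describe how Fig. 9 was rendered.

```python
from typing import List


def state_values(q_table: List[List[List[float]]]) -> List[List[float]]:
    """V(s) = max_a Q(s, a) at every position of the map."""
    return [[max(q) for q in row] for row in q_table]


def to_grayscale(values: List[List[float]]) -> List[List[int]]:
    """Linearly map values to 0..255 so that smaller values appear darker."""
    flat = [v for row in values for v in row]
    low, high = min(flat), max(flat)
    span = (high - low) or 1.0  # avoid division by zero on a constant map
    return [[round(255 * (v - low) / span) for v in row] for row in values]


# A hypothetical 2 x 2 map with two actions per position.
demo_q = [[[0.1, 0.5], [0.9, 0.2]],
          [[-1.0, -0.5], [0.3, 0.3]]]
demo_v = state_values(demo_q)
demo_image = to_grayscale(demo_v)
```

In such an image, positions near the target would appear brightest and risky positions darkest, matching the reading of Fig. 9 given above.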
V Conclusions
In this paper, we first propose a novel DCNN architecture with double branches, DBCNN, to plan paths for planetary rovers directly from orbital images, which requires no prior knowledge about the planetary orbital images. Then, we present the complete global path planning algorithm based on DBCNN. Moreover, through comparison experiments on two global path planning datasets, we demonstrate that DBCNN achieves higher precision and efficiency on global path planning tasks than the existing best architecture, VIN. Finally, we analyze why DBCNN works well through model ablation analysis and visualization analysis. In future research, more effective deep neural network architectures will be explored and the robustness of the architecture will be studied further.
VI Acknowledgement
This work was supported by the National Key Research and Development Program of China under Grant 2018YFB1003700, the Beijing Natural Science Foundation under Grant 4161001, the National Natural Science Foundation Projects of International Cooperation and Exchanges under Grant 61720106010, and by the Foundation for Innovative Research Groups of the National Natural Science Foundation of China under Grant 61621063.
References
[1] M. Sutoh et al., “The right path: comprehensive path planning for lunar exploration rovers.” IEEE Robotics & Automation Magazine, vol. 22, no. 1, pp. 22-23, 2015.
[2] P. Raja, S. Pugazhenthi, “Optimal path planning of mobile robots: A review.” International Journal of Physical Sciences, vol. 7, no. 9, pp. 1314-1320, 2012.
[3] T. Lozano-Pérez, M. A. Wesley, “An algorithm for planning collision-free paths among polyhedral obstacles.” Communications of the ACM, vol. 22, no. 10, pp. 560-570, 1979.
[4] T. Lozano-Pérez, “Spatial planning: A configuration approach.” IEEE Transactions on Computers, vol. 32, no. 3, pp. 108-120, 1983.
[5] Q. Guo, Z. Zhang, Y. Xu, “Path-planning of automated guided vehicle based on improved Dijkstra algorithm.” Chinese Control and Decision Conference, pp. 7138-7143, 2017.
[6] J. Wang et al., “Route planning based on Floyd algorithm for intelligence transportation system.” IEEE International Conference on Integration Technology, pp. 544-546, 2007.
[7] C. H. Chiang et al., “A comparative study of implementing Fast Marching Method and A* search for mobile robot path planning in grid environment: Effect of map resolution.” IEEE Workshop on Advanced Robotics and Its Social Impacts, pp. 1-6, 2007.
[8] D. Ferguson, A. Stentz, “Using interpolation to improve path planning: The Field D* algorithm.” Journal of Field Robotics, vol. 23, no. 2, pp. 79-101, 2006.
[9] C. Zeng, Q. Zhang, X. Wei, “Robotic global path-planning based modified genetic algorithm and A* algorithm.” International Conference on Measuring Technology and Mechatronics Automation, pp. 167-170, 2011.
[10] H.-I. Kang, B. Lee, K. Kim, “Path planning algorithm using the particle swarm optimization and the improved Dijkstra algorithm.” Workshop on Computational Intelligence and Industrial Application, vol. 17, no. 4, pp. 1002-1004, 2009.
[11] M. Brand et al., “Ant colony optimization algorithm for robot path planning.” International Conference On Computer Design and Applications, vol. 3, pp. 436-440, 2010.
[12] Y. Bassil, “Neural network model for path-planning of robotic rover systems.” International Journal of Science and Technology, vol. 2, no. 2, pp. 94-100, 2012.
[13] J. Gu et al., “Recent advances in convolutional neural networks.” arXiv preprint arXiv:1512.07108, 2015.
[14] A. Tamar et al., “Value Iteration Networks.” In Advances in Neural Information Processing Systems, pp. 2146-2154, 2016.
[15] A. Khan et al., “Memory augmented control networks.” International Conference on Learning Representations, 2018.
[16] E. Parisotto, R. Salakhutdinov, “Neural Map: structured memory for deep reinforcement learning.” International Conference on Learning Representations, 2018.
[17] N. Nardelli et al., “Value Propagation Networks.” Workshops on International Conference on Learning Representations, 2018.
[18] R. S. Sutton, A. G. Barto, “Reinforcement learning: An introduction.” MIT Press, 1998.
[19] A. Attia, S. Dayan, “Global overview of Imitation Learning.” arXiv preprint arXiv:1801.06503, 2018.
[20] Y. Li, “Deep reinforcement learning: An overview.” arXiv preprint arXiv:1701.07274, 2017.
[21] I. Goodfellow et al., “Deep Learning.” MIT Press, 2016.
[22] D. Harris et al., “Digital design and computer architecture.” China Machine Press, pp. 770-778, 2014.
[23] K. He et al., “Deep residual learning for image recognition.” IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778, 2016.
[24] S. A. McEwen et al., “Mars Reconnaissance Orbiter’s High Resolution Imaging Science Experiment (HiRISE).” Journal of Geophysical Research Planets, vol. 112, no. E05S02, pp. 1-40, 2007.
[25] J. Canny, “A Computational Approach To Edge Detection.” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 8, no. 6, pp. 679-698, 1986.