Integrated Motion Planner for Real-time Aerial Videography with a Drone in a Dense Environment


Boseong Felipe Jeon and H. Jin Kim. *This material is based upon work supported by the Ministry of Trade, Industry & Energy (MOTIE, Korea) under the Industrial Technology Innovation Program, No. 10067206, 'Development of Disaster Response Robot System for Lifesaving and Supporting Fire Fighters at Complex Disaster Environment'. Department of Mechanical and Aerospace Engineering, Seoul National University, South Korea. {a4tiv,hjinkim}

This letter presents an integrated approach for a drone (or multirotor) to perform an autonomous videography task in a 3-D obstacle environment by following a moving object. The proposed system includes 1) a target motion prediction module which can be applied in dense environments and 2) a hierarchical chasing planner based on a proposed metric for visibility. In the prediction module, we minimize the observation error given that the target object does not collide with obstacles; the estimated future trajectory of the target is obtained by covariant optimization. The chasing planner has a bi-level structure composed of a preplanner and a smooth planner. In the first phase, we leverage a graph-search method to preplan a chasing corridor which incorporates the safety and visibility of the target during a time window. In the subsequent phase, we generate a smooth and dynamically feasible path within the corridor using quadratic programming (QP). We validate our approach with multiple complex scenarios and actual experiments. The source code can be found in

I Introduction

Video filming has been one of the most popular applications of unmanned aerial vehicles equipped with vision sensors, leveraging their maneuverability and improvements in technologies such as visual odometry [qin2018vins] and mapping [hornung2013octomap, oleynikova2018safe]. For example, drones have been employed in various cinematographic tasks from personal use to broadcasting sports events, and the corresponding research has received great interest in the recent decade [nageli2017real, penin2018vision, bonatti2018autonomous]. Still, the automation of videographic tasks using drones remains an open challenge, especially in general dense environments.

This letter addresses an online motion strategy developed for more realistic situations, where multiple obstacles have arbitrary shapes and the future trajectory of the target is not exactly known a priori to the filming drone, except for the locations of sparse via-points pre-selected for filming purposes. Also, we do not assume that the arrival time at each point is known to the drone. For example, a drone can be deployed for cases such as shooting a ski game or a race, where players are supposed to pass defined spots on a track. As another common example, consider an event where an important person (or actor) passes through defined locations in a crowded place and the arrival times at those spots are not exactly predetermined.

Fig. 1: Top: Autonomous aerial video shooting using a drone for a moving target in a classroom cluttered with objects. The drone plans a chasing trajectory on-the-fly to incorporate safety and visibility of the target against obstacles. The target (a UGV with a green marker) is driven manually by a human operator. Bottom: Path history of the target (black) and chaser (magenta). The history of the line-of-sight (LOS) of the drone toward the target is visualized with sky-blue arrows.

I-A Technical challenges

In our problem, the following can be identified as the main challenges, which must be handled jointly.

I-A1 Smooth transition

First of all, the smoothness of the drone's flight path is essential for flight efficiency, avoiding jerky motion that would increase actuation inputs and degrade shooting quality.

I-A2 Flight safety

The recording agent should be able to maintain its safety against obstacles of arbitrary shape, not only simple obstacles (e.g., ellipsoids or spheres), for broad applicability.

I-A3 Occlusion against obstacles of general shape

Occlusion should be handled carefully in obstacle environments. It degrades the aesthetic quality of the video, which can be one of the top priorities in cinematographic tasks. More practically, a period of occlusion of a dynamic object might interrupt the autonomous mission if the cinematographer drone fails to re-detect the target after losing it from the field of view.

I-A4 Trade-off between global optimality and fast computation

As described in A1-A3, a motion strategy for videography in the considered cases aims to achieve multiple objectives simultaneously. Such a multi-objective problem is prone to local minima and may yield a poor solution if it relies entirely on numerical optimization. Conversely, if one relies only on sampling or discrete search algorithms such as RRT* [karaman2011sampling] and A* [duchovn2014path] to pursue global optimality at the cost of online computation, the drone might not respond fast enough to the uncertain motion of the target on-the-fly. Therefore, the trade-off between optimality and computation time should be handled in a balanced manner.

I-A5 Target prediction considering obstacles

For operation in obstacle environments with incomplete information on the target's trajectory, another challenge is reliable motion prediction of a dynamic object that accounts for obstacles. For example, if the chaser makes an infeasible prediction that does not properly reflect obstacle locations, planning based on the wrong prediction also becomes unreliable. Thus, accounting for obstacles is crucial for chasing performance in our scenario.

I-B Related works

The previous works [nageli2017real], [penin2018vision] and [bonatti2018autonomous] addressed a similar target-following problem with consideration of flight efficiency, safety and visibility (A1-A3), using continuous optimization formulations to deal with A1-A4 in their problem settings. [nageli2017real] and [penin2018vision] developed receding-horizon motion planners that yield a dynamically feasible path in real time for dynamic situations. They assume ellipsoidal shapes for obstacles, which does not extend to more general cases, making it difficult to fully satisfy A2 and A3. Also, their objective functions contain multiple non-convex terms such as trigonometric functions and products of vectors. Such a formulation may fail to produce a satisfactory solution in a short time due to local minima, as discussed in A4.

In [bonatti2018autonomous], occlusion and collision were handled in a general environment represented with an octomap. Nevertheless, they relied on numerical optimization of the entire objective, which contains complex terms such as the integral of a signed distance field over a manifold. Such an approach might not guarantee satisfactory optimality, similar to the case of [nageli2017real, penin2018vision] (A4). In [chen2016tracking], the authors addressed target following with consideration of A1, A2, A4 and A5. They designed a hierarchical planner to balance optimality against online computation: a corridor is pre-planned to ensure safety, and then a smooth path that minimizes high-order derivatives is generated in the following phase. In particular, [chen2016tracking] predicts the target's movement with polynomial regression over past observations. Still, the prediction does not consider obstacles (A5), and occlusion of the target is not included in their multi-layer planner, making A3 difficult to handle.

Regarding target prediction in following tasks, [vsvec2014target] included obstacles in the prediction formulation, directly tackling A5. The authors performed Monte-Carlo sampling to estimate the distribution of future target positions. In that work, however, the dynamics and the set of possible inputs of the target were assumed to be known a priori, which is difficult to apply directly to general cinematic scenarios. Also, the authors restricted the homotopy class of the robot's solution path by assuming a discrete selection of actuation inputs. This method might not achieve sufficient travel efficiency, as pointed out in A1.

To the best of our knowledge, little research effectively handles A1 to A5 simultaneously for drones employed in the considered cinematic or chasing scenarios. In this letter, we make the following contributions as an extension of our previous work [jeon2019online]:

  • An integrated framework for the motion strategy is proposed, from the prediction module to the chasing planner, which achieves the desired performance described in A1-A5 in our cinematic problem setting. In particular, we validate the newly developed prediction module by examining its effectiveness for the proposed motion planner.

  • We validate our method with multiple challenging scenarios and a real-world experiment. Notably, the tested platform operates fully onboard, handling target detection, localization, motion planning and control.

The remainder of this letter is structured as follows: we first describe the problem statement and overall approach. Target prediction over a future time window is proposed in Section III, followed by the hierarchical chasing planner design in Sections IV and V.

Fig. 2: A diagram of the system architecture: we suppose the camera-drone has prior knowledge of the environment and of the target color for detection. Based on these, we implement a fully onboard system for automatically following a moving target with a drone. For the drone's state estimation, we utilize the ZED internal visual odometry, and a Pixhawk is used for flight control. The target future motion prediction and chasing planner are proposed in this letter.

II Overview

Here we outline and formalize the given information and the desired capabilities of the proposed method. We assume that the filming scene is available in the form of an octomap before filming. It is also assumed that the actor will pass a set of via-points in sequence for filming purposes. The via-points are known a priori, while the arrival time at each point is not. As an additional specification of the target's behavior, it is assumed to move along a trajectory that minimizes high-order derivatives such as acceleration, as assumed in [chen2016tracking]. Additionally, we assume that the target object is visible from the drone at the start of the mission, within the drone's limited field of view (FOV).

Based on these settings, we focus on a chasing planner and a target prediction module which can handle A1-A5 simultaneously, as mentioned in Section I. Additionally, the chasing strategy optimizes the total travel distance and the effort to maintain a desired relative distance between the drone and the object.

III Target future trajectory prediction

This section describes target future motion estimation utilizing the observation history and prior map information. Here the terms path and trajectory are differentiated for clarity, as described in [gasparetto2015path]: a path refers to a geometric path, while a trajectory is a time-parameterized path. The proposed prediction module generates a geometric prediction path first, followed by time estimation for each point on the path.

Fig. 3: Prediction of target motion over a horizon .

III-A Target path prediction

As mentioned in Section II, we assume that the sequence of via-points of the target is available as , which is supposed to be passed in order and whose arrival times are not preset. Also, let us denote the value of the Euclidean signed distance field (ESDF) of the prior map at a position as . We denote the position of the object at time as .

Now, let us assume that the drone has gathered target observations at discrete time steps, written as . Additionally, let us consider a situation where the target heads to after passing the previous waypoint. For a prediction callback time and a future horizon , we want to forecast the future target trajectory with the estimated trajectory . To obtain , a positional path , where , is generated first to provide a geometric prediction until the point where the target reaches , by solving the following optimization.


where in the first term is a positive constant that weights recent observations more heavily. The second term encodes the assumption that the target minimizes its derivatives for the sake of actuation efficiency. The function in the last term is a non-convex cost reflecting the assumption of safe target behavior (see [ratliff2009chomp] for more details of the functional), computed from the ESDF of the prior map. Eq. (1) can be arranged into the form below, which is the standard form for covariant optimization [ratliff2009chomp].


(2) is solved with the following covariant update rule, where is the step size.


From (1)-(3), a geometric path of the target is predicted using until (see Fig. 3). In the following step, the path is endowed with time to complete the prediction.
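To make the procedure concrete, the sketch below regresses a path over exponentially weighted past observations with a second-difference smoothness prior and an ESDF-based obstacle term, using a metric-preconditioned (covariant) update. All names, weights, and the user-supplied cost gradient `grad_cost` are illustrative assumptions, not the letter's implementation; the metric is chosen as the quadratic part of the objective so the update behaves like a Newton step.

```python
import numpy as np

def predict_target_path(obs, grad_cost, n_pred, gamma=0.9,
                        w_smooth=1.0, w_obs=10.0, w_prior=1.0,
                        step=1.0, iters=20):
    """Sketch of the prediction in (1)-(3): weighted least-squares fit to
    past observations + smoothness prior + obstacle cost from the ESDF,
    minimized with a covariant (metric-preconditioned) update.
    `grad_cost(x)` is assumed to return the gradient of the ESDF-based
    safety cost at position x (zero in obstacle-free regions)."""
    obs = np.asarray(obs, dtype=float)
    n_obs, dim = obs.shape
    n = n_obs + n_pred
    # Initial guess: keep observations, extrapolate with the last displacement.
    ext = obs[-1] + np.arange(1, n_pred + 1)[:, None] * (obs[-1] - obs[-2])
    xi = np.vstack([obs, ext])
    # Second-difference operator D; the smoothness term is ||D xi||^2.
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    # Exponentially decaying observation weights (recent samples weigh more).
    w = w_obs * gamma ** np.arange(n_obs - 1, -1, -1)
    W = np.zeros(n)
    W[:n_obs] = w
    # Metric A: quadratic part of the objective (positive definite here).
    A = w_smooth * (D.T @ D) + np.diag(W)
    for _ in range(iters):
        g = w_smooth * (D.T @ D @ xi)                     # smoothness gradient
        g[:n_obs] += w[:, None] * (xi[:n_obs] - obs)      # observation error
        g += w_prior * np.apply_along_axis(grad_cost, 1, xi)  # obstacle cost
        xi -= step * np.linalg.solve(A, g)                # covariant update, cf. (3)
    return xi
```

With a zero obstacle gradient the predicted tail simply continues the observed motion; a nonzero cost gradient bends the predicted path away from high-cost regions.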

III-B Time prediction

In this subsection, we allocate time knots for each point in under a constant-velocity assumption for simplicity. For the points which were used to regress on the past history of the target, we simply assign the observation time stamps . For the predicted positions (), the following recursion is used to allocate times.


where represents the average speed over the collected observations. The passing times for the points obtained in (1) are estimated with the constant-velocity assumption based on . With this allocation, the future trajectory of the target for a time window is predicted with the following interpolation:


In our videography setting, introduced in Sec. VI, a single prediction optimization routine runs at 30-50 Hz, demonstrating real-time performance. This allows us to re-trigger prediction on-the-fly when the estimation error exceeds a threshold.
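The constant-velocity time allocation and the subsequent interpolation can be sketched as follows (helper names and the 2-D, piecewise-linear setting are illustrative assumptions):

```python
import numpy as np

def allocate_times(path, obs_stamps, n_obs):
    """Constant-velocity time allocation (Sec. III-B sketch): observed points
    keep their measurement stamps; each predicted point is offset by its
    segment length divided by the average observed speed."""
    path = np.asarray(path, dtype=float)
    obs_stamps = list(map(float, obs_stamps))
    # Average speed over the observation window.
    obs_len = np.sum(np.linalg.norm(np.diff(path[:n_obs], axis=0), axis=1))
    v_avg = obs_len / (obs_stamps[-1] - obs_stamps[0])
    t = list(obs_stamps)
    for k in range(n_obs, len(path)):
        seg = np.linalg.norm(path[k] - path[k - 1])
        t.append(t[-1] + seg / v_avg)   # recursion over predicted points
    return np.array(t)

def target_at(path, times, query_t):
    """Piecewise-linear interpolation of the predicted trajectory."""
    path = np.asarray(path, dtype=float)
    x = np.interp(query_t, times, path[:, 0])
    y = np.interp(query_t, times, path[:, 1])
    return np.array([x, y])
```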

IV Preplanning for chasing corridor

This section introduces a method for generating a chasing corridor that bounds the chasing drone's trajectory. We first explain a metric encoding the safety and visibility of the chaser's position, which is used as an objective in computing the corridor. Then, corridor generation by means of graph search is described. Several notations are defined, in addition to , as follows:

  • : Position of a chaser (drone).

  • : Position of a target.

  • : The line segment connecting .

  • : Configuration space.

  • : Free space in , i.e. the set of points where the probability of occupancy obtained from octomap is small enough.

  • : Space occupied by obstacles.

  • : A set of visible vantage points for a target position .

  • : A set of occluded vantage points for a target position .

Fig. 4: Prediction result in the complex city cinematic scenario (detailed settings can be found in Sec. VI). In the figure, black dots denote past observations of the actor in the buffer, and green dots the target via-point sequence . Blue points and the magenta line denote the geometric path , while the thick blue line denotes the trajectory estimate over the horizon . The ESDF-based cost function is illustrated in a jet colormap.

IV-A Metric for safety and visibility

For the safe flight of the camera-drone, we reuse the ESDF, as it measures the risk of collision with nearby obstacles. Here, is used as a constraint in graph construction so that the drone maintains a safe clearance during the entire planning horizon. We now introduce the visibility metric, which encodes how robustly the drone can maintain its sight of the target against occluding obstacles and unexpected target motion in the near future. For a target position seen from a chaser position with line of sight (LOS) , we define visibility as:


In the actual implementation, (6) is calculated over a grid field. That is, we can evaluate (6) with a simple min operation while iterating through the voxels along the LOS, with linear time complexity. Because (6) is the minimum distance between obstacles and the LOS connecting the object and the drone, a small value implies that the target can be lost more easily than a higher value, as illustrated in Fig. 5-(b). The proposed metric has multiple advantages. First, it can be computed directly by reusing the ESDF, which was already utilized for target prediction and the drone's safety constraint, without further complex calculation. Second, it is defined without restriction on obstacle shape, in contrast to works such as [nageli2017real], [penin2018vision] and [bonatti2018autonomous]. A detailed discussion of the advantages and properties of the proposed metric can be found in our previous work [jeon2019online].
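The grid evaluation of (6) can be sketched as a single pass over the voxels on the LOS, taking the minimum ESDF value along the way. The 2-D grid and the simple line stepping below are simplifications for illustration, not the onboard voxel traversal:

```python
import numpy as np

def visibility(esdf, target_cell, chaser_cell):
    """Visibility metric of (6) on a grid: the minimum ESDF value over the
    cells on the line of sight (LOS) between target and chaser.  `esdf` is
    a 2-D array of clearance values for this illustration; the min is taken
    in a single linear-time sweep along the LOS."""
    p = np.asarray(target_cell, dtype=float)
    q = np.asarray(chaser_cell, dtype=float)
    n_steps = int(np.max(np.abs(q - p))) + 1
    vis = np.inf
    for s in np.linspace(0.0, 1.0, n_steps + 1):
        cell = tuple(np.round(p + s * (q - p)).astype(int))
        vis = min(vis, esdf[cell])      # min over the LOS, linear time
    return vis
```

A LOS grazing an obstacle yields a small value (easy to lose the target), while a LOS far from all obstacles yields a large one.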

Fig. 5: (a): safety metric and visibility metric for a target position and drone position . (b): visibility field for in a colormap. Red denotes higher visibility, and the occluded region is drawn in a single uniform color (dark blue) for simplicity. As an illustrative example, consider the case where the object moves for a short time. Both positions and are able to see . While the camera-drone at can still observe the target at , it fails to maintain visibility at .

IV-B Corridor generation

Based on the proposed metrics for safety and visibility, we now explain the computation of the sequence of corridors for chasing. Beforehand, we plan a sequence of viewpoints over a time horizon as a skeleton. Let us assume that the drone is at with target prediction . For a window , time is discretized as , and we write , . The sequence of viewpoints is generated, where each point is selected from a set . denotes a discrete point in a given grid, and and are the minimum and maximum tracking distances. The discrete path is obtained from the following discrete optimization.

subject to

The objective function in (7) penalizes the interval distance between consecutive points and rewards a high visibility score along the path. The second term is defined by


The last term in (7) aims to keep the relative distance between the drone and the object at . is the weight for visibility and for relative distance. Among the constraints, the second enforces a safe clearance along each line segment, and the third requires to be a visible viewpoint for the predicted target at . The last constraint bounds the maximum connectable distance between two points , of subsequent steps. More details on building the directed graph to solve this discrete optimization are given in our previous research [jeon2019online] and Fig. 6-(a).

From computed from (7), we generate a set of corridors connecting consecutive viewpoints in , as visualized in Fig. 6-(b). Once the corridor width is chosen, the box region connecting and can be written as a linear inequality . The corridors are depicted as red rectangles in Fig. 6-(b). Due to the formulation of (7), every point in maintains a safe margin, and from the viewpoints the predicted point can be observed without occlusion by obstacles. Also, for a large and small enough , we empirically found that every point in the corridor maintains visibility of the prediction for .
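The discrete optimization (7) is a shortest-path problem over a layered graph, and can be sketched with simple per-layer dynamic programming. The candidate sets, weights, and visibility function below are illustrative placeholders; the letter builds the directed graph as in [jeon2019online] and prunes edges that violate the distance and visibility constraints:

```python
import numpy as np

def preplan_viewpoints(layers, target_pred, visibility, w_v=5.0, w_d=1.0,
                       d_des=2.5, d_max=3.0):
    """DP sketch of the discrete optimization (7): pick one viewpoint per
    time step from the candidate sets `layers`, trading off travel distance,
    visibility reward, and deviation from the desired tracking distance.
    `visibility(vp, tp)` scores a viewpoint against the predicted target
    position; hops longer than d_max are pruned (max-connection constraint)."""
    n = len(layers)
    cost = [np.full(len(L), np.inf) for L in layers]
    parent = [np.full(len(L), -1, dtype=int) for L in layers]
    for j, vp in enumerate(layers[0]):
        rel = np.linalg.norm(np.asarray(vp) - target_pred[0])
        cost[0][j] = -w_v * visibility(vp, target_pred[0]) + w_d * (rel - d_des) ** 2
    for t in range(1, n):
        for j, vp in enumerate(layers[t]):
            rel = np.linalg.norm(np.asarray(vp) - target_pred[t])
            node = -w_v * visibility(vp, target_pred[t]) + w_d * (rel - d_des) ** 2
            for i, prev in enumerate(layers[t - 1]):
                hop = np.linalg.norm(np.asarray(vp) - np.asarray(prev))
                if hop > d_max:            # prune over-long connections
                    continue
                c = cost[t - 1][i] + hop + node
                if c < cost[t][j]:
                    cost[t][j], parent[t][j] = c, i
    # Backtrack from the cheapest final node into a viewpoint sequence.
    j = int(np.argmin(cost[-1]))
    seq = [layers[-1][j]]
    for t in range(n - 1, 0, -1):
        j = parent[t][j]
        seq.append(layers[t - 1][j])
    return seq[::-1]
```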

V Smooth path generation

Fig. 6: (a): Illustrative example of graph construction. The blue line denotes the target prediction for a time window. Red dots denote the elements of at each time step. We connect two nodes of consecutive sets with a directed edge if the second and fourth constraints in (7) hold. (b): chasing corridor based on the skeleton . The red-edged boxes denote corridors between and with width . (c): the smooth path (red) generated within the corridors from (b). LOS stamps are also drawn with black arrows.

In the previous section, we proposed the procedure to select viewpoints and corridors, optimally weighing visibility and travel distance while ensuring safety. In this section, we generate a dynamically feasible trajectory for position and yaw using and . The position trajectory is represented with piecewise polynomials as below:


where is a coefficient and denotes the order of the polynomial. The polynomial coefficients of the chaser's trajectory are computed from the optimization (10) below. Yaw is planned so that heads toward at each time step, provided an observation of the target at is acquired.

subject to

Our optimization minimizes the magnitude of jerk along the trajectory and the deviation of from the viewpoints , where is an importance weight. In the constraints, and are the state of the drone when planning was triggered, used as the initial condition of the optimization. Additionally, we enforce continuity conditions at the knots. The last constraint acts as a box constraint so that the smooth path stays within the chasing corridors, for the purpose of safety and visibility. As investigated in [mellinger2011minimum], can be executed by virtue of the differential flatness of quadrotors. (10) can be solved efficiently with algorithms such as the interior point method [mehrotra1992implementation]. The overall procedure is summarized in Algorithm 1. During the mission, we predict the target's future trajectory over a time window from several recent observations by solving (1). If the observation becomes unreliable, i.e., the accumulated estimation error exceeds a defined threshold, prediction is re-triggered. Based on the prediction, the chasing planner yields a desired trajectory for the chaser by preplanning and then generating a smooth path to be executed during the corresponding horizon. This loop continues until the end of the videographic mission.
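A single-axis, single-segment sketch of this smooth-planning step: minimum jerk with soft viewpoint deviation and initial-state equality constraints, solved via its KKT system. The corridor box inequalities of the full problem (10) are omitted here so the sketch stays in closed form (the letter solves the full QP with qpOASES); names and weights are illustrative:

```python
import numpy as np

def min_jerk_1d(x0, v0, t_wp, x_wp, T, order=5, w_wp=100.0):
    """Minimize integral of squared jerk over [0, T] plus soft deviation
    from preplanned viewpoints, with p(0)=x0, p'(0)=v0 as equalities.
    Returns polynomial coefficients, lowest order first."""
    n = order + 1
    # Hessian of the jerk integral: d^3/dt^3 of t^i has factor i(i-1)(i-2).
    Q = np.zeros((n, n))
    for i in range(3, n):
        for j in range(3, n):
            ci = i * (i - 1) * (i - 2)
            cj = j * (j - 1) * (j - 2)
            Q[i, j] = ci * cj * T ** (i + j - 5) / (i + j - 5)
    # Soft viewpoint term w * ||P c - x_wp||^2, P = Vandermonde rows at t_wp.
    P = np.vander(np.asarray(t_wp, dtype=float), n, increasing=True)
    H = 2.0 * (Q + w_wp * P.T @ P)
    f = -2.0 * w_wp * P.T @ np.asarray(x_wp, dtype=float)
    # Equality constraints: p(0) = x0, p'(0) = v0.
    A = np.zeros((2, n)); A[0, 0] = 1.0; A[1, 1] = 1.0
    b = np.array([x0, v0], dtype=float)
    # KKT system of the equality-constrained QP.
    K = np.block([[H, A.T], [A, np.zeros((2, 2))]])
    rhs = np.concatenate([-f, b])
    return np.linalg.solve(K, rhs)[:n]

def poly_eval(c, t):
    return np.polyval(c[::-1], t)
```

In the full problem, one such polynomial per corridor segment is stacked with continuity constraints at the knots and the per-segment box inequalities.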

Input: SDF , target via-points , receding horizon window , time discretization
Initialize:
for  to  do                          // from mission start to finish
    forall  do
        observ.append()
        if  then
             = Predict(observ, )
             = Planning(, )
        end if
        // accumulate estimation error
        if  then
             = .next                 // if target is observed to reach , update
        end if
    end forall
end for
Algorithm 1: Receding horizon chasing planner

VI Results

VI-A Simulations

We validated the proposed algorithm in a dense environment with multiple target trajectories. For the simulation, we used the complex city environment (see our previous work [jeon2019online] for the 3D models), where five target via-points are defined as the green circles in Fig. 8. The complex city includes multiple non-convex obstacles, and the target was operated to hide behind obstacles at the moments marked with orange boxes (see Fig. 8). We used the RotorS simulator [furrer2016rotors] for the chaser drone, and the target (a turtlebot) was operated manually with a keyboard. A vision sensor is fixed on the drone (pitched down 13°). Due to this, the elevation of the LOS was limited when selecting elements from . All virtual platforms operated in the Gazebo environment, and the simulations were performed on a laptop with an Intel i7 CPU and 16 GB RAM. The Boost Graph Library (BGL) was used for preplanning, while qpOASES [ferreau2014qpoases] was used to solve the quadratic program in the smooth-planning phase. For the four target trajectories, chasing strategies with two different visibility weights were tested, totalling 8 simulations. Other than the visibility weight , all parameters were kept at the same values across tests. For the simulation, we directly fed the current position of the turtlebot to the drone. The results are summarized in Fig. 8 (A)-(D) and Table I. For each target scenario, the history of is plotted in the bottom row of Fig. 8, where a small value of implies difficulty in securing visibility due to the target's proximity to obstacles. The planner with the high visibility weight secures a higher visibility score than the planner with the low weight: the value of was on average 24% higher, and the duration of occlusion was 42% lower, across the four target trajectories. In contrast, the low-visibility-weight planner shortened the travel distance, with the high-visibility setting flying on average 34% farther.
In all simulations, the safety of the drone chaser was strictly maintained during the entire mission. The average computation times are summarized in Fig. 9. The entire pipeline of the receding-horizon planner ran at 5-6 Hz, showing the capability to re-plan fast enough in response to unexpected target motion.

Fig. 7: The camera-drone for the onboard implementation: an Intel NUC serves as the core computer for algorithm execution, while camera- and vision-related tasks such as visual odometry and target localization run on a Jetson TX2. A Pixhawk is employed as the flight controller.
Fig. 8: Flight results for the four different target trajectories. Top: the target's history is denoted by a black line, and the chaser's flight histories are depicted in sky blue for the low visibility weight and magenta for the high visibility weight. The grid size in the figure is 4 m. From (A) to (D), the target moves faster and is more unpredictable due to hiding behind obstacles, which increases difficulty. In all simulations, the target passes through five via-points one-by-one (green circles). The orange boxes denote locations where the target was intentionally operated to hide behind obstacles with an abrupt maneuver. To confirm the smoothness of the flight trajectory, a zoomed-in image is provided in (D). Bottom: the history of the target's distance-field value and the visibility score for a small () and a large () are plotted. The dotted vertical lines (green) denote the target's arrival times at the via-points.
Fig. 9: Average computation time in each simulation scenario for three phases of prediction, preplanning and smooth planning.

target speed [m/s]     0.36               0.57               0.67               0.75
visibility weight      1.0      5.0       1.0      5.0       1.0      5.0       1.0      5.0
avg. visibility [m]    0.6084   0.8433    0.4817   0.5330    0.5543   0.6051    0.5566   0.7791
occ. duration [sec]    0.099    0         7.59     4.323     4.29     2.145     9.768    6.435
flight dist. [m]       40.7761  49.9106   34.8218  47.5424   36.3040  50.6000   55.7393  77.2377
TABLE I: Simulation results (for each target speed, the two columns correspond to the low and high visibility weights)

Type             Name                                          Value
Common           time window [s]
Prediction       obsrv. temporal weight
                 weight on prior term
                 obsrv. pnts./pred. pnts                       /
                 pred. accum. err. tol. [m]                    1.0
Preplanning      tracking distance weight
                 desired tracking dist. [m]
                 maximum connection [m]
                 lower/upper bounds of relative dist. [m]      /
                 time step
                 safe margin [m]
Smooth planning  waypoint weight
                 polynomial order
                 safe tol. [m]
                 tracking elev.
TABLE II: Common parameters for simulations

VI-B Real-world experiment

We tested the proposed method in an actual experiment in an indoor classroom without GPS. In the test, the drone is equipped with a ZED stereo vision sensor and a Pixhawk2 autopilot for flight control. For visual odometry (VO), we used ZEDfu (the internal VO algorithm of the ZED). The vision-related algorithms ran on a Jetson TX2, while planning and control were processed on the onboard computer (NUC7i7BNH) (see Fig. 7). The target is a Turtlebot Waffle Pi, and a green disk was attached on top of it to simplify detection from the drone. The target was operated manually by a human operator at a linear velocity of 0.2-0.3 m/s. To obtain the position of the target, we thresholded HSV (hue, saturation, value) color to segment the target into an ellipsoid. We extracted the pointcloud of the center of the ellipsoid, to which we applied a smoothing filter to finalize the target's position. The experimental environment and the paths taken by the target and chaser are shown in Fig. 1 with stamps of the bearing vector. The whole pipeline runs at 10 Hz with our parameter settings for the experiment. In the experiment, we set m, m and m. The grid size used for the octomap and the computation of the visibility score is 0.1 m. The visibility weight was set to . The entire path history of the drone and the target is plotted in Fig. 1. The result of each planning trigger can be found in Fig. 10.
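The color-based target localization can be sketched as an HSV threshold plus centroid extraction. The ranges below are illustrative assumptions, not the values used onboard, and the onboard pipeline additionally fits an ellipse and smooths the back-projected 3-D point:

```python
import numpy as np

def locate_target(hsv, h_range=(40, 80), s_min=80, v_min=60):
    """Threshold an HSV image for the green marker and return the centroid
    of the mask as the target's pixel position (row, col), or None if the
    target is not in view.  Thresholds are placeholder values."""
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    mask = (h >= h_range[0]) & (h <= h_range[1]) & (s >= s_min) & (v >= v_min)
    if not mask.any():
        return None                    # target not in view
    rows, cols = np.nonzero(mask)
    return float(rows.mean()), float(cols.mean())
```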

Fig. 10: Top: illustration of the chasing algorithm during the experiment. The localized target and drone are denoted by red and blue dotted boxes, respectively. The target's future trajectory for 4 seconds is predicted, as shown by the red line. The elements of at each time step are depicted with their visibility scores (red denotes a higher score). The drone plans its chasing path over a short horizon based on the preplanned points (dark blue spheres). Bottom: snapshots of the drone flight while chasing the target, and the camera view of the drone. The target is detected with an enclosing ellipse in the view. The entire path taken by the target and drone is visualized in Fig. 1.

VII Conclusion and future works

In this letter, we proposed a chasing planner that handles safety and occlusion against obstacles. The preplanning phase provides a chasing corridor in which objectives such as visibility, safety and travel distance are optimally incorporated. In the smooth-planning phase, a dynamically feasible path is generated based on the corridor. We also proposed a prediction module which allows the camera-drone to forecast the target's future motion over a time horizon in the presence of obstacles. The whole pipeline was validated in various simulation scenarios, and we implemented a real drone which operates fully onboard to perform autonomous videography. We also explored the effect of the visibility weight on the two conflicting objectives: travel distance and visibility. From the validations, we found that the chaser was able to handle the target's repeated hiding behaviors effectively by optimizing visibility. In the future, we will extend the proposed algorithm to multi-target chasing scenarios. We also plan to enhance the algorithm for the case of an unknown map, where the drone has to explore to gather information in order to generate a more efficient trajectory.

