Active Perception based Formation Control for Multiple Aerial Vehicles

Active Perception based Formation Control for Multiple Aerial Vehicles

Rahul Tallamraju, Eric Price, Roman Ludwig, Kamalakar Karlapalem,
Heinrich H. Bülthoff, Michael J. Black and Aamir Ahmad
Max Planck Institute for Intelligent Systems, Tübingen, Germany. rahul.tallamraju, eprice, roman.ludwig, black, aamir.ahmad@tuebingen.mpg.deMax Planck Institute for Biological Cybernetics, Tübingen, Germany. hhb@tuebingen.mpg.deAgents and Applied Robotics Group, IIIT Hyderabad, India. Accepted to IEEE Robotics and Automation Letters 2019. DOI: 10.1109/LRA.2019.2932570The arxiv manuscript has an additional figure, Figure 5, which has been cited in the published RA-L manuscript.

We present a novel robotic front-end for autonomous aerial motion-capture (mocap) in outdoor environments. In previous work, we presented an approach for cooperative detection and tracking (CDT) of a subject using multiple micro-aerial vehicles (MAVs). However, it did not ensure optimal view-point configurations of the MAVs to minimize the uncertainty in the person’s cooperatively tracked 3D position estimate. In this article, we introduce an active approach for CDT. In contrast to cooperatively tracking only the 3D positions of the person, the MAVs can actively compute optimal local motion plans, resulting in optimal view-point configurations, which minimize the uncertainty in the tracked estimate. We achieve this by decoupling the goal of active tracking into a quadratic objective and non-convex constraints corresponding to angular configurations of the MAVs w.r.t. the person. We derive this decoupling using Gaussian observation model assumptions within the CDT algorithm. We preserve convexity in optimization by embedding all the non-convex constraints, including those for dynamic obstacle avoidance, as external control inputs in the MPC dynamics. Multiple real robot experiments and comparisons involving 3 MAVs in several challenging scenarios are presented.

Visual Tracking; Aerial Systems: Perception and Autonomy; Multi-Robot Systems

I Introduction

Aerial motion capture (mocap) of humans in unstructured outdoor scenarios is a challenging and important problem [eprice, huang2018act, zhou2018human, xu2018flycap, nageli2018flycon]. It may directly facilitate applications like sports medicine, cinematography, or search and rescue operations. The front-end of our mocap system consists of a team of micro aerial vehicles (MAVs), autonomously detecting, tracking and following a person. The front-end is responsible for the online task of the system, which is to continuously estimate the 3D global position of the person and keep him/her centered in the field of view of their on-board camera, while he/she performs activities such as walking, running, jumping, etc. The online task is the core focus of this article. The offline task, which is not addressed in this article, is to estimate full body skeleton pose and shape using images acquired by the MAVs.

Fig. 1: Multi-exposure image of three octocopter MAVs cooperatively tracking a fast walking person while maintaining a perception-driven formation using the approach proposed in this paper. The person walks from right to left. The initial positions of the MAVs in their trajectories are circled.

In order to develop the mocap front-end, in our previous work [eprice], we presented a marker-less deep neural network (DNN) based cooperative detection and tracking (CDT) algorithm. That work had two major shortcomings. First, the motion planner on each MAV did not facilitate uncertainty minimization of the person’s 3D position estimate. Second, the local motion plans generated by the MAVs using a model-predictive controller (MPC) were subsequently modified using potential fields for collision avoidance. This further led to sub-optimal trajectories and robots getting easily stuck in field local minima, e.g., stuck behind each other or behind closely spaced obstacles.

In the current work, we solve both of the aforementioned problems through a novel convex decentralized formation controller based on MPC. This new MPC actively minimizes the joint uncertainty in the tracked person’s 3D position estimate while following him/her. The key novelties are as follows.

  • [leftmargin=*]

  • The first novel idea, which enables us to formulate this problem as a locally convex MPC, is the decoupling of the joint uncertainty minimization into i) a convex quadratic objective that maintains a threshold distance to the tracked person, and ii) constraints that enforce angular configurations of the MAVs with respect to (w.r.t.) the person. We derive this decoupling based on Gaussian observation model assumptions used within the CDT algorithm.

  • To guarantee the safety of the motion plans, we incorporate collision avoidance constraints w.r.t. i) other MAVs, ii) the tracked person and iii) static obstacles, only as locally convex constraints. Collision avoidance and angular configuration constraints are inherently non-convex. We preserve convexity in our MPC formulation by converting them to external control input terms embedded inside the MPC dynamics, which are explicitly computed at every iteration of the MPC.

Fig. 2: Formation of MAVs with the mathematical notations used in this paper.

Real robot experiments and comparisons involving 3 MAVs in several challenging scenarios are presented. Figure 1 shows a multi-exposure image of a short sequence from one of these experiments. Simulation results with up to 16 robots demonstrate the robustness of our approach and scalability w.r.t. to the number of robots. We provide open-source ROS-based111 source code of our method.

Ii State-of-the-Art

Multi-robot active tracking: In [morbidi2011active], multi-robot trajectories are analytically derived and uncertainty bounds are identified for optimizing the fused estimate of the target position. In [zhou2011multirobot], a non-convex optimization problem is solved analytically, to minimize the trace of the fused target covariance. However, their approach is centralized and uses Gauss-Seidel relaxation to compute optimal solutions for multiple robots. Centralized non-convex optimization [hausman2015cooperative, hausman2016occlusion] is used to track a target in stochastic environments and locally minimize the fused estimate of the target position. In [lima2015formation], active perception based formation control is addressed using a decentralized non-linear MPC. However, their method only identifies sub-optimal control inputs due to the non-convexity of optimization and the lack of collision avoidance guarantees. In [falanga2018pampc], a perception-aware MPC generates real-time motion plans which maximize the visibility of a desired target. The motion plans are, however, generated only for a single aerial robot. A marker-based multi-robot aerial motion capture system is presented in [nageli2018flycon], where the formation control is achieved using a decentralized non-linear MPC. Scalability in the number of robots and formation collision avoidance behaviors are not explicitly addressed in their approach.

Obstacle Avoidance: Sequential convex programming is applied to solve centralized multi-robot collision-free trajectory generation [augugliaro2012generation]. Due to non-convexities which arise from obstacle avoidance constraints, the obtained solutions are locally optimal albeit fast for a given time horizon. In [lima2015formation] repulsive potential field functions are employed as one of the optimization objectives to avoid collisions. This is highly limiting as obstacle avoidance cannot be guaranteed. Recent work in distributed multi-agent obstacle avoidance [alonso2015collision] convexifies the reciprocal velocity obstacle constraint, to characterize velocities that do not lead to a collision in a perfectly localized environment. Motivated by multi-view cinematography applications, distributed non-linear model predictive control [nageli2017] is used to identify locally optimal motion plans for aerial vehicles. In one of our previous works [rahult], we developed a convex optimization program to generate local collision-free motion plans, while tracking a movable pick and place static target using multiple aerial vehicles. This approach generates fast, feasible motion plans and has a linear computational complexity in the number of environmental obstacles. The work was validated only in a simulation environment and, moreover, obstacle avoidance was not guaranteed. In our current work, we consider real robots, stochasticity in the environment and bounds on repulsive potential fields to guarantee collision avoidance and generate safe local motion plans.

Iii Proposed Approach

Iii-a Preliminaries and Problem Statement

Let there be K MAVs tracking a person P. The pose (position and orientation) of the MAV in the world frame at time is given by . The MAVs pose uncertainty covariance matrix is denoted as . Each MAV has an on-board, monocular, perspective camera. It is important to note that the camera is rigidly attached to the MAV’s body frame, pitched down at an angle of . Hence, the pose of the camera in the MAV’s frame is (see Fig. 2). The observation measurement of the person ’s position made by MAV in its camera frame is given by a mean range , bearing and inclination (see Fig. 2) in spherical coordinates. These are associated with a zero mean measurement noise, denoted by , and respectively. We assume that these measurements are uncorrelated. In camera frame’s Cartesian coordinates these measurements are denoted as and respectively. The fused estimate of the person ’s position and uncertainty covariance in the world frame are denoted as and . These estimates are computed by each MAV fusing its own measurements, , with the measurements received from all other teammates ( ).

The MAVs operate in an environment with known static obstacles (our approach is agnostic to how these obstacles are detected) and neighboring MAVs as dynamic obstacles. The goal of each MAV is to cooperatively track the person using a replicated instance of the proposed formation control algorithm. This involves (a) minimizing the MAV’s fused estimate of the measurement uncertainty covariance and, (b) avoiding environmental obstacles.

Fig. 3: Our aerial mocap front-end architecture. This paper focuses on the blue module.

Iii-B Cooperative Detection and Tracking

Fig. 3 describes our mocap front-end architecture including i) the decentralized formation controller and ii) cooperative detection and tracking (CDT) modules. In [eprice], the focus was on developing the CDT algorithm (green block in Fig. 3). The work in the current paper focuses on developing the formation controller (blue block in Fig. 3), which utilizes information from CDT and other modules. In CDT, each MAV runs its own instance of an EKF for fusing detections made by it and all its teammates. This results in a replicated state estimate of the person at each MAV, which is used to predict a region of interest (ROI) in future camera frames. The ROI guides the yaw of each MAV thereby ensuring that the tracked person is in the MAV’s field of view. In Sec. III-D, we use these state estimates to drive our decentralized formation controller. At every time step, the controller of MAV generates way-point positions and velocities for itself using (a) estimated state of the tracked person, (b) horizon motion plans communicated by teammates and (c) the positions of obstacles. The way-points guide MAV ’s low-level position and yaw controller.

Fig. 4: (a), (b) represent arbitrary configurations with large joint uncertainties. (c) is the optimal configuration for 3 MAVs used in this work.

Iii-C Measurement Models and Joint Uncertainty Minimization

Iii-C1 Measurement Model for a Single MAV

For a MAV in this sub-section we drop the superscripts and subscripts. In a supplementary multimedia file attached with this paper, we empirically show that (a) the variance of the noise in the range measurement evolves quadratically with distance to the person, i.e., , and (b) the variance of the noise in both bearing and inclination measurements are approximately constant w.r.t. the distance to the person, which we represent as and . and are positive constants specific to the system.

We now convert the measurement noise variances from spherical to Cartesian coordinates. When using converted measurements (especially when measurement noise covariance is explicitly required) and performing estimation in Cartesian coordinates, most previous works, e.g., [zhao_best_linear], [Rong_Li_survey_measurments_part_III], consider the measurement-conditioned conversion approach as derived in [l_mo_bar_shalom_unbiased_converted] or [miller_coordinate_transform] as one of the best approximations. Using this conversion from [l_mo_bar_shalom_unbiased_converted] or [miller_coordinate_transform] the noise covariances in Cartesian coordinates, i.e., the elements of can be written as


where and are numerical constants, are exponential functions of , are trigonometric functions of , and are functions containing trigonometric and exponential expressions involving and , respectively. Exact functions can be obtained from equations (10) of[l_mo_bar_shalom_unbiased_converted] or pp. , of [miller_coordinate_transform].

Iii-C2 Assumptions in Our Scenario

Restoring the superscripts and subscripts for the spherical and Cartesian coordinate person measurements, we make the following assumptions on the MAV observation model.

Bearing Measurement: We employ a separate yaw controller [eprice] to guide the yaw angle of MAVs towards the tracked person. This is to ensure that the MAV’s camera is always oriented towards the tracked person. Due to this, we can assume that the measurement component is almost always approximately zero .

Inclination Measurement: We assume that the person’s inclination measurement remains close to as observed in Fig. 2. The rationale for this assumption is as follows. Due to our formation controller (as described in subsequent sub-sections) the MAVs track and follow the person while maintaining a desired altitude and distance to her/him. Consequently, the change in is negligible. Hence, we assume .

Substituting these approximate values for , and for (see beginning of this section for the latter substitution) in (III-C1) and using expressions from either [l_mo_bar_shalom_unbiased_converted] or [miller_coordinate_transform], the elements of are obtained in a compact form as,


where are constants involving exponential functions of constants and only.

Iii-C3 Minimizing Fused Uncertainty for MAV Formations

The minimum uncertainty in the person’s fused state estimate is achieved when the following conditions are met.

Fig. 5: Variation of the trace of fused covariance for different angular configurations.

(a) The angles between the MAVs about the person’s estimated position are : Given measurements from observations of different robots (or sensors), [bishop2010optimality] analytically derives optimal sensor geometries for unbiased and efficient passive target localization. Geometries that maximize the determinant of Fisher information matrix are identified. For K() independent measurements, the angle between sensors resulting in minimum fused uncertainty is either or . We validated this analysis by computing from for all possible angular configurations and using the recursive covariance merging equations from [smith1986representation]. Fig. 4 illustrates some of the possible angular configurations. Out of these, Fig. 4 (a), (b) are examples of configurations that do not minimize . The angular configurations which minimize trace of are highlighted in yellow boxes of Fig. 5. Fig. 4 (c) is one of those configurations which minimizes and which we use in this work. It is preferred over other optimal configurations because others do not ensure a uniformly distributed visual coverage of the tracked subject. This is important as our system is motivated by outdoor motion capture of an asymmetric subject (at least in one axis).

(b) The measurement uncertainty for each individual MAV is minimized: The previous angular constraint condition minimizes the fused uncertainty for any given set of measurements from the MAVs. However, the fused covariance remains a function of each individual MAV’s measurement uncertainty, . is a function of the relative position of MAV w.r.t. the tracked person. Notice that each MAV’s position remains controllable without changing the angular configuration of the whole formation. Therefore, the fused uncertainty is completely minimized by minimizing the trace of for each MAV . How this is done is discussed in the following subsection.

Iii-D Decentralized Quadratic Model Predictive Control

The trace of obtained using (2) is as shown below.


where . Minimizing (3) ensures that each MAV improves its measurement uncertainty. However, would cause the MAVs to collide with the tracked person and other MAVs. In order to ensure the safety of the person, we limit the MAV to reach a desired circular safety surface centered at the person with a radius at a desired altitude from her/him. Hence, instead of minimizing (3), which is equivalent to minimizing , we now minimize . The latter expression is in terms of the MAV’s position and a desired position chosen on the aforementioned safety surface and lying in the direction from the MAV to the tracked person. Here, is a identity matrix. We now define the objective of our MPC running on each MAV k as follows. Minimize,

  • the distance to the desired target surface (safety surface) around the tracked person,

  • the difference in velocity between the MAV and the fused velocity estimate of the tracked person and,

  • MAV control effort.

Consequently, the optimization objective of our active tracking MPC is as given below.


where . The first part of this objective minimizes the input control signal and the second part ensures that the distance between desired terminal state (position and velocity) and the final horizon robot state is minimized. In order to ensure continuous and smooth person tracking, we minimize the difference in velocity of the MAV and the fused velocity estimate of the person available from the EKF. is as defined in (3) and is experimentally determined. and are custom-defined diagonal positive semi-definite weight matrices. Furthermore, the active tracking MPC is subjected to the following constraints.

  • A non-convex constraint of maintaining a desired angular difference of w.r.t. other MAVs about the fused estimate of the person’s position .

  • A non-convex constraint to maintain at least a distance from other obstacles.

  • Regional and control saturation bounds for MAV states and control inputs for a horizon.

The non-convex constraints are embedded into the MPC dynamics as repulsive potential field forces [rahult]. The computation of these forces is discussed in Sec. III-E. The convex MPC is thus defined by the following equations.


The 3D translation motion of MAV is governed partly using accelerations and partly by an external control input , where is the horizon time step. The state-space translational dynamics of the robot is given by (6). Dynamics () and control transfer () matrices are given by


where is the sampling time. is a real-time computed external control input representing non-convex constraints of formation configuration and obstacle avoidance. The next section details the computation of these forces.

Note that in our formulation optimality is regarding the minimization of uncertainty in the person’s 3D position jointly estimated by all MAVs. The way we formulate our MPC it cannot optimize the trajectories over the full belief space and hence transient behavior is not explicitly addressed. In our case, however, the MPC prediction time horizons are small enough (15 time steps or 1.5 seconds in total) to neglect the transient effects, as long as the MAVs are not very far away from the tracked person. Moreover, in the MPC we also minimize the difference between target person velocity and the MAV velocities, which further facilitates convergence to the desired view points.

Also, the optimal configuration may not always be feasible, especially, when the obstacles completely block or occlude the desired safety surface around the tracked person. This is inherent to our formulation because we enforce that obstacle avoidance and feasibility of trajectories take precedence (by considering them as hard constraints in MPC) over the optimality of the view points. This is important for the safety of aerial systems.

Iii-E Computation of External Control Input ()

Inter-MAV angular configuration constraints involve inverse-tangent operations with positive or negative arguments, making the constraint non-convex. Moreover, avoiding collisions by maintaining a minimum distance w.r.t. other MAVs and obstacles in the environment, constraint the robot to operate in non-convex regions. In our previous work [rahult], we introduced the concept of converting non-convex constraints into external control inputs using repulsive potential fields. In this work, we use it to enforce formation angular configuration and collision avoidance constraints. The repulsive potential field function used in this work is referred to as cotangential field function (see (17) in [tallamraju2019active]). The magnitude of the cotangential field force is a hyperbolic function denoted as . Argument is either the Euclidean distance or the absolute angular difference between MAV and entity (another MAV or obstacle). Let and be threshold radii, where defines the region of influence of the potential field. is the distance below which the potential field value tends to infinity. Practically, this is clamped to a large positive magnitude which is derived to guarantee obstacle avoidance (see Sec. III-E3).

Iii-E1 Active Tracking Input

To satisfy the inter-MAV angular configuration constraints, each MAV computes its angle about the tracked person in the world frame and the angle of neighboring MAV about the tracked person . The absolute angular difference defined as , is used as the argument to compute the potential field magnitude. We compute for MAV and teammate MAV ,


The factor in (10) helps avoid field local minima. This is because, if two robots have similar angles of approach (i.e., small ), the MAV farther away from the desired distance is repelled with a higher force than the MAV near the desired distance (see [rahult] for a detailed explanation on how these forces avoid field local-minima problems associated with potential field based planners). Factor is a small positive constant which ensures that the force magnitude is non-zero at the target surface if the desired angular difference is not yet achieved. acts in a direction which is normal to the direction of approach to the tracked person . In the plane of approach towards the person, there are two direction choices for a MAV, namely, . The direction pointing away from neighboring MAV ’s position is chosen. This choice ensures a natural deadlock resolution for robots having similar angles of approach to the tracked person. The total active tracking external control input of MAV w.r.t. all other teammate MAVs is


For the active tracking control input .

Iii-E2 Obstacle Avoidance Input

Dynamic Obstacle Avoidance: All the teammates of MAV are considered as dynamic obstacles. The argument for in case of dynamic obstacle avoidance is the euclidean distance between MAV and teammate MAV . It is defined as . Magnitude of is computed using MAV and teammate ’s horizon motion plans. This is enforced along the direction , which is a unit vector pointing in the direction away from the teammate MAV ’s horizon motion plan. The total dynamic obstacle avoidance external input is as follows.


The choice for and for dynamic obstacle avoidance are discussed in the following sub-section III-E3.

Static Obstacle Avoidance: The total external control input due to the static obstacles is as given below.


where . The tracked person is considered as an additional static obstacle for each MAV. An external control input is computed so as to enforce her/his safety. Further, multiple static obstacles placed closed together might cause the MAV to get stuck in a field local minima (as shown in [rahult]). To avoid these scenarios, we compute a force to penalize MAV approach angles () which have static obstacles in the line of approach. The computation of this force is similar to the active tracking force described in the previous section. The total external control input for obstacle avoidance is given by the summation

Subsequently, the total external control input acting on MAV is determined as follows.


Next, we describe the considerations for choosing , and to guarantee obstacle avoidance.

Iii-E3 Obstacle Avoidance Guarantee

To ensure stable external control inputs and guarantee obstacle avoidance, we take into account the following important considerations.

  • maximum tracking error () of low-level controller w.r.t generated way-point for one time step ,

  • maximum magnitude of the MAV velocity ,

  • self-localization uncertainty of the MAV ,

  • localization uncertainty of teammate MAVs ,

  • communication or packet loss with teammate MAVs.

  • maximum cotangential force magnitude .

and corresponding to the external control input for obstacle avoidance are defined as follows.


where, is the maximum eigenvalue of . is the experimentally determined maximum possible eigenvalue of the localization uncertainty of the teammate MAV . Wireless communication delays are unavoidable in real-time implementations and cannot be ignored as they affect the MAVs knowledge of teammate MAV positions. accounts for it by increasing in proportion with the communication delay between the MAVs. , which is the maximum distance a neighboring MAV can travel in seconds.

Iv Experiments and Results

Iv-a Real Robot Experiments

Iv-A1 Setup

We use three self-developed and customized octocopter MAVs in the field experiments. The MAVs are equipped with an Intel i7 CPU, an NVidia Jetson TX1 GPU and Open Pilot Revolution flight controller system on chip. Each MAV runs the CDT algorithm (Fig. III-B) on-board and in real-time, including the deep neural network-based single shot detector (SSD-Multibox). The MAVs use on-board GPS and IMU data, acquired at 100Hz, for localization. We place Emlid Reach differential GPS receiver on each MAV and on either shoulder of the tracked person to acquire ground truth (GT) location. The data from the differential GPS is not used during real-time experiments and stored only for comparison. For a decentralized implementation, a ROS multi-master setup with communication over Wifi is used. Each MAV continuously captures and stores camera images acquired at . The MPC is evaluated at Hz using the CVXGEN convex optimization library. The horizon length is time steps corresponding to seconds. The state and velocity limits of each MAV is in and in . The control limits are defined as . The desired horizontal distance to the tracked person in all experiments is set as and the desired height as . The tracking error w.r.t. to the MPC generated way-point is , therefore we use m (see subection III-E3). w.r.t. teammate MAVs varies with the change in self localization uncertainty and communication losses. The task for each MAV in all experiments is to track and follow the person using the proposed approach. We conducted real-MAV experiments (RE), each with a -MAV formation as follows.

  • RE 1: Approach of Price et al. [eprice] (duration 257s)

  • RE 2: DQMPC (Tallamraju et al.) [rahult] (231s)

  • RE 3: Our approach (269s)

  • RE 4: Our approach with one MAV avoiding emulated virtual obstacles (278s)

  • RE 5: Our approach with all MAVs avoiding emulated virtual obstacles (106s)

Fig. 6: Multi-exposure images of short sequences from (RE 1 to RE 4). (a) showcases the results based on our previous work [eprice]. Notice that the MAVs are close to the target person and never uniformly spread around the person’s position. (b) shows the results based on our previous work [rahult]. Notice that the MAVs and are quite close to each other and the resulting formation is non-optimal for uncertainty minimization. In (c), (d) and the image in Fig. 1 the results of the approach presented in the current paper is showcased. These correspond to experiments RE 3 to RE 5, respectively. Notice that the MAVs are almost uniformly spread around the person’s position and maintain an angular configuration with a difference of approx.  w.r.t. each other. Moreover, the MAVs successfully maintain the desired safe distance and altitude from the person. This configuration ensured minimization of fused uncertainty of the person’s position in RE 3RE 5. The experimental footage is provided in the accompanying video.
Fig. 7: Comparison of different methods in real world experiments: (a) accuracy of the tracked person and MAV self-pose estimates; (b) uncertainty in the tracked person estimate; (c) fraction of total experiment duration (jointly for all MAVs) the tracked person was completely in the camera image frame; (d) distribution of distance of tracked person from image center. (e) Real world experiment RE 3: Tracked person’s estimated (blue) and ground truth (green) trajectory comparison of our active perception approach (without simulated stationary obstacles). Ground truth remains within the 3 error bound of tracked estimate throughout the experiment duration except for a few initial seconds.

In each experiment, the person remains stationary at first. Data collection starts after all MAV’s have acquired the visual line of sight to the person and converged on a stable formation around him/her. The person then walks randomly at moderate speeds, runs or performs standard exercise movements (see attached video or here222 for details). In the end, MAVs are manually landed. The error in the tracked position estimate of the person is calculated as the 3D Euclidean distance between the estimated and corresponding GT provided by the differential GPS system. Error for MAV pose is calculated similarly.

Iv-A2 Analysis of results

Images in Fig. 6 and Fig. 1 show short sequences from RE 1RE 5. Fig. 7(e) compares the corresponding GT with the tracked person’s position estimate obtained by one of the MAVs using our proposed approach in RE 3. We analyzed the estimate’s accuracy, uncertainty, and the MAV self-pose accuracy for all experiments in Fig. 7(a-b). RE 3 achieves significantly more accurate person track estimates (mean error of m) than using state of the art method in RE 1 (mean error of m), despite the worse self pose estimates of the MAVs in RE 3 (the self-localization errors were due to the high errors in the MAV’s GPS localization and may vary arbitrarily over different experiments). This showcases the ability of our approach to attain high tracking accuracy despite bad GPS reception.

Even though the state of the art approach in RE 2 achieves similar accuracy to that of our approach in RE 3, we see that our approach outperforms the former in keeping the person always in the camera image and close to the image center as shown by bar and box plots in Fig. 7(c-d). Moreover, our approach achieves least tracking uncertainty in the tracked person estimate as compared to the other two state-of-the-art methods. The presence of obstacles that the MAVs need to navigate around affects the target tracking and self pose accuracy in RE 4 and RE 5. This is mainly due to the additional maneuvering overhead to avoid obstacles. However, the ability to keep the person close to the center of the image is only slightly affected. Fig 7(c-d) compares the different approaches ability to keep the tracked person centered in each MAV’s camera view. Our approach in RE 3 not only reduces the average distance between the tracked person and each camera’s image center but also ensures the person is completely covered by the camera image in of camera frames. This is a crucial feature for a system designed for aerial motion capture.

Iv-B Simulation Experiments

Iv-B1 Setup

The proposed algorithm was simulated in the Gazebo. The simulations were conducted on a standalone Intel i7-3970X CPU with an NVIDIA GTX 1080 GPU. We used AscTec Hexarotor Firefly MAVs in a world of . Each MAV has a rigidly attached Asus Xtion camera with their parameters set to the real MAV camera parameters. We simulate GPS and IMU drift by imposing random-walk offset on the ground truth position. We simulate a human actor model and guide it along a pre-defined trajectory. The actor traverses on a randomly varying terrain interspersed with emulated trees. The trees create obstacles and occlusions for the hovering MAVs.

Iv-B2 Comparison of methods

Fig. 8: Statistics of Simulation Experiment: (a)–(d) please refer caption of Fig. 7 for description of these plots; (e) robustness of our approach with increasing static obstacles.

For each of the following methods, we conducted 30 simulation runs with a MAV formation: (i) SE 1: Approach of Price et al. [eprice], (ii) SE 2: DQMPC (Tallamraju et al.) [rahult], (iii) SE 3: Our approach, and (iv) SE 4. Our approach with all MAVs avoiding emulated virtual obstacles. First 10 runs were of 120 seconds each, and next 20 runs were 180 seconds each. Figure 8 shows the statistics of this set of experiments. We clearly observe that our proposed approach with or without static obstacle avoidance outperforms both the state of the art approaches. Especially, when static obstacles are not present, our approach is able to keep the person fully in the camera image frame almost of the experiment duration. Also, it significantly outperforms all other methods in minimizing the joint uncertainty in the target position estimate.

Iv-B3 Scalability Experiments

Fig. 9: Our method’s scalability and robustness.

Fig. 9(a) shows the result of target tracking with our method for an increasing number of robots. We conduct 10 simulation trials for each number of robot configuration. Real-time factor for all runs in this experiment was set to . We validate the scalability of our model predictive controller by running these experiments in an environment with several static and known obstacles (in the form of trees) and perfect communication for a formation of 3 to 16 MAVs. We observe nearly linear improvement in tracking error with a higher number of robots. At the same time, we notice that computational requirements did not affect real-time performance for up to 16 robots.

Iv-B4 Experiments with Communication Failure

In Fig. 9(b), the effect of inter-robot communication failure on tracking is demonstrated for our approach through experiments with 8 MAVs and simulated static obstacles. Communication losses varying from to is simulated. The results were averaged over trials per communication loss percentage. It can be observed that the tracking gets progressively worse with a higher percentage of communication loss. At communication loss, each robot relies only on its detection and does not cooperatively improve the target position estimate. Nevertheless, we observe that our approach is able to maintain an accurate target state estimate for up to 25% communication loss.

Iv-B5 Experiments with increasing number of obstacles

Fig. 8(e) shows the results of our approach with increasing number of environmental obstacles. For each map, we conducted 5 trials (180 seconds each) with 5 MAVs tracking a randomly walking human model. We observe that by increasing the number of obstacles, the tracking error only slightly deteriorates. Moreover, the error in tracking is close to that of the environment with no static obstacles (see Fig. 8(a)). This result further validates our approach as the robots attain the desired configuration around the target person by navigating around randomly placed environmental obstacles.

V Conclusions and Future Work

We proposed a decentralized convex MPC-based algorithm for the MAVs to actively track and follow a moving person in outdoor environments and in the presence of static and dynamic obstacles. In contrast to cooperatively tracking only the 3D positions of the person, the MAVs actively compute optimal local motion plans, resulting in optimal view-point configurations, which minimize the uncertainty in the tracked estimate. We showed how we embed all the non-convex constraints, including those for dynamic and static obstacle avoidance, as external control inputs in the MPC dynamics. We evaluated our approach through rigorous and multiple real robot field experiments in an outdoor scenario. These experiments validate our approach and show that it significantly improves accuracy over the previous methods. In simulations, we showed results with up to 16 MAVs. These demonstrate the scalability of our method as well as its robustness to communication failures. Future work includes addressing visibility issues, e.g., occlusions of the person and assymetry of the target, which are not explicitly solved by our current approach. We also intend to extend our method to full body human pose detection.


The authors would like to thank Igor Martinovic and Mason Landry for their help with the field experiments. For the published RA-L manuscript and hence this revision on Arxiv, we thank the anonymous reviewers, associate editor and the editor Eric Marchand for their highly constructive feedback.


Comments 0
Request Comment
You are adding the first comment!
How to quickly get a good reply:
  • Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
  • Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
  • Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minimum 40 characters and the title a minimum of 5 characters
Add comment
Loading ...
This is a comment super asjknd jkasnjk adsnkj
The feedback must be of minumum 40 characters
The feedback must be of minumum 40 characters

You are asking your first question!
How to quickly get a good answer:
  • Keep your question short and to the point
  • Check for grammar or spelling errors.
  • Phrase it like a question
Test description