
# RF Backscatter-based State Estimation for Micro Aerial Vehicles

## Abstract

The advances in compact and agile micro aerial vehicles (MAVs) have shown great potential for replacing humans in labor-intensive or dangerous indoor investigation, such as warehouse management and fire rescue. However, the design of a state estimation system that enables autonomous flight in such dim or smoky environments presents a conundrum: conventional GPS- or computer-vision-based solutions only work outdoors or in well-lit, texture-rich environments. This paper takes the first step toward overcoming this hurdle by proposing Marvel, a lightweight RF backscatter-based state estimation system for indoor MAVs. Marvel is nonintrusive to commercial MAVs: it attaches backscatter tags to their landing gears without internal hardware modifications, and it works in a plug-and-play fashion that requires no infrastructure deployment, no pre-trained signatures, and not even knowledge of the controller's location. The enabling techniques are a new backscatter-based pose sensing module and a novel backscatter-inertial super-accuracy state estimation algorithm. We demonstrate our design by programming a commercial off-the-shelf MAV to autonomously fly different trajectories. The results show that Marvel supports navigation at long range or through three concrete walls, with decimeter-level localization accuracy and drift-free orientation estimation, outperforming commercial GPS-based approaches outdoors.

LoRa backscatter, micro aerial vehicle, navigation, state estimation

## I Introduction

Over the last decade, the rapid proliferation of micro aerial vehicle (MAV) technologies has shown great potential for replacing humans in labor-intensive or even dangerous indoor investigation and search, such as warehouse inventory management and fire rescue [19, 20, 39, 14, 3]. Specifically, using MAVs to manage warehouse inventory cuts inventory checks from one month down to a single day [26], and using MAVs for search and rescue in firefighting operations can save firefighters' lives, given that a share of the firefighter deaths in the United States in 2017 occurred in burning buildings [28]. These applications require MAVs to navigate autonomously in dim warehouses [7] or smoky buildings while reporting to a server or controller at a distance or through walls.

State estimation is fundamental to the autonomous navigation of MAVs. The state, including position, velocity, and orientation, is the key input to the flight control system of an aerial vehicle, which adjusts the rotating speed of the rotors to achieve the desired actions in response to remote control or autonomous operations. The mainstream approach uses GPS, compass, and vision sensors to estimate a MAV's state. However, GPS-compass based approaches [6, 2] only work in outdoor free space, since GPS signals can be blocked by occlusions and compass measurements are easily distorted by surrounding environments. Indoors, computer vision (CV) based approaches have attracted much attention due to their light weight, high accuracy, and low cost, but they are limited to well-lit or texture-rich environments [41, 4, 27, 22], thereby failing to work in dim warehouses or smoky fire buildings.

Recent years have witnessed much progress in using RF signals to track a target's pose (position and orientation), holding the potential for state estimation that is highly resilient to visual limitations. Despite novel systems that have achieved high accuracy [15, 40, 17, 9, 35, 38, 36, 29], hardly any of these ideas have made it into the indoor MAV scenario, which imposes the following requirements:

• Long range/through wall. To scan items across a warehouse or navigate in a fire building, the system should support the navigation at least across rooms or over an area of tens of meters.

• Lightweight. As a MAV is typically compact with limited battery capacity, it requires a lightweight, small-sized, and low-power sensing modality to enable state estimation.

• Plug-and-play. To make it practical for emergency rescue and efficient indoor investigation, the system should be instantly operational in an unknown environment without prior infrastructure deployment, pre-training, or labor-intensive setup.

Specifically, RFID-based solutions [17, 9, 35, 38, 29] are lightweight, but their operational range is limited. UWB- and WiFi-based solutions [15, 40, 36] have better operational range but require pre-deploying multiple anchors to enable localization, failing to meet the plug-and-play requirement.

In this paper, we present Marvel, a state estimation system for MAVs that satisfies all these requirements. As shown in Fig. 1, Marvel is able to support the navigation in burning buildings with smoke and fog. It allows a MAV to safely fly across rooms or in an area of several tens of meters. The system is lightweight in that it merely attaches a few backscatter tags to the landing gear of a commercial MAV without any internal hardware modification. Marvel works in a plug-and-play fashion that only requires the MAV’s single controller, whose location is unknown, to interact with the MAV and the onboard backscatter tags. The design of Marvel is structured around two components:

(a) Backscatter-based pose sensing: Marvel's first component is a backscatter-based sensing modality that estimates the response of the attached tags from backscattered signals that are buried in noise. This sensing modality leverages chirp spread spectrum (CSS) signals to enable pose tracking at long range or through occlusions, where signal amplitudes and phases are not directly available. It introduces a set of algorithms that first estimate the channel phases of the tags' backscatter signals under mobility and then use the phases to estimate pose features, including the range, angle, and rotation of the MAV relative to its controller. This component enables Marvel to operate across rooms or over an area of several tens of meters.

(b) Backscatter-inertial super-accuracy state estimation: The estimates of the first component alone cannot fulfill state estimation, i.e., computing a MAV's position, velocity, and orientation. Marvel's second innovation is a backscatter-inertial super-accuracy algorithm that combines the backscatter-based estimates with the onboard IMU measurements to enable accurate state estimation. Although the IMU suffers from error accumulation, our backscatter sensing is drift-free, i.e., it has no temporal error accumulation, and is thus able to correct the IMU drift via multi-sensor fusion. It employs a graph-based optimization framework to compute a Gaussian approximation of the posterior over the MAV trajectory. This involves computing the mean of this Gaussian as the configuration of the state that maximizes the likelihood of the sensor observations.

Results. We build a prototype of Marvel using a DJI M100 MAV fitted with four LoRa backscatter tags built from off-the-shelf components. We demonstrate Marvel's potential by programming the MAV to autonomously fly different trajectories in a long-range open space and an across-room indoor test site. The results show that Marvel supports navigation at long range, even with three concrete walls blocking the vehicle from its controller, and achieves decimeter-level average localization accuracy and drift-free orientation estimation, outperforming commercial GPS-based navigation approaches [6, 2].

Contributions. We introduce a novel backscatter-based pose sensing technique that first extracts the channel phases of CSS signals under mobility and then estimates pose features of a MAV by the phases. This enables pose tracking in long range or through occlusions. In addition, we design a backscatter-inertial super-accuracy algorithm that fuses backscatter-based estimates and IMU measurements to enable accurate state estimation. Finally, we implement a prototype and conduct real-world experiments demonstrating the system’s ability to navigate a MAV across rooms or in an area of several tens of meters.

## II System Overview

Marvel is a system that enables state estimation to support a MAV's navigation over an open area of tens of meters or across rooms indoors. It works with the MAV's controller and four LoRa backscatter tags attached to the landing gear. As shown in Fig. 2, the controller consists of a data handler and a backscatter signal handler. The data handler sends and receives data to and from the MAV through one antenna. It excites the tags and exchanges data such as channel phases and velocities via the CSS signals of LoRa. The backscatter signal handler receives the backscattered signals through a linear array of three antennas. It extracts the channel phases and then sends them to the MAV. The computer on the MAV takes the phases to compute the state. It then sends the state to the flight control system, which computes commands to adjust actions and navigate the MAV to its destination.

Marvel consists of two components:

• Backscatter-based pose sensing module contains chirp-based estimation approaches that first estimate channel phases and then use the phases to compute the range, angle and rotation of the MAV to its controller, enabling pose tracking in long range or through occlusions.

• Backscatter-inertial super-accuracy state estimation algorithm fuses the estimates of the backscatter-based sensing with the IMU measurements, which include 3D accelerations and angular velocities, through a graph-based optimization framework. It models each module’s estimates as a Gaussian mixture and computes a Gaussian approximation of the posterior over the MAV trajectory.

## III Backscatter-based Pose Sensing

### III-A Primer on CSS Processing

Before proceeding to our design, we briefly introduce the CSS signals of LoRa that we use. The controller transmits a linear upchirp signal with bandwidth $BW$ to the MAV with backscatter tags. A tag backscatters the signal with a frequency shift to prevent interference from the excitation signal; multiple tags apply different frequency shifts. Marvel uses a chirp signal that is compatible with the LoRa protocol. Its duration depends on the spreading factor ($SF$) and bandwidth [13], i.e., $T = 2^{SF}/BW$, where $SF \in \{7,\dots,12\}$ and the maximum $BW$ is 500 KHz.

To decode the chirp, the receiver first multiplies the received signal with a synthesized downchirp whose frequency linearly sweeps from the highest to the lowest frequency of the chirp. Then, it takes a fast Fourier transform (FFT) of this multiplication (Fig. 2). This operation sums the energy across all the frequencies of the chirp, producing a peak at an FFT bin.
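The dechirp operation above can be illustrated with a short sketch (a minimal numpy toy, not the actual receiver code; the sampling parameters are hypothetical). With this sign convention, a cyclic delay of $d$ samples moves the peak to bin $(N-d) \bmod N$:

```python
import numpy as np

def upchirp(n_samples, bw, fs):
    """Linear upchirp sweeping 0..bw over the symbol duration."""
    t = np.arange(n_samples) / fs
    k = bw / (n_samples / fs)          # chirp rate (Hz/s)
    return np.exp(1j * np.pi * k * t**2)

def dechirp_peak(rx, ref_up):
    """Multiply by the conjugate (down)chirp, FFT, return the peak bin."""
    spec = np.fft.fft(rx * np.conj(ref_up))
    return int(np.argmax(np.abs(spec)))

fs, bw, n = 125e3, 125e3, 4096         # hypothetical sampling setup
ref = upchirp(n, bw, fs)

# No delay: all chirp energy folds into bin 0.
assert dechirp_peak(ref, ref) == 0

# A cyclic delay of d samples shifts the peak to bin (n - d) % n.
d = 37
assert dechirp_peak(np.roll(ref, d), ref) == (n - d) % n
```

The time-delay-to-frequency-shift property exploited here is what lets the controller detect the shifted bin discussed next.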

Next, we describe how the controller extracts the channel phase of the received chirp from the peak. Then, we elaborate on how Marvel leverages the phase to estimate its position and rotation.

### III-B Below-Noise Channel Phase Extraction

Since MAVs are expected to carry out emergency tasks like fire rescue, the system must provide localizability with a single anchor (the controller) and without prior knowledge of the workspace, so that it is instantly deployable and operable wherever required. The position of a target relative to a single anchor can be represented in polar coordinates by the angle and the range of the target to the anchor, and both parameters can be inferred from the channel phase of the signal.

The channel phase extraction for chirp signals has two steps as shown in Fig. 3: we first combat the Doppler effect to estimate the beginning of the chirp and then we extract the channel phase leveraging the linearity of the chirp frequencies.


To estimate the beginning of the chirp, we leverage a key property of the chirp signal: a time delay in the chirp signal translates to a frequency shift. Ideally, decoding the original upchirp with a downchirp produces a peak in the first FFT bin (see Fig. 2). When a tag is separated from the controller, the backscatter signal handler receives the signal with a timing offset equal to the signal's round trip. The peak then appears in a shifted bin. If we move the beginning of the received chirp samples closer to its real beginning and repeat the decoding operation, a new peak appears at the first FFT bin again, and the symbol at this instant is the beginning of the transmission. However, under the MAV's mobility, the signal additionally experiences a Doppler frequency shift. The shifted bin is then a mixed result of the timing offset and the Doppler effect, and the above operation can no longer recover the beginning of the chirp.

Our solution leverages the kinetics and the structure of a MAV. We attach four backscatter tags to the landing gear of a MAV. As shown in Fig. 4, the Doppler frequency shift of a tag, e.g., tag $i$, is a combined result of translation and rotation. The shift can be expressed as,

$$\Delta f(t)=\frac{f_c}{c}\,u_p(t)\cdot\left[v_t(t)+v_r(t)\right]=\Delta f_t(t)+\Delta f_r(t), \qquad (1)$$

where $u_p(t)$ is the unit vector from the MAV to the controller, $f_c$ the carrier frequency, and $c$ the speed of RF signals in the medium. $v_t(t)$ and $v_r(t)$ are the translational velocity and the rotational velocity, and $\Delta f_t(t)$ and $\Delta f_r(t)$ correspond to the translational shift and the rotational shift. To estimate the beginning of the chirp, we need to isolate the frequency shift translated from the timing offset by eliminating the effect of the Doppler shift.

Eliminating the effect of Doppler shift. We first eliminate the rotational shift via a key observation: any pair of opposing tags on the landing gear, e.g., tags $i$ and $i'$ in Fig. 4, always have rotational velocities with the same magnitude but opposite directions, while all tags share the same translational velocity. Thus, averaging the shifted peaks of two opposing tags eliminates the rotational shift, as shown in Fig. 5. Specifically, decoding the backscattered signals from a pair of opposing tags, we obtain the FFT bin indices $\hat{B}_i$ and $\hat{B}'_i$,

$$\hat{B}_i=f_T^{i}+\Delta f_t^{i}+\Delta f_r^{i},\qquad \hat{B}'_i=f_T^{i'}+\Delta f_t^{i}-\Delta f_r^{i}, \qquad (2)$$

where $f_T^{i}$ and $f_T^{i'}$ are the frequency shifts translated by the timing offsets, and $\Delta f_t^{i}$ and $\Delta f_r^{i}$ are the translational shift and the rotational shift of tag $i$. Note that $f_T^{i}\approx f_T^{i'}$, since their maximal difference is the shift translated from the traveling time over the distance between a pair of opposing tags, i.e., the diameter of the MAV, which is negligible as the diameter of the DJI M100 is too small relative to the speed of RF signal propagation. Thus, averaging them, i.e., $(\hat{B}_i+\hat{B}'_i)/2$, eliminates the rotational shift.
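The cancellation in Eqn. (2) is simple arithmetic; a toy numeric check (hypothetical shift values) makes it concrete:

```python
# Toy numbers (hypothetical): timing-offset shift, translational and
# rotational Doppler shifts of one tag, all in Hz.
f_T, df_t, df_r = 1200.0, 85.0, 40.0

B_i     = f_T + df_t + df_r     # peak bin offset of tag i       (Eqn. 2)
B_i_opp = f_T + df_t - df_r     # peak bin offset of opposing tag

# Averaging cancels the rotational shift ...
assert (B_i + B_i_opp) / 2 == f_T + df_t
# ... while subtracting isolates it (used later for rotation, Eqn. 7).
assert B_i - B_i_opp == 2 * df_r
```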

Since the two pairs of tags on the MAV are structurally symmetric, performing the above operation on each pair should yield identical results. However, they exhibit a slight difference, as shown in Fig. 5. This is because the micro-controllers of the tags are not synchronized with the controller, which introduces an additional, constant carrier frequency offset (CFO) for each tag. In our approach, the residual difference upon averaging the two pairs of tags is the difference of their CFOs, which is still a constant. We can simply subtract this constant for the rest of the transmission to estimate the right chirp phase.

Now we eliminate the translational shift to isolate the frequency shift translated from the timing offset. Then, we can obtain the signal at the real beginning of the transmission by moving the beginning of the received chirp samples. The translational shift can be tracked using the accelerations measured by the onboard IMU. Initially, the MAV is about to take off. At this initial stage, there is no motion, and the averaged bin is already the frequency shift translated by the timing offset. Thus, the channel phase can be obtained according to the workflow (Fig. 3). Then, we specify $u_p(t)$ in Eqn. (1) by our angle estimation algorithm in § III-C. When the MAV takes off, the accelerations measured by the IMU track the translational velocity $v_t(t)$, so the translational shift $\Delta f_t(t)$ can be computed and removed. Note that integrating the accelerations to obtain the velocity suffers from temporal drift. The super-accuracy algorithm in § IV corrects the drift and feeds the corrected velocity back to the controller.

Extracting channel phase. At this stage, we have corrected the signal to the symbol at the beginning of the transmission. Now we compute the channel phases of all frequencies in the chirp. Taking an FFT of the multiplication of the corrected upchirp and the downchirp sums the phase across all the frequencies in the chirp. Assume that a linear chirp has $N$ frequencies and the channel of the signal remains constant throughout the duration of the chirp; then the phase of the received chirp changes linearly with frequency. We have

$$\hat{\theta}_\Sigma=\theta_1+\theta_2+\cdots+\theta_N=\theta_1+\theta_1\frac{f_2}{f_1}+\cdots+\theta_1\frac{f_N}{f_1}, \qquad (3)$$

where $f_1,\dots,f_N$ are explicitly defined when generating the chirp signal. Solving the above equation yields the channel phases of all frequencies in the chirp.
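Inverting Eqn. (3) for the per-frequency phases can be sketched as follows (a numpy toy with hypothetical frequencies and a synthetic true phase):

```python
import numpy as np

freqs = np.linspace(902e6, 902e6 + 125e3, 64)    # chirp frequencies (hypothetical)
theta_1_true = 0.3                               # phase at the lowest frequency (rad)
thetas_true = theta_1_true * freqs / freqs[0]    # linear-in-frequency channel phase

theta_sum = thetas_true.sum()                    # what the FFT peak exposes (Eqn. 3)

# Invert Eqn. (3): theta_sum = theta_1 * sum(f_i / f_1)
theta_1 = theta_sum / np.sum(freqs / freqs[0])
thetas = theta_1 * freqs / freqs[0]

assert abs(theta_1 - theta_1_true) < 1e-9
assert np.allclose(thetas, thetas_true)
```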

Note that this method requires a chirp duration short enough to fall within the channel coherence time. Meanwhile, the signal needs good decoding capability, which is proportional to the product of signal duration and bandwidth. To this end, we choose the spreading factor and bandwidth of the CSS signals conforming to the LoRa standard so that the chirp duration meets both requirements.

### III-C Below-Noise Pose Sensing

So far we have obtained the channel phases of the backscatter signals. We now use them to estimate the pose features, including the range and angle of a MAV to its controller for positioning, and the MAV's rotation for determining the orientation.

Range estimation. Assume that the controller is separated from a tag on the MAV by a distance $r$. A linear chirp signal with $N$ frequencies transmitted by the controller propagates a total distance of $2r$ for the round trip to and from the tag. The wireless channel of such a signal is,

$$H=\left[\gamma_1 e^{-j2\pi f_1\frac{2r}{c}},\ \gamma_2 e^{-j2\pi f_2\frac{2r}{c}},\ \cdots,\ \gamma_N e^{-j2\pi f_N\frac{2r}{c}}\right], \qquad (4)$$

where $\gamma_i$ is the attenuation corresponding to frequency $f_i$ in the chirp, $i\in\{1,\cdots,N\}$. In the absence of multipath, we can use the obtained channel phases of the backscatter signal to estimate the range $r$. However, due to multipath, the obtained phases are actually the sum of the phases of the direct-path signal and the multipath-reflected signals.
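In the multipath-free case, the phase of Eqn. (4) falls linearly with frequency at a slope of $-4\pi r/c$, so the range follows from a line fit. A sketch (hypothetical tones, unwrapped phases):

```python
import numpy as np

c = 3e8
r_true = 12.5                                 # range to the tag (m)
freqs = np.linspace(902e6, 902e6 + 8e6, 9)    # multi-channel tones (hypothetical)

# Channel phases of the round-trip signal (Eqn. 4), unwrapped for clarity
phases = -2 * np.pi * freqs * (2 * r_true / c)

# Fit against shifted frequencies for conditioning; slope is unchanged.
slope = np.polyfit(freqs - freqs[0], phases, 1)[0]   # d(theta)/df = -4*pi*r/c
r_est = -slope * c / (4 * np.pi)

assert abs(r_est - r_true) < 1e-6
```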

To combat multipath while conforming to the LoRa protocol, we send multiple chirps in the channels of the LoRa band and combine the phase information across all these channels to emulate a wideband transmission. At a high level, a wideband signal can be used to disambiguate multipath. Adjacent channels are separated by a fixed frequency spacing. The four tags on the MAV are configured with different frequency shifts to prevent interference from the excitation signal, so the controller can transmit excitation signals in several channels and receive backscatter signals across the channels. Combining them, the controller sends the phases of all the channels to the MAV through LoRa. Then, the MAV computes the range estimate by applying an inverse FFT to the phases to get the time-domain multipath profile. We use a fixed energy threshold over this profile to identify the closest (most direct) path from the MAV.
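The stitched-channel profile and the threshold rule can be sketched as follows (a numpy toy with hypothetical channel spacing and two synthetic paths; the earliest bin above the threshold, not the strongest, is taken as the direct path):

```python
import numpy as np

c = 3e8
df = 1e6                                   # channel spacing (hypothetical)
n_ch = 64
freqs = 902e6 + df * np.arange(n_ch)

# Two paths: direct at 10 m and a stronger reflection at 25 m (one-way)
paths = [(10.0, 1.0), (25.0, 1.4)]
H = sum(a * np.exp(-1j * 2 * np.pi * freqs * 2 * d / c) for d, a in paths)

# Inverse FFT over the stitched channels gives the round-trip delay profile
profile = np.abs(np.fft.ifft(H))
delay_res = 1 / (n_ch * df)                # time resolution of the profile

# Earliest bin above a fixed energy threshold = most direct path
threshold = 0.5 * profile.max()
first_bin = int(np.flatnonzero(profile > threshold)[0])
d_est = first_bin * delay_res * c / 2      # back to one-way distance

assert abs(d_est - 10.0) < c / (2 * n_ch * df)   # within one resolution cell
```

Note that the strongest peak here belongs to the reflection; thresholding for the earliest qualifying bin is what recovers the direct path.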

Angle estimation. The angle of the incident signals is also encoded in the phases of the signals. The backscattered chirp signal received by a linear array with $M$ antennas over $K$ propagation paths has the measurement matrix $X$,

$$X=\left[x_1\ \dots\ x_N\right]=S\left[F_1\ \dots\ F_N\right], \qquad (5)$$
$$S F_i=\left[s(\phi_1)\ \dots\ s(\phi_K)\right]\left[\gamma_{i1}\ \dots\ \gamma_{iK}\right]^\top,\quad i\in\{1,\cdots,N\},$$
$$s(\phi_k)=\left[1,\ e^{-j\eta\sin(\phi_k)},\ \dots,\ e^{-j(M-1)\eta\sin(\phi_k)}\right]^\top,$$

where $F_i=[\gamma_{i1}\ \dots\ \gamma_{iK}]^\top$ denotes the attenuation factors of the $K$ paths at frequency $f_i$ in the chirp, with $\gamma_{ik}$ the attenuation factor of path $k$ at frequency $f_i$. $S$ is the steering matrix, where $s(\phi_k)$ denotes the steering vector of path $k$, and $\eta=2\pi d/\lambda$ is a constant, where $d$ is the antenna spacing. $\phi_k$ is the angle of interest. We can see that the angle only appears in the steering matrix, contributing the phases of the complex elements of matrix $X$.

Thus, even without the attenuation information, we can use the obtained phases to construct a virtual measurement matrix in which every complex element has unit attenuation and carries the phases of the $N$ frequencies in the chirp, allowing angle estimation. The virtual measurement matrix can be written as

$$\hat{X}=\begin{bmatrix} e^{j\theta_{11}} & e^{j\theta_{12}} & \cdots & e^{j\theta_{1N}}\\ e^{j\theta_{21}} & e^{j\theta_{22}} & \cdots & e^{j\theta_{2N}}\\ \vdots & \vdots & \ddots & \vdots\\ e^{j\theta_{M1}} & e^{j\theta_{M2}} & \cdots & e^{j\theta_{MN}} \end{bmatrix}, \qquad (6)$$

where $\theta_{mi}$ denotes the phase at antenna $m$ and frequency $f_i$. Applying $\hat{X}$ to the super-resolution angle estimation technique [11], we obtain the direct-path angle of a tag to the controller. The four tags provide four angles for every chirp; we compute the harmonic mean of the four angles as the final result.
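To see why the unit-amplitude virtual matrix preserves the angle, consider a toy single-path case (numpy sketch; a plain beamscan is used in place of the super-resolution search of [11], and the array geometry is hypothetical):

```python
import numpy as np

M, wavelength = 3, 0.33                  # 3-antenna linear array, ~900 MHz
d = wavelength / 2                       # antenna spacing (hypothetical)
eta = 2 * np.pi * d / wavelength
phi_true = np.deg2rad(25.0)

freqs = np.linspace(902e6, 903e6, 16)
# Per-antenna/per-frequency phases of a single direct path; the second
# term is a range-induced phase common to all antennas.
theta = np.array([[-m * eta * np.sin(phi_true) - 2 * np.pi * f * 1e-7
                   for f in freqs] for m in range(M)])

X_hat = np.exp(1j * theta)               # unit-amplitude virtual matrix (Eqn. 6)

# Simple beamscan instead of the super-resolution search of [11]
grid = np.deg2rad(np.arange(-90, 90.5, 0.5))
def steer(phi):
    return np.exp(-1j * eta * np.sin(phi) * np.arange(M))
power = [np.abs(steer(phi).conj() @ X_hat).sum() for phi in grid]
phi_est = grid[int(np.argmax(power))]

assert abs(np.rad2deg(phi_est) - 25.0) <= 0.5
```

The common range phase cancels in the magnitude, so only the angle-dependent steering phases drive the scan.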

So far, we have modeled the angle estimation in the 2D case (Eqn. (5)) for ease of presentation. It extends straightforwardly to the 3D case. Specifically, the 3D angle is represented by two parameters, the azimuth angle and the elevation angle. Since a linear array is only capable of determining one parameter, we change it to a 2D circular array with the three antennas. This slightly changes the steering vector in Eqn. (5). Then, we can jointly search the two parameters in a very efficient way [37] to estimate the 3D angle. Alternatively, we can avoid the two-dimensional search using a barometer, a common sensor on MAVs, to measure the MAV height $h$. With the range $r$ between the MAV and its controller, the elevation angle is fixed by $\arcsin(h/r)$. The angle estimation in 3D then reduces to the 2D problem described above.
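With the barometer height $h$ and the backscatter range $r$ in hand, the elevation of the MAV relative to the controller follows from right-triangle geometry (a minimal sketch; the values are hypothetical):

```python
import numpy as np

# Barometer height h and backscatter range r fix the elevation angle,
# reducing the 3D angle search to the 2D azimuth case.
h, r = 6.0, 12.0                     # hypothetical height (m) and range (m)
elevation = np.arcsin(h / r)         # elevation of the MAV-controller direction

assert abs(np.rad2deg(elevation) - 30.0) < 1e-9
```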

Rotation estimation. The real problem in determining a MAV's orientation is how to anchor the yaw, a.k.a. the heading. The orientation can be represented by Euler angles: roll, pitch, and yaw for rotations around the $x$, $y$, and $z$ axes (Fig. 4). It can be computed by integrating the 3D angular velocity readings from the onboard IMU. The results, however, suffer from temporal drift due to the inherent noise of the IMU. Fortunately, an IMU drifts in only four degrees of freedom: the 3D position and the yaw angle. When an IMU is mounted on an aerial vehicle, the yaw typically indicates the heading of the vehicle. Therefore, we need drift-free rotation estimates to fix the heading.

Our idea is that the rotational shift is solely determined by the rotation, so we can use it to recover the rotation. According to Eqn. (2), subtracting the indices of the peaks from two opposing tags $i$ and $i'$ gives the rotational frequency shift,

$$\Delta\hat{B}_i=\hat{B}_i-\hat{B}'_i=f_T^{i}-f_T^{i'}+2\Delta f_r^{i}\approx 2\Delta f_r^{i}. \qquad (7)$$

Now we model the rotational shift. We denote the angle of the MAV to its controller as $\phi$ and the MAV's rotation as $\psi$ (refer to Fig. 4); then $\psi=\int\omega\,dt$, where $\omega$ is the angular velocity during the rotation. The rotational shift can be expressed as

$$\Delta f_r^{i}=\frac{f_c}{c}\,u_p\cdot v_r=\frac{f_c D}{2c}\,\omega\sin(\phi-\psi). \qquad (8)$$

The controller computes $\Delta f_r^{i}$ by Eqn. (7) and sends it to the MAV. $\phi$ can be obtained by the angle estimation algorithm, and the gyroscope in the IMU measures the angular velocity $\omega$. The remaining parameters, such as the carrier frequency $f_c$ and the diameter $D$ of the MAV, are known constants. Thus, the rotation $\psi$ can be solved from Eqn. (8). From this model, we can see that when $\phi=\psi$, the frequency shift is zero. This is when the direction of the rotational velocity is orthogonal to the direction of the MAV to the controller. In this case, there is an ambiguity in the rotation, which could be $\phi$ or $\phi+\pi$. It is trivial to eliminate this ambiguity using the measurements from the other pair of tags on the landing gear, which forms an orthogonal structure.
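Inverting Eqn. (8) for the yaw is then a one-liner; a toy numeric round trip (hypothetical constants, with $\phi-\psi$ inside the principal range of $\arcsin$):

```python
import numpy as np

fc, c, D = 915e6, 3e8, 0.65          # carrier, light speed, tag-pair separation (all hypothetical)
omega = 2.0                          # gyroscope angular velocity (rad/s)
phi = np.deg2rad(40.0)               # angle to controller from the angle estimator
psi_true = np.deg2rad(10.0)

# Rotational Doppler shift of Eqn. (8)
df_r = fc * D / (2 * c) * omega * np.sin(phi - psi_true)

# Invert Eqn. (8) for the yaw rotation psi
psi = phi - np.arcsin(2 * c * df_r / (fc * D * omega))

assert abs(np.rad2deg(psi) - 10.0) < 1e-9
```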

## IV Backscatter-inertial Super-accuracy State Estimation

So far, we have discussed how Marvel measures the pose feature, including the range, angle and rotation of a MAV to its controller, based on backscattered CSS signals. In this section, we enable accurate state estimation by taking advantage of the onboard IMU to jointly optimize the state with the backscatter-based pose sensing.

Since the controller's location is unknown, solving the state estimation problem consists of estimating the MAV state over its trajectory together with the controller's location. The controller is essentially a key feature of the map of the environment in which the MAV moves, so the problem falls into the simultaneous localization and mapping (SLAM) domain. Solutions to the SLAM problem are either filtering-based or graph-based. Filtering-based approaches are considered more computationally efficient as they only estimate the current robot state and the map; their main drawback is that fixing the linearization points early may lead to suboptimal results. In contrast, graph-based approaches can achieve better performance by repeatedly re-linearizing past robot states and multi-view constraints [14, 16]. We employ a graph-based optimization framework to solve our state estimation problem.

### IV-A Problem Formulation

The graph representation of our state estimation problem is shown in Fig. 6. Let $s_k$ denote the state at time $k$. At each $k$, the MAV observes a set of backscatter sensing measurements $\hat{z}_k$, which include the range $\hat{r}$, angle $\hat{\phi}$, and yaw rotation $\hat{\psi}$. Typically, the IMU data rate is much higher than the data rate of the backscatter sensing, so multiple IMU measurements are buffered between two states. $\hat{u}_k^{k+1}$ is a result integrated over these measurements (specified in § IV-B) that represents the odometry between two consecutive states, e.g., $s_k$ and $s_{k+1}$.

To achieve real-time processing, we employ an incremental state update scheme [10] that takes IMU and backscatter-based measurements within a fixed time interval for state estimation. As soon as a new state with its backscatter-based measurements is available, our approach works in a sliding-window fashion that incorporates the new state and removes the oldest state. The full state vector within the interval is defined as,

$$\mathcal{S}=[s_0,\ s_1,\ \cdots,\ s_n,\ \rho], \qquad s_k=[p_k^0,\ v_k^0,\ q_k^0],\quad k\in[1,n], \qquad (9)$$

where $s_k$ denotes the $k$-th state in the window, which contains the position $p_k^0$, velocity $v_k^0$, and rotation $q_k^0$ with respect to the first (0-th) state. $q_k^0$ is the Hamilton quaternion [32] representation of the rotation; we use the quaternion representation to model the odometry as a vector. $n$ is the number of states in the sliding window, and $\rho$ denotes the position of the controller.

Based on the state vector, we minimize the Mahalanobis norm of all measurement residuals to obtain a maximum a posteriori estimation:

$$\min_{\mathcal{S}}\left\{\sum_{j\in\mathcal{L}}\left\|e_L(\hat{z}_j,\mathcal{S})\right\|^2_{P_j}+\sum_{k\in\mathcal{I}}\left\|e_I(\hat{u}_k^{k+1},\mathcal{S})\right\|^2_{P_k^{k+1}}\right\}, \qquad (10)$$

where $e_L$ and $e_I$ are the measurement residuals for the LoRa backscatter and the IMU, respectively. $\mathcal{L}$ is the set of backscatter-based pose features and $\mathcal{I}$ denotes the set of IMU measurements. We choose the Mahalanobis norm as the optimization objective because it takes into account the correlations of the data set. These correlations amongst the internal states of different sensing modalities are key for any high-precision inertial-based autonomous system [12].

### Iv-B Backscatter-inertial State Estimation

We now solve the nonlinear system (10) for state estimation via the Gauss-Newton algorithm. This involves linearizing the nonlinear system by a first-order Taylor expansion of the residuals in (10) around a reasonable initial guess. Since the backscatter-based pose sensing provides ranges, it is easy to obtain an initial guess of the state. The objective is to minimize the sum of the Mahalanobis norms of the backscatter sensing and IMU residuals. The Mahalanobis norms in (10) can be explicitly expressed as,

$$\left\|e_L(\hat{z}_j,\mathcal{S})\right\|^2_{P_j}=e_L^{j\top}\left(P_j\right)^{-1}e_L^{j}, \qquad \left\|e_I(\hat{u}_k^{k+1},\mathcal{S})\right\|^2_{P_k^{k+1}}=e_I^{k\top}\left(P_k^{k+1}\right)^{-1}e_I^{k}. \qquad (11)$$

Here we briefly denote $e_L(\hat{z}_j,\mathcal{S})$ and $e_I(\hat{u}_k^{k+1},\mathcal{S})$ as $e_L^{j}$ and $e_I^{k}$. Next, we define the residuals and their corresponding covariance matrices.

Backscatter sensing residual. Our backscatter-based pose sensing gives the range $\hat{r}$, angle $\hat{\phi}$, and yaw rotation $\hat{\psi}$. The rotation $\hat{q}$ can be trivially derived from $\hat{\psi}$ together with the drift-free roll and pitch provided by the IMU. Given these measurements, the residual is defined as,

 (12)

where $[\cdot]_{xyz}$ extracts the vector part of the quaternion, which approximates the 3D error-state representation of the quaternion. The covariance matrix $P_j$ is the measurement noise matrix, which can be estimated by statistically analyzing the pose features.

IMU residual. Based on the kinematics, the residual of IMU measurements can be defined as,

 (13)

where $\hat{u}_k^{k+1}$ is the preintegrated result [18] over the accelerations and angular velocities that the IMU provides as raw measurements within the time interval between two consecutive states; $\otimes$ denotes the quaternion multiplication operation, and the quaternion is converted to a rotation matrix where needed. The covariance $P_k^{k+1}$ can be computed recursively by first-order discrete-time propagation within the interval; we refer to [27] for more details.
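The idea of preintegration, i.e., collapsing the buffered IMU samples between two states into one relative-motion measurement, can be sketched for the translational part (a toy, rotation and bias terms omitted; the sampling rate and data are hypothetical):

```python
import numpy as np

# Minimal preintegration sketch (translation only, rotation omitted):
# accumulate the IMU samples between two states into a single relative motion.
dt, n = 0.005, 200                        # 200 samples at 200 Hz -> 1 s interval
acc = np.tile([0.2, 0.0, -0.1], (n, 1))   # constant body acceleration (toy data)

alpha = np.zeros(3)                       # preintegrated position increment
beta = np.zeros(3)                        # preintegrated velocity increment
for a in acc:
    alpha = alpha + beta * dt + 0.5 * a * dt**2
    beta = beta + a * dt

# For constant acceleration over T = 1 s: beta = a*T, alpha = 0.5*a*T^2
assert np.allclose(beta, [0.2, 0.0, -0.1], atol=1e-9)
assert np.allclose(alpha, [0.1, 0.0, -0.05], atol=1e-6)
```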

At this stage, all the terms of the nonlinear system (10) have been explicitly defined. We then use Ceres Solver [1] to solve the nonlinear problem.
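The weighted Gauss-Newton iteration can be illustrated on a toy problem (a hypothetical 2D range-only instance, not the full state vector of (10); in the real system, Ceres Solver performs this optimization):

```python
import numpy as np

# Toy instance: recover a 2D position from range measurements, minimizing
# the Mahalanobis norm of the residuals via Gauss-Newton.
anchors = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])   # hypothetical
x_true = np.array([3.0, 4.0])
z = np.linalg.norm(anchors - x_true, axis=1)                 # noiseless ranges
P_inv = np.diag([1.0, 4.0, 2.0])                             # inverse measurement covariance

x = np.array([1.0, 1.0])                                     # initial guess
for _ in range(20):
    diff = x - anchors
    pred = np.linalg.norm(diff, axis=1)
    e = pred - z                                             # residuals
    J = diff / pred[:, None]                                 # Jacobian d(range)/dx
    # Normal equations of the weighted (Mahalanobis) least-squares step
    dx = np.linalg.solve(J.T @ P_inv @ J, -J.T @ P_inv @ e)
    x = x + dx

assert np.allclose(x, x_true, atol=1e-6)
```

Each iteration re-linearizes the residuals around the current estimate, exactly the repeated re-linearization that motivates the graph-based formulation.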

## V Implementation and Evaluation

### V-A Implementation and Evaluation Methodology

The controller is built from two colocated NI USRP-2943 nodes, each with a UBX160 daughterboard. Their four channels are configured as a data handler with one antenna and a backscatter signal handler with three antennas. We configure the USRPs to work in the LoRa band, and all signal parameters conform to the LoRa standard. The three antennas of the backscatter signal handler are mounted on an acrylic pole with a fixed spacing. The USRP nodes are synchronized using an external clock and frequency reference. We run the CSS decoding and the channel phase extraction on the controller.

The MAV system is built by attaching an Intel NUC, a LORD MicroStrain 3DM-GX4-45 IMU, and an SX1276MB1LAS long-range transceiver to a DJI Matrice 100, along with four customized LoRa backscatter tags attached to the landing gear. Each backscatter tag uses ADG919 and ADG904 RF switches to enable backscatter communication. The four tags are controlled by an Altera STEP-MAX10 FPGA, which configures them with mutually offset frequency shifts when backscattering the linear chirps. We run Marvel on the Intel NUC, which has a 1.3 GHz Core i5 processor and runs Ubuntu Linux. The backscatter-based pose sensing module and the backscatter-inertial super-accuracy state estimation algorithm are written in C++. We use the Robot Operating System (ROS) as the interfacing robotics middleware. The experimental platform is shown in Fig. 7.

We conduct experiments both outdoors and indoors to evaluate long-range and through-wall settings. The outdoor experiments are conducted in an open field in front of an office building, with no obstacle between the MAV and the controller. The indoor experiments are conducted in a MAV test site located on the basement level of an office building, as shown in Fig. 9. Multiple rooms are separated by concrete walls, drywall, and wooden doors, and contain office furniture including tables, chairs, and computers. In the following, we first examine Marvel's sub-module accuracy and latency in micro-benchmark evaluations, and then evaluate its system-level performance under different motions and environments.

### V-B Micro-benchmark Evaluation

We evaluate the performance of positioning and rotation estimation, respectively. To evaluate the positioning approach, we build a sliding rail driven by a ROB-09238 stepper motor [30] that supports movement at a controllable speed, and place the MAV on a plate mounted on this rail. To evaluate the rotation estimation, we place the MAV on a plate mounted on the stepper motor and control the rotating speed. In the long-range experiments, we place the controller at one end of the field and move the MAV away from the controller in fixed increments. In the through-wall experiments, we place the MAV in the test site and move the controller to different rooms (Fig. 9); three concrete walls separate the controller and the MAV at the farthest location. At each location, we repeat the experiment multiple times and compute the errors.

Positioning accuracy. We first validate the positioning capability of Marvel at different speeds. We compare Marvel with the state-of-the-art CSS-based localization system, locate [23], which operates correctly in semi-stationary scenarios. As shown in Fig. 9, the accuracies of the two approaches are similar in the stationary case. However, the error of locate scales with the speed, since its channel phase estimates are distorted by the Doppler frequency shift; its position error grows substantially in the worst case, while Marvel's accuracy remains steady.

The positioning results in different settings are shown in Fig. 11. To demonstrate that our approach is resilient to the Doppler effect under mobility, we move the MAV along the rail at a speed of m/s, the maximum speed allowed.

The long-range result shows that the error scales with the MAV-controller distance. Specifically, the position error is m at a distance of m, increases to m at a distance of m, and further increases to m at a distance of m. This is because an angle estimate of limited accuracy maps to a growing uncertainty region of the MAV's position as the distance increases. Our customized backscatter works at up to m, at which the worst-case position accuracy is m. Beyond that distance, the power of the received signal is too low to decode even with the CSS coding.
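The distance scaling above follows directly from geometry: a fixed bearing error sweeps an arc whose length grows linearly with range. The sketch below illustrates this with a hypothetical 2-degree bearing error; the distances and error value are ours for illustration, not measured figures.

```python
import math

# Illustrative geometry (hypothetical numbers): a fixed bearing-estimation
# error maps to a cross-range position uncertainty that grows linearly
# with the MAV-controller distance.

def cross_range_error_m(distance_m: float, angle_err_deg: float) -> float:
    # Small-angle approximation: arc length = radius * angle (in radians).
    return distance_m * math.radians(angle_err_deg)

if __name__ == "__main__":
    for d in (10.0, 30.0, 60.0):
        err = cross_range_error_m(d, 2.0)
        print(f"distance {d:5.1f} m -> +/-{err:.2f} m for a 2-degree bearing error")
```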

The through-wall result shows that the accuracies at different locations are similar because the MAV-controller distance does not vary much. However, the indoor accuracy is worse than the accuracy at a distance of m in the open space due to multipath fading. The worst-case accuracy, at location where three walls block the path between the MAV and the controller, is m. Our controller is unable to receive the backscatter signal when it traverses more than three walls.

In summary, the position accuracy is limited to the meter level both outdoors and indoors due to the limited signal bandwidth in the MHz band that we use. Nevertheless, with the aid of the IMU, Marvel achieves decimeter-level accuracy as shown in § V-C.
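The intuition behind the IMU aid can be conveyed with a toy 1-D complementary filter: the IMU dead-reckons at high rate between low-rate, meter-level RF fixes, while each fix bounds the inertial drift. All parameters below (rates, noise levels, gain) are assumptions for illustration; this is not Marvel's super-accuracy algorithm, which fuses the full backscatter measurement model.

```python
# Toy sketch (assumed parameters, not Marvel's algorithm): a 1-D
# constant-gain complementary filter fusing high-rate IMU integration
# with low-rate, meter-level RF position fixes.
import random

random.seed(0)
DT = 0.01        # IMU period, s (assumed 100 Hz)
RF_EVERY = 100   # one RF fix per second (assumed)
GAIN = 0.2       # correction gain toward each RF fix (assumed)

true_pos, true_vel = 0.0, 1.0   # MAV moving at a constant 1 m/s
est_pos, est_vel = 0.0, 1.0

for k in range(1000):
    true_pos += true_vel * DT
    # True acceleration is zero, so the accelerometer reads only noise.
    accel = random.gauss(0.0, 0.05)
    # Dead-reckon with the (noisy) IMU between RF fixes.
    est_vel += accel * DT
    est_pos += est_vel * DT
    if k % RF_EVERY == 0:
        rf_fix = true_pos + random.gauss(0.0, 1.0)  # meter-level RF fix
        est_pos += GAIN * (rf_fix - est_pos)        # fix bounds the drift

print(f"final position error: {abs(est_pos - true_pos):.2f} m")
```

The filter's steady-state error sits well below the meter-level noise of the individual RF fixes, while the fixes keep the inertial drift from growing without bound.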

Rotation estimation accuracy. We evaluate the rotation estimation by controlling the stepper motor, whose angular velocity starts at rad/s, increases at a rate of rad/s until reaching rad/s, and then decreases at the same rate back to rad/s. The whole process takes seconds as shown in Fig. 11. We repeat the experiment times and analyze the data. As expected, the result in the through-wall setting (mean error , standard deviation ) is worse than in the long-range setting (mean error , standard deviation ) due to the larger angle estimation error in the presence of multipath. Fig. 11 also shows that our rotation estimation algorithm closely tracks the MAV's rotation under varying angular velocities in both settings, providing drift-free results.

### V-C System-level State Estimation

We program the MAV to fly in different trajectories to evaluate the overall performance of Marvel under various motion patterns. The ground truth of the flight trajectories is provided by OptiTrack [24]. The maximum linear velocity reaches m/s in this experiment. We omit the plot of the linear velocity estimation: since velocity is the first derivative of position, its accuracy closely follows the position accuracy.
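The error metric used against the OptiTrack ground truth can be sketched as a mean Euclidean distance over time-synchronized trajectory samples. This is a minimal illustration of the standard metric, not our evaluation code; the sample points are made up.

```python
import math

# Minimal sketch of the evaluation metric: mean Euclidean error between
# estimated and ground-truth 3-D trajectory samples, assumed to be
# time-synchronized (e.g., against motion-capture poses).

def mean_position_error_m(estimated, ground_truth):
    """Both inputs are equal-length sequences of (x, y, z) tuples in meters."""
    assert len(estimated) == len(ground_truth)
    return sum(math.dist(p, q)
               for p, q in zip(estimated, ground_truth)) / len(estimated)

if __name__ == "__main__":
    est = [(0.0, 0.0, 1.0), (1.1, 0.0, 1.0), (2.0, 0.2, 1.0)]
    gt  = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
    print(f"mean error: {mean_position_error_m(est, gt):.3f} m")
```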

In long-range experiments, the MAV flew in a circular trajectory. Since the backscattered signal cannot be decoded when the distance is longer than m, the controller is placed m away from the MAV before takeoff to ensure that the MAV cannot exceed the distance limit during the flight. As shown in Fig. 12, the average error of state estimation is cm for positioning and for orientation estimation. This demonstrates that the super-accuracy algorithm significantly improves the accuracy of pose tracking, enabling accurate state estimation.

In through-wall experiments, the MAV has to fly in the test site for safety reasons. We placed the controller at location and the MAV flew in a square trajectory due to the limited area. As shown in Fig. 13, the average position error over the trajectory is cm and the average orientation error is . The accuracy is slightly worse than in the open field due to multipath fading and the more aggressive motions around the corners of the square trajectory.

Overall, the performance is better than that of GPS-based state estimators [6, 2] (meter-level accuracy), which are only applicable outdoors. The position accuracy is also better than that of state-of-the-art WiFi indoor localization systems [11, 37, 34], which require an active WiFi radio and cannot work in long-range or through-wall settings. Note that Marvel is not designed to outperform CV-based approaches [27, 14, 41] in accuracy; they are more accurate in well-lighted, texture-rich environments. Rather, Marvel complements them to support MAV navigation in vision-crippled scenarios.

## VI Related Work

State estimation for aerial vehicles has been a long-studied problem in the robotics community. LiDARs and cameras [27, 8, 14, 41, 5, 22] are the representative sensors for state estimation. LiDAR is only suitable for standard-size aerial vehicles due to its heavy weight and high cost. Although cameras are more suitable for MAVs thanks to their light weight and high accuracy, they are limited to well-lighted, texture-rich environments, hindering their use in vision-crippled scenarios, e.g., smoky buildings in firefighting operations.

To ease the limitations of existing solutions, RF-based state estimators have been proposed, as RF signals are highly resilient to visual limitations. Mueller et al. [21] and Liu et al. [15] use UWB-based range measurements for state estimation. WINS [40] uses ubiquitous WiFi to estimate angles-of-arrival (AoAs) with an onboard antenna array for state estimation. Extensive studies of WiFi indoor localization systems [11, 37, 34] in the networking community also demonstrate their potential for state estimation. These proposals operate correctly where the amplitude and phase of RF signals are available, but not in the presence of extremely weak signals far below the noise floor.

Recently, communication with low-power signals in long-range or occlusive settings has been studied in [31, 25, 33]. The signal characteristics and processing methods make localization with such low-power signals possible. locate [23] is the first localization system that extracts the channel phases of low-power CSS signals drowned by noise to localize targets with range estimates. It operates correctly in semi-stationary scenarios but not under the agile mobility of MAVs; the fundamental difference in our context is that the backscattered CSS signals carry Doppler frequency shifts. Moreover, locate only estimates location and requires the floor plan of the workspace to localize targets with a single access point (AP). In contrast, Marvel designs novel algorithms to 1) localize a MAV with its single controller without any prior knowledge of the workspace; 2) track the MAV's rotation via low-power backscattered CSS signals; and 3) enable accurate state estimation with a backscatter-inertial super-accuracy algorithm aided by the IMU. In addition, there are works devoted to aided inertial sensing [15, 27, 18], but they differ fundamentally from our work in sensing modality: our backscatter-based pose sensing requires formulating a new measurement model to fuse with the IMU.

## VII Conclusion

To our knowledge, Marvel represents the first RF backscatter-based MAV state estimation system that works at long range or through walls with low-power signals drowned by noise. It marks a new sensing modality that complements existing visual solutions in supporting MAV navigation. The system is powered by a backscatter-based pose sensing module that estimates pose features via backscattered CSS signals, as well as a backscatter-inertial super-accuracy algorithm that leverages the IMU for accurate state estimation. We implement Marvel on USRP and the DJI Matrice 100 platform with customized backscatter tags. The experimental results based on three flight trajectories, both outdoors and indoors, show that Marvel holds promise as a long-range/through-wall, lightweight, and plug-and-play state estimation system for MAVs. In the future, we plan to seamlessly combine visual sensing and low-power RF sensing to achieve a more robust state estimation system.

### References

1. S. Agarwal and K. Mierle. Ceres solver. Note: http://ceres-solver.org Cited by: §IV-B.
2. H. Chao, Y. Cao and Y. Chen (2010) Autopilots for small unmanned aerial vehicles: a survey. Int. J. Control. Autom. 8 (1), pp. 36–44. Cited by: §I, §I, §V-C.
3. A. Dhekne, A. Chakraborty, K. Sundaresan and S. Rangarajan (2019) TrackIO: tracking first responders inside-out. In Proc. USENIX NSDI, Cited by: §I.
4. E. Dong, J. Xu, C. Wu, Y. Liu and Z. Yang (2019) Pair-navi: peer-to-peer indoor navigation with mobile visual slam. In Proc. IEEE INFOCOM, Cited by: §I.
5. R. Dubé, A. Gawel, H. Sommer, J. Nieto, R. Siegwart and C. Cadena (2017) An online multi-robot SLAM system for 3D LiDARs. In Proc. IEEE IROS, Cited by: §VI.
6. J. Farrell (2008) Aided navigation: GPS with high rate sensors. McGraw-Hill, Inc.. Cited by: §I, §I, §V-C.
7. J. Fichtinger, J. M. Ries, E. H. Grosse and P. Baker (2015) Assessing the environmental impact of integrated inventory and warehouse management. Int. J. Prod. Econ. 170, pp. 717–729. Cited by: §I.
8. W. Hess, D. Kohler, H. Rapp and D. Andor (2016) Real-time loop closure in 2D LiDAR SLAM. In Proc. IEEE ICRA, Cited by: §VI.
9. C. Jiang, Y. He, S. Yang, J. Guo and Y. Liu (2019) 3D-omnitrack: 3d tracking with cots rfid systems. In Proc. ACM IPSN, Cited by: §I, §I.
10. M. Kaess, H. Johannsson, R. Roberts, V. Ila, J. J. Leonard and F. Dellaert (2012) ISAM2: incremental smoothing and mapping using the bayes tree. Int. J. Robot. Res. 31 (2), pp. 216–235. Cited by: §IV-A.
11. M. Kotaru, K. Joshi, D. Bharadia and S. Katti (2015) Spotfi: decimeter level localization using wifi. In Proc. ACM SIGCOMM, Cited by: §III-C, §V-C, §VI.
12. S. Leutenegger, S. Lynen, M. Bosse, R. Siegwart and P. Furgale (2015) Keyframe-based visual-inertial odometry using nonlinear optimization. Int. J. Robot. Res. 34 (3), pp. 314–334. Cited by: §IV-A.
13. J. C. Liando, A. Gamage, A. W. Tengourtius and M. Li (2019) Known and unknown facts of lora: experiences from a large-scale measurement study. ACM Trans. Sens. Netw. 15 (2), pp. 16. Cited by: §III-A.
14. Y. Lin, F. Gao, T. Qin, W. Gao, T. Liu, W. Wu, Z. Yang and S. Shen (2018) Autonomous aerial navigation using monocular visual-inertial fusion. J. Field Robot. 35 (1), pp. 23–51. Cited by: §I, §IV, §V-C, §VI.
15. R. Liu, C. Yuen, T. Do, D. Jiao, X. Liu and U. Tan (2017) Cooperative relative positioning of mobile users by fusing imu inertial and uwb ranging information. In Proc. IEEE ICRA, Cited by: §I, §I, §VI, §VI.
16. C. X. Lu, Y. Li, P. Zhao, C. Chen, L. Xie, H. Wen, R. Tan and N. Trigoni (2018) Simultaneous localization and mapping with power network electromagnetic field. In Proc. ACM MobiCom, Cited by: §IV.
17. Z. Luo, Q. Zhang, Y. Ma, M. Singh and F. Adib (2019) 3D backscatter localization for fine-grained robotics. In Proc. USENIX NSDI, Cited by: §I, §I.
18. T. Lupton and S. Sukkarieh (2012) Visual-inertial-aided navigation for high-dynamic motion in built environments without initial conditions. IEEE Trans. Robot. 28 (1), pp. 61–76. Cited by: §IV-B, §VI.
19. Y. Ma, N. Selby and F. Adib (2017) Drone relays for battery-free networks. In Proc. ACM SIGCOMM, Cited by: §I.
20. W. Mao, Z. Zhang, L. Qiu, J. He, Y. Cui and S. Yun (2017) Indoor follow me drone. In Proc. ACM MobiSys, Cited by: §I.
21. M. W. Mueller, M. Hamer and R. D’Andrea (2015) Fusing ultra-wideband range measurements with accelerometers and rate gyroscopes for quadrocopter state estimation. In Proc. IEEE ICRA, Cited by: §VI.
22. R. Mur-Artal, J. M. M. Montiel and J. D. Tardos (2015) ORB-SLAM: a versatile and accurate monocular slam system. IEEE Trans. Robot. 31 (5), pp. 1147–1163. Cited by: §I, §VI.
23. R. Nandakumar, V. Iyer and S. Gollakota (2018) 3D localization for sub-centimeter sized devices. In Proc. ACM SenSys, Cited by: §V-B, §VI.
24. OptiTrack – motion capture systems. Note: https://optitrack.com/ Cited by: §V-C.
25. Y. Peng, L. Shangguan, Y. Hu, Y. Qian, X. Lin, X. Chen, D. Fang and K. Jamieson (2018) PLoRa: a passive long-range data network from ambient lora transmissions. In Proc. ACM SIGCOMM, Cited by: §VI.
26. M. Power (2018) Walmart testing warehouse drones to manage inventory. Note: https://www.supplypro.ca/wal-mart-testing-drones-warehouses-manage-inventory/ Online; accessed 27 July 2019. Cited by: §I.
27. T. Qin, P. Li and S. Shen (2018) VINS-mono: a robust and versatile monocular visual-inertial state estimator. IEEE Trans. Robot. 34 (4), pp. 1004–1020. Cited by: §I, §IV-B, §V-C, §VI, §VI.
28. R. F. Fahy, P. R. LeBlanc and J. L. Molis (2018-06) Firefighter fatalities in the United States - 2017. National Fire Protection Association (NFPA). Cited by: §I.
29. L. Shangguan and K. Jamieson (2016) The design and implementation of a mobile rfid tag sorting robot. In Proc. USENIX NSDI, Cited by: §I, §I.
30. Stepper motor with cable. Note: https://www.sparkfun.com/products/9238 Cited by: §V-B.
31. V. Talla, M. Hessar, B. Kellogg, A. Najafi, J. R. Smith and S. Gollakota (2017) Lora backscatter: enabling the vision of ubiquitous connectivity. Proc. ACM UbiComp 1 (3), pp. 105. Cited by: §VI.
32. N. Trawny and S. I. Roumeliotis (2005) Indirect kalman filter for 3d attitude estimation. University of Minnesota, Dept. of Comp. Sci. & Eng., Tech. Rep 2, pp. 2005. Cited by: §IV-A.
33. A. Varshney, O. Harms, C. Pérez-Penichet, C. Rohner, F. Hermans and T. Voigt (2017) Lorea: a backscatter architecture that achieves a long communication range. In Proc. ACM SenSys, Cited by: §VI.
34. D. Vasisht, S. Kumar and D. Katabi (2016) Decimeter-level localization with a single wifi access point. In Proc. USENIX NSDI, Cited by: §V-C, §VI.
35. T. Wei and X. Zhang (2016) Gyro in the air: tracking 3d orientation of batteryless internet-of-things. In Proc. ACM MobiCom, Cited by: §I, §I.
36. C. Wu, F. Zhang, Y. Fan and K. J. R. Liu (2019) RF-based inertial measurement. In Proc. ACM SIGCOMM, Cited by: §I, §I.
37. Y. Xie, J. Xiong, M. Li and K. Jamieson (2019) MD-track: leveraging multi-dimensionality in passive indoor wi-fi tracking. Proc. ACM MobiCom. Cited by: §III-C, §V-C, §VI.
38. L. Yang, Y. Chen, X. Li, C. Xiao, M. Li and Y. Liu (2014) Tagoram: real-time tracking of mobile rfid tags to high precision using cots devices. In Proc. ACM MobiCom, Cited by: §I, §I.
39. Y. Yang, Z. Hu, K. Bian and L. Song (2019) ImgSensingNet: uav vision guided aerial-ground air quality sensing system. In Proc. IEEE INFOCOM, Cited by: §I.
40. S. Zhang, S. Tang, W. Wang and T. Jiang (2018) WINS: wifi-inertial indoor state estimation for mavs. In Proc. ACM SenSys, Cited by: §I, §I, §VI.
41. A. Z. Zhu, N. Atanasov and K. Daniilidis (2017) Event-based visual inertial odometry. In Proc. IEEE CVPR, Cited by: §I, §V-C, §VI.