RF Backscatter-based State Estimation for Micro Aerial Vehicles
Abstract
The advances in compact and agile micro aerial vehicles (MAVs) have shown great potential in replacing humans for labor-intensive or dangerous indoor investigation, such as warehouse management and fire rescue. However, the design of a state estimation system that enables autonomous flight in such dim or smoky environments presents a conundrum: conventional GPS- or computer-vision-based solutions only work outdoors or in well-lighted, texture-rich environments. This paper takes the first step to overcome this hurdle by proposing Marvel, a lightweight RF backscatter-based state estimation system for MAVs indoors. Marvel is non-intrusive to commercial MAVs, attaching backscatter tags to their landing gears without internal hardware modifications, and works in a plug-and-play fashion that requires neither infrastructure deployment, pre-trained signatures, nor knowledge of the controller's location. The enabling techniques are a new backscatter-based pose sensing module and a novel backscatter-inertial super-accuracy state estimation algorithm. We demonstrate our design by programming a commercial off-the-shelf MAV to autonomously fly in different trajectories. The results show that Marvel supports navigation within a range of m or through three concrete walls, with an accuracy of cm for localization and for orientation estimation, outperforming commercial GPS-based approaches outdoors.
I Introduction
Over the last decade, the rapid proliferation of micro aerial vehicle (MAV) technologies has shown great potential in replacing humans for labor-intensive or even dangerous indoor investigation and search, such as warehouse inventory management and fire rescue [19, 20, 39, 14, 3]. Specifically, using MAVs to manage inventory for warehouses cuts inventory checks from one month down to a single day [26], and using MAVs for search and rescue in firefighting operations saves the lives of firefighters, given that of the deaths of firefighters in the United States in 2017 occurred in burning buildings [28]. These applications require MAVs to navigate autonomously in dim warehouses [7] or smoky buildings while reporting to a server or controller at a distance or through walls.
State estimation is fundamental to the autonomous navigation of MAVs. The state, including position, velocity, and orientation, is the key input to the flight control system of an aerial vehicle, which adjusts the rotating speed of the rotors to achieve the actions desired in response to remote control or autonomous operations. The mainstream uses GPS, compass, and vision sensors to estimate a MAV's state. However, GPS-compass-based approaches [6, 2] only work in outdoor free space, since GPS signals can be blocked by occlusions and compass measurements are easily distorted by surrounding environments. Indoors, computer vision (CV) based approaches have attracted much attention due to their light weight, high accuracy, and low cost, but they are limited to well-lighted or texture-rich environments [41, 4, 27, 22], thereby failing to work in dim warehouses or smoky burning buildings.
Recent years have witnessed much progress in using RF signals to track a target's pose (position and orientation), holding the potential for state estimation that is highly resilient to visual limitations. Despite novel systems that have achieved high accuracy [15, 40, 17, 9, 35, 38, 36, 29], hardly any of these ideas have made it into the indoor MAV scenario, which imposes the following requirements:

Long range/through-wall. To scan items across a warehouse or navigate in a burning building, the system should support navigation at least across rooms or over an area of tens of meters.

Lightweight. As a MAV is typically compact with limited battery capacity, it requires a lightweight, small-sized, and low-power sensing modality to enable state estimation.

Plug-and-play. To be practical for emergency rescue and efficient indoor investigation, the system should be instantly operational in an unknown environment without prior infrastructure deployment, a pre-training process, or labor-intensive setup.
Specifically, RFID-based solutions [17, 9, 35, 38, 29] are lightweight but their operational range is limited. UWB- and WiFi-based solutions [15, 40, 36] have better operational range but require pre-deploying multiple anchors to enable localizability, failing to meet the plug-and-play requirement.
In this paper, we present Marvel, a state estimation system for MAVs that satisfies all these requirements. As shown in Fig. 1, Marvel is able to support navigation in burning buildings with smoke and fog. It allows a MAV to safely fly across rooms or in an area of several tens of meters. The system is lightweight in that it merely attaches a few backscatter tags to the landing gear of a commercial MAV without any internal hardware modification. Marvel works in a plug-and-play fashion that only requires the MAV's single controller, whose location is unknown, to interact with the MAV and the onboard backscatter tags. The design of Marvel is structured around two components:
(a) Backscatter-based pose sensing: Marvel's first component enables a backscatter-based sensing modality that estimates the response of the attached tags over backscattered signals that are drowned in noise. This sensing modality leverages chirp spread spectrum (CSS) signals to enable pose tracking at long range or through occlusions, where the signal amplitudes and phases are not directly available. It introduces a set of algorithms that first estimate the channel phases of the tags' backscatter signals under mobility and then use the phases to estimate pose features, including the range, angle, and rotation of the MAV relative to its controller. This component enables Marvel to operate across rooms or in an area of several tens of meters.
(b) Backscatter-inertial super-accuracy state estimation: The estimates of the first component alone cannot fulfill state estimation, i.e., computing a MAV's position, velocity, and orientation. Marvel's second innovation is a backscatter-inertial super-accuracy algorithm that combines the backscatter-based estimates with the onboard IMU measurements to enable accurate state estimation. Although the IMU suffers from error accumulation, our backscatter sensing is drift-free, i.e., it has no temporal error accumulation, and can thus correct the IMU drift via multi-sensor fusion. It employs a graph-based optimization framework to compute a Gaussian approximation of the posterior over the MAV trajectory. This involves computing the mean of this Gaussian as the configuration of the state that maximizes the likelihood of the sensor observations.
Results. We build a prototype of Marvel using a DJI M100 MAV with four LoRa backscatter tags, customized from off-the-shelf components, attached to it. We demonstrate Marvel's potential by programming the MAV to autonomously fly in different trajectories in a long-range open space and an across-room indoor test site. The results show that Marvel can support navigation within a range of m or with three concrete walls between the vehicle and its controller, and achieves an average accuracy of cm for localization and for orientation estimation, outperforming commercial GPS-based navigation approaches [6, 2].
Contributions. We introduce a novel backscatter-based pose sensing technique that first extracts the channel phases of CSS signals under mobility and then estimates pose features of a MAV from the phases. This enables pose tracking at long range or through occlusions. In addition, we design a backscatter-inertial super-accuracy algorithm that fuses backscatter-based estimates and IMU measurements to enable accurate state estimation. Finally, we implement a prototype and conduct real-world experiments demonstrating the system's ability to navigate a MAV across rooms or in an area of several tens of meters.
II System Overview
Marvel is a system that enables state estimation to support a MAV's navigation over an open area of tens of meters or across rooms indoors. It works with the MAV's controller and four LoRa backscatter tags attached to the landing gear. As shown in Fig. 2, the controller consists of a data handler and a backscatter signal handler. The data handler sends and receives data to and from the MAV through one antenna. It excites the tags and exchanges data such as channel phases and velocities via the CSS signals of LoRa. The backscatter signal handler receives the backscattered signals through a linear array of three antennas. It extracts the channel phases and then sends them to the MAV. The computer on the MAV takes the phases to compute the state. It then sends the state to the flight control system, which computes commands to adjust actions and navigate the MAV to its destination.
Marvel consists of two components:

Backscatter-based pose sensing module contains chirp-based estimation approaches that first estimate channel phases and then use the phases to compute the range, angle, and rotation of the MAV relative to its controller, enabling pose tracking at long range or through occlusions.

Backscatter-inertial super-accuracy state estimation algorithm fuses the estimates of the backscatter-based sensing with the IMU measurements, which include 3D accelerations and angular velocities, through a graph-based optimization framework. It models each module's estimates as a Gaussian mixture and computes a Gaussian approximation of the posterior over the MAV trajectory.
III Backscatter-based Pose Sensing
III-A Primer on CSS Processing
Before proceeding to our design, we first briefly introduce the CSS signals of LoRa that we use. The controller transmits a linear upchirp signal with a given bandwidth to the MAV with backscatter tags. A tag backscatters the signal with a frequency shift to prevent interference from the excitation signal. Different tags apply different frequency shifts. Marvel uses a chirp signal that is compatible with the LoRa protocol. Its duration depends on the spreading factor SF and bandwidth BW [13], i.e., T = 2^{SF} / BW, where the maximum BW is KHz.
To decode the chirp, the receiver first multiplies the received signal with a synthesized downchirp whose frequency linearly varies from to . Then, it takes a fast Fourier transform (FFT) of this multiplication (Fig. 2). This operation sums the energy across all the frequencies of the chirp, producing a peak at an FFT bin.
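The dechirp-and-FFT operation described above can be sketched in a few lines of numpy. The sampling rate, bandwidth, and duration below are illustrative values, not Marvel's actual configuration:

```python
import numpy as np

def dechirp_peak(rx, fs, bw, duration):
    """Multiply a received upchirp by the matched downchirp, take an FFT,
    and return the index of the strongest bin."""
    n = len(rx)
    t = np.arange(n) / fs
    k = bw / duration                      # chirp slope (Hz/s)
    downchirp = np.exp(-1j * np.pi * k * t**2)
    spectrum = np.fft.fft(rx * downchirp)
    return int(np.argmax(np.abs(spectrum)))

fs, bw, T = 125e3, 125e3, 8e-3             # illustrative parameters
n = int(fs * T)
t = np.arange(n) / fs
k = bw / T
upchirp = np.exp(1j * np.pi * k * t**2)

# A perfectly aligned chirp concentrates all its energy in bin 0.
print(dechirp_peak(upchirp, fs, bw, T))

# A frequency offset (as caused by a timing offset or Doppler)
# moves the peak to the corresponding shifted bin.
shifted = upchirp * np.exp(1j * 2 * np.pi * (10 * fs / n) * t)
print(dechirp_peak(shifted, fs, bw, T))
```

Multiplying by the matched downchirp collapses the linear chirp into a single tone, so the FFT peak position directly encodes any residual frequency offset.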
Next, we describe how the controller extracts the channel phase of the received chirp from the peak. Then, we elaborate on how Marvel leverages the phase to estimate its position and rotation.
III-B Below-Noise Channel Phase Extraction
Since MAVs are expected to carry out emergency tasks like fire rescue, the system should provide localizability with a single anchor (the controller) and without prior knowledge of the workspace, so that it is instantly deployable and operable wherever required. The position of a target relative to a single anchor can be represented by the angle and the range of the target to the anchor as polar coordinates. Both parameters can be inferred from the channel phase of the signal.
The channel phase extraction for chirp signals has two steps as shown in Fig. 3: we first combat the Doppler effect to estimate the beginning of the chirp and then we extract the channel phase leveraging the linearity of the chirp frequencies.
To estimate the beginning of the chirp, we leverage a key property of the chirp signal: a time delay in the chirp signal translates to a frequency shift. Ideally, decoding the original upchirp with a downchirp produces a peak in the first FFT bin (see Fig. 2). When a tag is separated from the controller, the backscatter signal handler receives the signal with a timing offset equal to the signal's round trip. The peak then appears in a shifted bin. If we move the beginning of the received chirp samples closer to the real beginning and repeat the decoding operation, a new peak appears at the first FFT bin again, and the symbol at this instant is the beginning of the transmission. However, under the MAV's mobility, the signal additionally experiences a Doppler frequency shift. The shifted bin is then a mixed result of the timing offset and the Doppler effect, and the above operation can no longer recover the beginning of the chirp.
Our solution leverages the kinetics and the structure of a MAV. We attach four backscatter tags to the landing gear of a MAV. As shown in Fig. 4, the Doppler frequency shift of a tag is a combined result of translation and rotation. The shift of tag i can be expressed as

Δf_i = (f_c / c) · u · (v + ω × r_i) = Δf_t + Δf_{r,i},    (1)

where u is the unit vector that represents the direction from the MAV to the controller, f_c the carrier frequency, and c the speed of RF signals in the medium. v and ω are the translational velocity and the rotational velocity, and r_i is the lever arm of tag i. Δf_t and Δf_{r,i} correspond to the translational shift and the rotational shift. To estimate the beginning of the chirp, we need to isolate the frequency shift translated from the timing offset by eliminating the effect of the Doppler shift.
Eliminating the effect of Doppler shift. We first eliminate the effect of the rotational shift using the key observation that any pair of opposing tags on the landing gear (e.g., the two diagonal tags in Fig. 4) always have rotational velocities with the same magnitude but opposite directions, while all tags share the same translational velocity. Thus, averaging the shifted peaks of two opposing tags eliminates the rotational shift, as shown in Fig. 5. Specifically, decoding the backscattered signals from a pair of opposing tags, we obtain the FFT bin indices k_1 and k_2,

k_1 = f_{τ,1} + Δf_t + Δf_{r,1},    k_2 = f_{τ,2} + Δf_t + Δf_{r,2},    (2)

where f_{τ,1} and f_{τ,2} are the frequency shifts translated by the timing offset, and Δf_t and Δf_{r,i} the translational shift and the rotational shift of tag i. Note that f_{τ,1} ≈ f_{τ,2}, since their maximal difference is the shift translated from the traveling time over the distance between a pair of opposing tags, i.e., the diameter of the MAV, which is negligible because the diameter ( cm for the DJI M100) is too small relative to the speed of RF signal propagation. Thus, averaging them, i.e., (k_1 + k_2) / 2, eliminates the rotational shift since Δf_{r,1} = −Δf_{r,2}.

Since the two pairs of tags on the MAV are structurally symmetric, when we perform the above operation on each pair, the results are expected to be identical. However, they exhibit a slight difference, as shown in Fig. 5. This is because the microcontrollers of the tags are not synchronized with the controller, which introduces an additional carrier frequency offset (CFO) for each tag that is a constant. In our approach, the residual offset is the difference of CFOs upon averaging the two pairs of tags, which is still a constant. We can simply apply this constant to the rest of the transmission to estimate the right chirp phase.
Now we eliminate the translational shift Δf_t to isolate the frequency shift translated from the timing offset. Then, we can obtain the signal at the real beginning of the transmission by moving the beginning of the received chirp samples. The translational velocity v can be tracked using the accelerations measured by the onboard IMU. Initially, the MAV is about to take off; at this initial stage, there is no motion, and the shifted bin already gives the frequency shift of the timing offset. Thus, the channel phase can be obtained according to the workflow (Fig. 3). Then, we specify the direction u in Eqn. (1) by our angle estimation algorithm in § III-C. When the MAV takes off, the accelerations measured by the IMU track the translational velocity v, which yields Δf_t. Note that integrating the accelerations to obtain the velocity suffers from temporal drift. The super-accuracy algorithm in § IV corrects the drift and feeds the corrected velocity back to the controller.
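The opposing-tag trick can be checked with a toy numeric sketch, mirroring the structure of Eqn. (2): averaging the two bin offsets cancels the rotational term, and subtracting them isolates it. All names and numbers below are made up for illustration:

```python
# Hypothetical frequency offsets (Hz) for a pair of opposing tags:
# timing-offset shift f_tau, shared translational shift f_t, and
# rotational shifts that are equal in magnitude but opposite in sign.
f_tau, f_t, f_rot = 230.0, 41.0, 17.5   # illustrative values

bin_tag1 = f_tau + f_t + f_rot          # tag on one side of the landing gear
bin_tag2 = f_tau + f_t - f_rot          # its opposing tag

# Averaging cancels the rotational shift; subtracting the IMU-tracked
# translational shift then isolates the timing-offset shift.
f_timing = (bin_tag1 + bin_tag2) / 2 - f_t

# Subtracting the two bins instead isolates the rotational shift (cf. Eqn. (7)).
f_rot_est = (bin_tag1 - bin_tag2) / 2

print(f_timing, f_rot_est)
```

The same two measurements thus serve both purposes: their mean feeds the timing-offset recovery here, and their difference feeds the rotation estimation of § III-C.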
Extracting channel phase. At this stage, we have corrected the signal to the symbol at the beginning of the transmission. Now we compute the channel phases of all frequencies in the chirp. Taking an FFT of the multiplication of the corrected upchirp and the downchirp adds the phase across all the frequencies in the chirp. Assume that a linear chirp has K frequencies and the channel path of the signal remains constant throughout the duration of the chirp; then the phase of the received chirp changes linearly with the frequency. We have

φ(f_k) = φ(f_1) + 2π (f_k − f_1) τ,    k = 1, …, K,    (3)

where the frequencies f_1, …, f_K are explicitly defined when generating the chirp signal and τ is the propagation delay. Solving the above equation yields the channel phases of all frequencies in the chirp.
Note that this method requires a short chirp duration to stay within the channel coherence time. Meanwhile, the signal needs good decoding capability, which is proportional to the product of signal duration and bandwidth. To this end, we choose the parameters of the CSS signals to conform to the LoRa standard as , KHz, and thus the chirp duration is ms.
III-C Below-Noise Pose Sensing
So far we have obtained the channel phases of the backscatter signals. We now use them to estimate the pose features, including the range and angle of a MAV to its controller for positioning, and the MAV's rotation for determining the orientation.
Range estimation. Assume that the controller is separated from a tag on the MAV by a distance of d. A linear chirp signal with K frequencies transmitted by the controller propagates a total distance of 2d for the round trip to and from the tag. The wireless channel of such a signal is

h(f_k) = a_k e^{−j 2π f_k · 2d / c},    k = 1, …, K,    (4)

where a_k is the attenuation corresponding to frequency f_k in the chirp. In the absence of multipath, we could directly use the obtained channel phases of the backscatter signal to estimate the range d. However, due to multipath, each obtained phase is actually that of the superposition of the direct-path signal and the multipath-reflected signals.
To combat multipath while conforming to the LoRa protocol, we send multiple chirps in the channels of the MHz band and combine the phase information across all these channels to emulate a wideband transmission. At a high level, a wideband signal can be used to disambiguate multipath. There are channels separated by MHz from their adjacent channels. The four tags on the MAV are configured with different frequency shifts to prevent interference from the excitation signal, so the controller can transmit excitation signals in channels and receive backscatter signals across channels. Combining them, the controller sends the phases at all the channels to the MAV through LoRa. Then, the MAV computes the range estimate by applying an inverse FFT to the phases to obtain the time-domain multipath profile. We use a fixed energy threshold over this profile to identify the closest (most direct) path from the MAV.
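The inverse-FFT multipath profile can be illustrated with synthetic channel phases. The band, channel grid, threshold, and path lengths below are assumptions chosen for illustration, not Marvel's actual parameters:

```python
import numpy as np

c = 3e8
d_direct, d_reflect = 12.0, 19.0         # hypothetical one-way path lengths (m)
freqs = 868e6 + np.arange(64) * 0.5e6     # assumed stitched channel grid (Hz)

# Round-trip channel as the controller would observe it: a direct path
# plus a weaker multipath reflection (Eqn. (4) with two paths).
h = np.exp(-1j * 2 * np.pi * freqs * 2 * d_direct / c) \
    + 0.4 * np.exp(-1j * 2 * np.pi * freqs * 2 * d_reflect / c)

profile = np.abs(np.fft.ifft(h))          # time-domain multipath profile
# First bin above a fixed energy threshold = most direct path.
thresh = 0.5 * profile.max()
first_bin = int(np.argmax(profile > thresh))
df = freqs[1] - freqs[0]
d_est = first_bin / (len(freqs) * df) * c / 2   # delay bin -> one-way range
print(round(d_est, 1))
```

With the assumed 32 MHz of stitched bandwidth, the delay resolution maps to a range bin of several meters, which is why the paper reports meter-level accuracy before IMU fusion refines it.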
Angle estimation. The angle of incident signals is also encoded in the phases of the signals. The backscattered chirp signal received by a linear array with M antennas from L propagation paths has the measurement matrix

X = S Γ,    (5)

where Γ = [γ_{l,k}] denotes the attenuation factors of the L paths at the K frequencies in the chirp, with γ_{l,k} the attenuation factor of path l at frequency f_k. S = [s(θ_1), …, s(θ_L)] is the steering matrix, where s(θ_l) = [1, e^{−jΩ sin θ_l}, …, e^{−j(M−1)Ω sin θ_l}]ᵀ denotes the steering vector of path l, and Ω = 2π f Δ / c is a constant where Δ is the antenna spacing. θ_l is the angle of interest. We can see that the angle only exists in the steering matrix, contributing to the phases of the complex elements of matrix X.
Thus, even without the attenuation information, we can use the obtained phases to construct a virtual measurement matrix, all of whose complex elements have unit attenuation and carry the phases of the K frequencies in the chirp, to allow angle estimation. The virtual measurement matrix can be written as

V = [ e^{j φ_{m,k}} ],    m = 1, …, M;  k = 1, …, K,    (6)

where φ_{m,k} denotes the phase at antenna m and frequency f_k. Applying the super-resolution angle estimation technique [11] to V, we obtain the direct-path angle of a tag to the controller. The four tags provide four angles for every chirp. We compute the harmonic mean of the four angles as the final result.
Thus far, we have modeled angle estimation in the 2D case (Eqn. (5)) for ease of presentation. It is straightforward to extend it to the 3D case. Specifically, the 3D angle is represented by two parameters, the azimuth angle and the elevation angle, and can be expressed as a vector. Since a linear array is only capable of determining one parameter, we change it to a 2D circular array with the three antennas. This slightly changes the steering vector in Eqn. (5). Then, we can jointly search the two parameters in a very efficient way [37] to estimate the 3D angle. Alternatively, we can avoid the two-dimensional search by using a barometer, a common sensor on MAVs, to measure the MAV height. Together with the range between the MAV and its controller, the height fixes the elevation angle, and the 3D angle estimation reduces to the 2D case described above.
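A minimal sketch of estimating the angle from phase-only (unit-attenuation) measurements is shown below for the 2D, single-path case. It uses a plain steering-vector grid search instead of the super-resolution technique of [11], and the carrier frequency, antenna spacing, and angle are assumed values:

```python
import numpy as np

c, fc = 3e8, 915e6                  # assumed carrier frequency
lam = c / fc
spacing = lam / 2                   # assumed half-wavelength antenna spacing
m = np.arange(3)                    # three-antenna linear array
true_aoa = np.deg2rad(35)           # ground-truth angle, for the demo

# Measured per-antenna channel phases for the direct path.
phases = -2 * np.pi * spacing * m * np.sin(true_aoa) / lam

# Virtual measurement vector: unit attenuation, measured phases only (cf. Eqn. (6)).
v = np.exp(1j * phases)

# Grid search: correlate v against the steering vector of each candidate angle.
grid = np.deg2rad(np.arange(-90, 91))
steer = np.exp(-1j * 2 * np.pi * spacing * np.outer(m, np.sin(grid)) / lam)
power = np.abs(steer.conj().T @ v)
est = np.rad2deg(grid[np.argmax(power)])
print(est)
```

Half-wavelength spacing keeps the steering vectors unambiguous over the full field of view; a super-resolution method would sharpen the peak and separate closely spaced paths.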
Rotation estimation. The real problem in determining a MAV's orientation is how to anchor the yaw, a.k.a. the heading. The orientation can be represented by Euler angles: roll, pitch, and yaw for rotations around the x, y, and z axes (Fig. 4). It can be computed by integrating the 3D angular velocity readings from the onboard IMU. The results, however, suffer from temporal drifts due to the inherent noise of the IMU. Fortunately, an IMU drifts in only four degrees of freedom, namely the 3D position and the yaw angle, since roll and pitch are observable from gravity. When an IMU is mounted on an aerial vehicle, the yaw typically indicates the heading of the vehicle. Therefore, we need drift-free rotation estimates to fix the heading.
Our idea is that the rotational shift is solely determined by the rotation, so we can use it to recover the rotation. According to Eqn. (2), subtracting the indices of the peaks from two opposing tags, k_1 and k_2, gives the rotational frequency shift,

k_1 − k_2 = Δf_{r,1} − Δf_{r,2} = 2 Δf_r.    (7)
Now we model the rotational shift. We denote the angle of the MAV to its controller as α and the MAV's rotation as ψ (refer to Fig. 4), where ψ evolves with the angular velocity ω during the rotation. The rotational shift can be expressed as

Δf_r = (f_c / c) ω r sin(ψ − α),    (8)

where r is the distance from the tag to the rotation center. The controller computes Δf_r by Eqn. (7) and sends it to the MAV. α can be obtained by the angle estimation algorithm. The gyroscope in the IMU measures the angular velocity ω. The remaining parameters are known constants. Thus, the rotation ψ can be solved from Eqn. (8). From this model, we can see that the frequency shift is zero when the direction of the rotational velocity is orthogonal to the direction of the MAV to the controller. In this case, the rotation is ambiguous between two values. It is trivial to eliminate this ambiguity using the measurements from the other pair of tags on the landing gear, which forms an orthogonal structure.
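To make the inversion concrete, the sketch below assumes the rotational shift takes the form Δf_r = (f_c/c)·ω·r·sin(ψ − α) and inverts it for ψ. The model form, the handling of the sign ambiguity, and all numbers are our illustrative assumptions rather than Marvel's exact formulation:

```python
import numpy as np

fc, c = 915e6, 3e8              # assumed carrier frequency and signal speed
r = 0.32                        # assumed tag lever arm (m)
omega = 2.0                     # gyroscope angular velocity (rad/s)
alpha = np.deg2rad(40)          # MAV-to-controller angle from angle estimation
psi_true = np.deg2rad(70)       # ground-truth rotation, for the demo

# Forward model of the rotational Doppler shift (assumed form).
f_rot = fc / c * omega * r * np.sin(psi_true - alpha)

# Invert for psi; arcsin leaves an ambiguity that the second,
# orthogonal tag pair would resolve (not modeled here).
psi_est = alpha + np.arcsin(f_rot / (fc / c * omega * r))
print(np.rad2deg(psi_est))
```

With realistic numbers the shift is only a few Hz, which is why it is read off FFT bin differences of long chirps rather than instantaneous frequency.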
IV Backscatter-inertial Super-accuracy State Estimation
So far, we have discussed how Marvel measures the pose features, including the range, angle, and rotation of a MAV relative to its controller, based on backscattered CSS signals. In this section, we enable accurate state estimation by taking advantage of the onboard IMU to jointly optimize the state with the backscatter-based pose sensing.
Since the controller's location is unknown, solving the state estimation problem consists of estimating the MAV state over its trajectory as well as the controller's location. The controller is essentially a key feature of the map of the environment in which the MAV moves. This falls into the simultaneous localization and mapping (SLAM) problem domain. Solutions to the SLAM problem can be either filtering-based or graph-based. Filtering-based approaches are considered more efficient in computation as they only estimate the current robot state and the map. Their main drawback is that fixing the linearization points early may lead to suboptimal results. In contrast, graph-based approaches can achieve better performance via repeatedly relinearizing past robot states and multi-view constraints [14, 16]. We employ a graph-based optimization framework to solve our state estimation problem.
IV-A Problem Formulation
The graph representation of our state estimation problem is shown in Fig. 6. Let x_k denote the state at time k. At each k, the MAV observes a set of backscatter sensing measurements z_k, which include the range, angle, and yaw rotation. Typically, the IMU data rate ( Hz) is much higher than the data rate of the backscatter sensing ( Hz). Thus, multiple IMU measurements are buffered between two states. The IMU odometry ẑ_k is an integrated result over these measurements (specified in § IV-B) that represents the odometry between two consecutive states, e.g., x_k and x_{k+1}.
To achieve real-time processing, we employ an incremental state update scheme [10] that takes the IMU and backscatter-based measurements in a fixed time interval for state estimation. Whenever a new state with its backscatter-based measurements becomes available, our approach works in a sliding-window fashion that incorporates the new state and removes the oldest one. The full state vector within the interval is defined as

𝒳 = [x_0, x_1, …, x_{n−1}, p_c],    x_k = [p_k, v_k, q_k],    (9)

where x_k denotes the k-th state in the window, which contains the position p_k, velocity v_k, and rotation q_k with respect to the first (0-th) state. q_k is the Hamilton quaternion [32] representation of the rotation; we use the quaternion representation to model the odometry as a vector. n is the number of states in the sliding window. p_c denotes the position of the controller.
Based on the state vector, we minimize the Mahalanobis norm of all measurement residuals to obtain a maximum a posteriori estimation:

min_𝒳 { Σ_{k∈ℬ} ‖ r_ℬ(z_k, 𝒳) ‖²_{P_k} + Σ_{k∈ℐ} ‖ r_ℐ(ẑ_k, 𝒳) ‖²_{P̂_k} },    (10)

where r_ℬ and r_ℐ are the measurement residuals for the LoRa backscatter and the IMU, respectively. ℬ is the set of backscatter-based pose features and ℐ denotes the set of IMU measurements. We choose the Mahalanobis norm as the optimization objective because it takes into account the correlations of the data set. These correlations amongst the internal states of different sensing modalities are key for any high-precision inertial-based autonomous system [12].
IV-B Backscatter-inertial State Estimation
We now solve the nonlinear system (10) for state estimation via the Gauss-Newton algorithm. This involves linearizing the nonlinear system by a first-order Taylor expansion of the residuals in (10) around a reasonable initial guess. Since the backscatter-based pose sensing provides ranges, it is easy to obtain an initial guess of the state. The objective is to minimize the sum of the Mahalanobis norms of the backscatter sensing and IMU residuals. The Mahalanobis norm in (10) can be explicitly expressed as

‖ r ‖²_P = rᵀ P⁻¹ r.    (11)

Here we briefly denote r_ℬ(z_k, 𝒳) and r_ℐ(ẑ_k, 𝒳) as r_ℬ and r_ℐ. Next, we define the residuals and their corresponding covariance matrices.
Backscatter sensing residual. Our backscatter-based pose sensing gives the range d̂_k, angle α̂_k, and yaw rotation ψ̂_k. The rotation q̂_k can be trivially derived from ψ̂_k together with the drift-free roll and pitch provided by the IMU. Given these measurements, the residual is defined as

r_ℬ(z_k, 𝒳) = [ ‖p_k − p_c‖ − d̂_k ;  α(p_k, p_c) − α̂_k ;  2[ q̂_k⁻¹ ⊗ q_k ]_{xyz} ],    (12)

where α(·) maps the positions to the angle and [·]_{xyz} extracts the vector part of the quaternion, which is the approximation of the error-state representation. 2[ q̂_k⁻¹ ⊗ q_k ]_{xyz} is the 3D error-state representation of the quaternion. The covariance matrix P_k is the measurement noise matrix, which can be estimated by statistically analyzing the pose features.
IMU residual. Based on the kinematics, the residual of the IMU measurements can be defined as

r_ℐ(ẑ_k, 𝒳) = [ R(q_k)ᵀ ( p_{k+1} − p_k − v_k Δt + ½ g Δt² ) − α̂_k ;  R(q_k)ᵀ ( v_{k+1} − v_k + g Δt ) − β̂_k ;  2[ γ̂_k⁻¹ ⊗ q_k⁻¹ ⊗ q_{k+1} ]_{xyz} ],    (13)

where ẑ_k = [α̂_k, β̂_k, γ̂_k] is the preintegrated result [18] using the accelerations and angular velocities provided as raw measurements by the IMU within the time interval between two consecutive states. ⊗ denotes the quaternion multiplication operation and R(q) the conversion from the quaternion to the rotation matrix. The covariance P̂_k can be computed recursively by first-order discrete-time propagation within the interval; refer to [27] for more details.
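The flavor of the sliding-window Gauss-Newton fusion can be conveyed by a 1D toy problem, where "IMU" supplies noisy relative motion and "backscatter" supplies noisy absolute ranges to a controller fixed at the origin (a simplification: Marvel also estimates the controller's position). All noise levels and sizes are illustrative:

```python
import numpy as np

# Toy 1D sliding window: states are positions x_0..x_{n-1}.
rng = np.random.default_rng(0)
n = 8
x_true = np.cumsum(np.full(n, 1.0))            # ground-truth trajectory
odo = np.diff(x_true) + rng.normal(0, 0.05, n - 1)   # "IMU" odometry
ranges = x_true + rng.normal(0, 0.2, n)              # "backscatter" ranges

x = ranges.copy()                               # initial guess from ranges
for _ in range(10):                             # Gauss-Newton iterations
    # Stack whitened residuals (divide by noise std = Mahalanobis norm
    # of Eqn. (11)) and their Jacobian, then solve for the update step.
    J = np.zeros((2 * n - 1, n))
    r = np.zeros(2 * n - 1)
    for k in range(n - 1):                      # odometry residuals
        J[k, k], J[k, k + 1] = -1 / 0.05, 1 / 0.05
        r[k] = (x[k + 1] - x[k] - odo[k]) / 0.05
    for k in range(n):                          # absolute range residuals
        J[n - 1 + k, k] = 1 / 0.2
        r[n - 1 + k] = (x[k] - ranges[k]) / 0.2
    x -= np.linalg.lstsq(J, r, rcond=None)[0]   # Gauss-Newton step

print(np.abs(x - x_true).max())
```

Because this toy problem is linear, one iteration already reaches the optimum; the real system iterates because the range, angle, and quaternion residuals are nonlinear in the state.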
V Implementation and Evaluation
V-A Implementation and Evaluation Methodology
The controller is built from two co-located NI USRP-2943 nodes, each with a UBX-160 daughterboard. Their four channels are configured as a data handler with one antenna and a backscatter signal handler with three antennas. We configure the USRPs to work in the MHz band, and all signal parameters conform to the LoRa standard. The three antennas of the backscatter signal handler are mounted on an acrylic pole, separated by a distance of cm. The USRP nodes are synchronized using an external clock and frequency reference. We run the CSS decoding and the channel phase extraction on the controller.
The MAV system is built by attaching an Intel NUC, a LORD MicroStrain 3DM-GX4-45 IMU, and an SX1276MB1LAS long-range transceiver to a DJI Matrice 100. In addition, four customized LoRa backscatter tags are attached to the landing gear of the MAV. Each tag uses ADG919 and ADG904 RF switches to enable backscatter communication. The four tags are controlled by an Altera STEP-MAX10 FPGA board, which configures them to shift frequency by MHz relative to each other when backscattering the linear chirps with KHz bandwidth. We run Marvel on the Intel NUC, which has a 1.3 GHz Core i5 processor with cores, GB of RAM, and a GB SSD, running Ubuntu Linux. The backscatter-based pose sensing module and the backscatter-inertial super-accuracy state estimation algorithm are written in C++. We use the Robot Operating System (ROS) as the interfacing robotics middleware. The experimental platform is shown in Figure 7.
We conduct experiments both outdoors and indoors to evaluate the long-range and through-wall settings. The outdoor experiments are conducted in an open field in front of an office building, with no obstacle between the MAV and the controller. The indoor experiments are conducted in a MAV test site of square meters. The site is located on the basement level of an office building, as shown in Fig. 9. Multiple rooms are separated by concrete walls, drywall, and wooden doors, and contain office furniture including tables, chairs, and computers. In the following, we first examine Marvel's sub-module accuracy and latency in a microbenchmark evaluation, and then we evaluate its system-level performance under different motions and environments.
V-B Microbenchmark Evaluation
We evaluate the performance of positioning and rotation estimation, respectively. To evaluate the positioning approach, we build a sliding rail driven by a stepper motor (ROB-09238 [30]) that supports movement at a controllable speed. We place the MAV on a plate mounted on this rail. To evaluate the rotation estimation, we place the MAV on a plate mounted on the stepper motor and control the rotating speed. In long-range experiments, we place the controller at one end of the field and move the MAV away from the controller in increments of m. In through-wall experiments, we place the MAV in the test site and move the controller to different rooms (Fig. 9). There are three concrete walls between the controller and the MAV at one of the locations. At each location, we repeat the experiment multiple times and compute the errors.
Positioning accuracy. We first validate the positioning capability of Marvel at different speeds. We compare Marvel with the state-of-the-art CSS-based localization system, locate [23], which operates correctly in semi-stationary scenarios. As shown in Fig. 9, the accuracies of the two approaches are similar in the stationary case, with a mean error of around m. However, the error of locate scales with the speed, since its channel phase estimates are distorted by the Doppler frequency shift. Its position error reaches m in the worst case, while Marvel's accuracy remains steady.
The positioning results in different settings are shown in Fig. 11. To demonstrate that our approach is resilient to the Doppler effect under mobility, we move the MAV along the rail at a speed of m/s, the maximum speed allowed.
The long-range result shows that the error scales with the MAV-controller distance. Specifically, the position error is m at a distance of m, which increases to m at a distance of m, and further to m at a distance of m. This is because an angle estimate of limited accuracy maps to a growing uncertainty area of the MAV's position as the distance increases. Our customized backscatter works at distances of at most m, at which the worst-case position accuracy is m. Beyond that distance, the power of the received signal is too low to decode even with the CSS coding.
The through-wall result shows that the accuracies at different locations are similar because the MAV-controller distance does not vary much. However, the indoor accuracy is worse than that at a distance of m in the open space due to multipath fading. The worst-case accuracy, at the location with three walls blocking the MAV from the controller, is m. Our controller is unable to receive the backscatter signal when it passes through more than three walls.
In summary, the position accuracy is limited to meter level in both outdoor and indoor settings due to the limited signal bandwidth in the MHz band that we use. Nevertheless, with the aid of the IMU, Marvel achieves decimeter-level accuracy, as shown in § V-C.
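The bandwidth limit on raw ranging accuracy follows from the classic range-resolution bound Δd = c/(2B). A hedged sketch (the 500 kHz CSS bandwidth is an assumed, LoRa-style value for illustration, not the paper's configuration):

```python
C = 299_792_458.0  # speed of light, m/s

def range_resolution(bandwidth_hz: float) -> float:
    """Classic time-of-flight range resolution: delta_d = c / (2 * B)."""
    return C / (2.0 * bandwidth_hz)

# A narrowband CSS signal (assumed 500 kHz here) resolves range only
# coarsely; finer positioning must come from phase/angle processing
# plus IMU fusion rather than from signal bandwidth alone.
print(f"{range_resolution(500e3):.0f} m")
```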
Rotation estimation accuracy. We evaluate the rotation estimation by controlling the stepper motor, whose angular velocity starts from rad/s, increases at a rate of rad/s until it reaches rad/s, and then decreases at the same rate back to rad/s. The whole process takes seconds, as shown in Fig. 11. We repeat the experiment times and analyze the data. As expected, the result in the through-wall setting is worse (mean error , standard deviation ) than in the long-range setting (mean error , standard deviation ) due to the larger angle estimation error in the presence of multipath. Fig. 11 also shows that our rotation estimation algorithm closely tracks the MAV's rotation under varying angular velocities in both settings, providing drift-free results.
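The ramp-up/ramp-down angular-velocity profile described above can be sketched as follows; the peak velocity and ramp rate are illustrative placeholders, since the paper's exact values are not reproduced here:

```python
import numpy as np

def trapezoid_profile(w_max: float, ramp: float, dt: float = 0.01):
    """Angular-velocity profile of the stepper test: ramp up from 0 at a
    constant rate to w_max, then ramp back down to 0. Values here are
    illustrative, not the paper's settings."""
    t_ramp = w_max / ramp
    t = np.arange(0.0, 2 * t_ramp + dt, dt)
    w = np.where(t <= t_ramp,
                 ramp * t,
                 np.maximum(w_max - ramp * (t - t_ramp), 0.0))
    theta = np.cumsum(w) * dt  # integrate to the ground-truth rotation angle
    return t, w, theta

t, w, theta = trapezoid_profile(w_max=1.0, ramp=0.1)
# The integrated angle serves as the reference against which the
# estimator's drift-free tracking is judged.
print(f"peak w = {w.max():.2f} rad/s, total angle = {theta[-1]:.1f} rad")
```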
V-C System-level State Estimation
We program the MAV to fly different trajectories to evaluate the overall performance of Marvel under various motion patterns. The ground truth of the flight trajectories is provided by OptiTrack [24]. The maximum linear velocity reaches m/s in this experiment. We omit the plot of linear velocity estimation since velocity is the first derivative of position and thus highly correlated with the position accuracy.
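The remark that velocity accuracy tracks position accuracy can be seen from how velocity is recovered from sampled positions; a central-difference sketch (the 100 Hz sampling rate and straight-line trajectory are illustrative assumptions):

```python
import numpy as np

def velocity_from_positions(positions: np.ndarray, dt: float) -> np.ndarray:
    """Central-difference velocity estimate from a sampled trajectory;
    any noise in the positions maps directly into the velocity estimate."""
    return np.gradient(positions, dt, axis=0)

# A noiseless 1 m/s straight-line trajectory sampled at an assumed 100 Hz.
t = np.arange(0.0, 1.0, 0.01)
pos = np.stack([t, np.zeros_like(t), np.zeros_like(t)], axis=1)
vel = velocity_from_positions(pos, 0.01)
print(vel[50])  # ≈ [1., 0., 0.]
```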
In the long-range experiments, the MAV flew in a circular trajectory. Since the backscattered signal cannot be decoded when the distance exceeds m, the controller is placed m away from the MAV before takeoff to ensure that the MAV cannot exceed this distance limit during the flight. As shown in Fig. 12, the average state estimation error is cm for positioning and for orientation. This demonstrates that the super-accuracy algorithm significantly increases the accuracy of pose tracking, enabling accurate state estimation.
In the through-wall experiments, the MAV has to fly in the test site for safety reasons. We placed the controller at location , and the MAV flew in a square trajectory due to the limited area. As shown in Fig. 13, the average position error over the trajectory is cm and the average orientation error is . The accuracy is slightly worse than in the open field due to multipath fading and the more aggressive motions around the corners of the square trajectory.
Overall, the performance is better than that of GPS-based state estimators [6, 2] (meter-level accuracy), which are only applicable outdoors. In addition, the position accuracy is also better than that of the state-of-the-art WiFi indoor localization systems [11, 37, 34], which must use an active WiFi radio and cannot work in long-range or through-wall settings. Note that Marvel is not designed to beat CV-based approaches [27, 14, 41] in accuracy; they are more accurate in well-lighted, texture-rich environments. Rather, Marvel complements them to support MAV navigation in vision-crippled scenarios.
VI Related Work
State estimation for aerial vehicles has long been studied in the robotics community. LiDAR and cameras [27, 8, 14, 41, 5, 22] are the representative sensors for state estimation. LiDAR is only suitable for standard-size aerial vehicles due to its heavy weight and high cost. Although cameras are more suitable for MAVs thanks to their light weight and high accuracy, they are limited to well-lighted, texture-rich environments, hindering their use in vision-crippled scenarios, e.g., smoky buildings in firefighting operations.
To ease the limitations of existing solutions, RF-based state estimators have been proposed, since RF signals are highly resilient to visual limitations. Mueller et al. [21] and Liu et al. [15] take advantage of UWB-based ranging to enable state estimation. WINS [40] uses ubiquitous WiFi to estimate angles-of-arrival (AoAs) with an onboard antenna array for state estimation. Extensive studies of WiFi indoor localization systems [11, 37, 34] in the networking community also demonstrate their potential for state estimation. These proposals operate correctly where the amplitude and phase of RF signals are available, but not in the presence of extremely weak signals far below the noise floor.
Recently, communication with low-power signals in long-range or occlusive settings has been studied in [31, 25, 33]. The signal characteristics and processing methods enable localizability with such low-power signals. locate [23] is the first localization system that extracts the channel phases of low-power CSS signals drowned in noise to localize targets via range estimates. It operates correctly in semi-stationary scenarios but not under the agile mobility of MAVs. The fundamental difference in our context is that the backscattered CSS signals carry Doppler frequency shifts. Moreover, locate only addresses location and requires the floor plan of the workspace to localize targets with a single access point (AP). In contrast, Marvel designs novel algorithms to 1) localize a MAV with its single controller without any prior knowledge of the workspace; 2) track the MAV's rotation via low-power backscattered CSS signals; and 3) enable accurate state estimation through a backscatter-inertial super-accuracy algorithm with the aid of the IMU. In addition, there are works devoted to aided inertial sensing [15, 27, 18], but they differ substantially from ours due to the fundamentally different sensing modality, backscatter-based pose sensing, which requires formulating a new measurement model to fuse with the IMU.
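The Doppler distortion that breaks phase-based CSS localization under mobility is straightforward to quantify: a backscattered signal traverses the controller-tag path twice, doubling the one-way shift. A hedged sketch (the 915 MHz carrier and 3 m/s speed are illustrative assumptions, not the paper's parameters):

```python
C = 299_792_458.0  # speed of light, m/s

def backscatter_doppler(velocity_mps: float, carrier_hz: float) -> float:
    """Doppler shift of a backscattered signal: the round trip to the
    moving tag and back doubles the one-way shift, f_d = 2 * v * f_c / c."""
    return 2.0 * velocity_mps * carrier_hz / C

# Illustrative numbers: a tag moving at 3 m/s with an assumed 915 MHz
# carrier shifts the backscattered chirp by roughly 18 Hz, enough to
# corrupt phase estimates that assume a static channel.
print(f"{backscatter_doppler(3.0, 915e6):.1f} Hz")
```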
VII Conclusion
To our knowledge, Marvel represents the first RF backscatter-based MAV state estimation system that works at long range or through walls with low-power signals drowned in noise. It marks a new sensing modality that complements existing visual solutions in supporting MAV navigation. The system is powered by a backscatter-based pose sensing module that estimates pose features via backscattered CSS signals, as well as a backscatter-inertial super-accuracy algorithm that leverages the IMU for accurate state estimation. We implement Marvel on USRP and the DJI Matrice 100 platform with customized backscatter tags. The experimental results of three flight trajectories, both outdoors and indoors, show that Marvel holds promise as a long-range/through-wall, lightweight, plug-and-play state estimation system for MAVs. In the future, we plan to seamlessly combine visual sensing and low-power RF sensing to achieve a more robust state estimation system.
References
 Ceres solver. Note: \url{http://ceres-solver.org} Cited by: §IV-B.
 (2010) Autopilots for small unmanned aerial vehicles: a survey. Int. J. Control. Autom. 8 (1), pp. 36–44. Cited by: §I, §I, §V-C.
 (2019) TrackIO: tracking first responders inside-out. In Proc. USENIX NSDI, Cited by: §I.
 (2019) Pair-Navi: peer-to-peer indoor navigation with mobile visual SLAM. In Proc. IEEE INFOCOM, Cited by: §I.
 (2017) An online multi-robot SLAM system for 3D LiDARs. In Proc. IEEE IROS, Cited by: §VI.
 (2008) Aided navigation: GPS with high rate sensors. McGraw-Hill, Inc. Cited by: §I, §I, §V-C.
 (2015) Assessing the environmental impact of integrated inventory and warehouse management. Int. J. Prod. Econ. 170, pp. 717–729. Cited by: §I.
 (2016) Real-time loop closure in 2D LiDAR SLAM. In Proc. IEEE ICRA, Cited by: §VI.
 (2019) 3Domnitrack: 3D tracking with COTS RFID systems. In Proc. ACM IPSN, Cited by: §I, §I.
 (2012) iSAM2: incremental smoothing and mapping using the Bayes tree. Int. J. Robot. Res. 31 (2), pp. 216–235. Cited by: §IV-A.
 (2015) SpotFi: decimeter level localization using WiFi. In Proc. ACM SIGCOMM, Cited by: §III-C, §V-C, §VI.
 (2015) Keyframe-based visual-inertial odometry using nonlinear optimization. Int. J. Robot. Res. 34 (3), pp. 314–334. Cited by: §IV-A.
 (2019) Known and unknown facts of LoRa: experiences from a large-scale measurement study. ACM Trans. Sens. Netw. 15 (2), pp. 16. Cited by: §III-A.
 (2018) Autonomous aerial navigation using monocular visual-inertial fusion. J. Field Robot. 35 (1), pp. 23–51. Cited by: §I, §IV, §V-C, §VI.
 (2017) Cooperative relative positioning of mobile users by fusing IMU inertial and UWB ranging information. In Proc. IEEE ICRA, Cited by: §I, §I, §VI, §VI.
 (2018) Simultaneous localization and mapping with power network electromagnetic field. In Proc. ACM MobiCom, Cited by: §IV.
 (2019) 3D backscatter localization for fine-grained robotics. In Proc. USENIX NSDI, Cited by: §I, §I.
 (2012) Visual-inertial-aided navigation for high-dynamic motion in built environments without initial conditions. IEEE Trans. Robot. 28 (1), pp. 61–76. Cited by: §IV-B, §VI.
 (2017) Drone relays for battery-free networks. In Proc. ACM SIGCOMM, Cited by: §I.
 (2017) Indoor follow me drone. In Proc. ACM MobiSys, Cited by: §I.
 (2015) Fusing ultra-wideband range measurements with accelerometers and rate gyroscopes for quadrocopter state estimation. In Proc. IEEE ICRA, Cited by: §VI.
 (2015) ORB-SLAM: a versatile and accurate monocular SLAM system. IEEE Trans. Robot. 31 (5), pp. 1147–1163. Cited by: §I, §VI.
 (2018) 3D localization for sub-centimeter sized devices. In Proc. ACM SenSys, Cited by: §V-B, §VI.
 OptiTrack – motion capture systems. Note: \url{https://optitrack.com/} Cited by: §V-C.
 (2018) PLoRa: a passive long-range data network from ambient LoRa transmissions. In Proc. ACM SIGCOMM, Cited by: §VI.
 (2018) Walmart testing warehouse drones to manage inventory. Note: \url{https://www.supplypro.ca/walmarttestingdroneswarehousesmanageinventory/} Online; accessed 27 July 2019. Cited by: §I.
 (2018) VINS-Mono: a robust and versatile monocular visual-inertial state estimator. IEEE Trans. Robot. 34 (4), pp. 1004–1020. Cited by: §I, §IV-B, §V-C, §VI, §VI.
 (2018) Firefighter fatalities in the United States – 2017. National Fire Protection Association (NFPA). Cited by: §I.
 (2016) The design and implementation of a mobile RFID tag sorting robot. In Proc. USENIX NSDI, Cited by: §I, §I.
 Stepper motor with cable. Note: \url{https://www.sparkfun.com/products/9238} Cited by: §V-B.
 (2017) LoRa backscatter: enabling the vision of ubiquitous connectivity. Proc. ACM UbiComp 1 (3), pp. 105. Cited by: §VI.
 (2005) Indirect Kalman filter for 3D attitude estimation. University of Minnesota, Dept. of Comp. Sci. & Eng., Tech. Rep. 2, pp. 2005. Cited by: §IV-A.
 (2017) LoRea: a backscatter architecture that achieves a long communication range. In Proc. ACM SenSys, Cited by: §VI.
 (2016) Decimeter-level localization with a single WiFi access point. In Proc. USENIX NSDI, Cited by: §V-C, §VI.
 (2016) Gyro in the air: tracking 3D orientation of battery-less internet-of-things. In Proc. ACM MobiCom, Cited by: §I, §I.
 (2019) RF-based inertial measurement. In Proc. ACM SIGCOMM, Cited by: §I, §I.
 (2019) MD-Track: leveraging multi-dimensionality in passive indoor WiFi tracking. Proc. ACM MobiCom. Cited by: §III-C, §V-C, §VI.
 (2014) Tagoram: real-time tracking of mobile RFID tags to high precision using COTS devices. In Proc. ACM MobiCom, Cited by: §I, §I.
 (2019) ImgSensingNet: UAV vision-guided aerial-ground air quality sensing system. In Proc. IEEE INFOCOM, Cited by: §I.
 (2018) WINS: WiFi-inertial indoor state estimation for MAVs. In Proc. ACM SenSys, Cited by: §I, §I, §VI.
 (2017) Event-based visual inertial odometry. In Proc. IEEE CVPR, Cited by: §I, §V-C, §VI.