A New Hierarchical Software Architecture Towards Safety-Critical Aspects of a Drone System
In this paper, a new hierarchical software architecture is proposed to improve the safety and reliability of a safety-critical drone system from the perspective of its source code. The proposed architecture uses formal verification methods to ensure that the implementation of each module satisfies its expected design specification, so that it prevents a drone from crashing due to unexpected software failures. This work builds on top of a formally verified operating system kernel, CertiKOS (Gu et al., 2015). Since device drivers are considered the most important parts affecting the safety of the drone system, this paper mainly focuses on verifying bus drivers such as the SPI driver and the I2C driver in a drone system using a rigorous formal verification method (Chen et al., 2016). Experiments have been carried out to demonstrate the improvement in reliability in case of device anomalies.
Frontiers of Information Technology & Electronic Engineering
www.jzus.zju.edu.cn; engineering.cae.cn; www.springerlink.com
ISSN 2095-9184 (print); ISSN 2095-9230 (online)
Corresponding author
* Project supported by the National Natural Science Foundation of China (No. 91648012) and the Shenzhen Science, Technology, and Innovation Commission, China (No. JCYJ20160401100022706)
ORCID: Xiao-rui ZHU, http://orcid.org/0000-0003-1400-059X; Zhong SHAO, http://orcid.org/0000-0001-8184-7649
© Zhejiang University and Springer-Verlag GmbH Germany, part of Springer Nature 2018
Key words: safety-critical; drone; software architecture; formal verification
https://doi.org/10.1631/FITEE.1800636
CLC number: V279; TP311.5
1 Introduction
In recent years, small unmanned aerial vehicles (UAVs), or drones, have drawn increasing attention because of their low cost and compact size. As small UAVs enter daily life, safety concerns are also rising. Failures of a drone may result in severe damage to the environment and serious injury to the public (Simpson and Stoker, 2006).
Aside from maneuver mistakes, software errors in the controller are one of the main causes of UAV failures. A fault may come from the algorithm itself or from its actual implementation (Malecha et al., 2016). Much work has been done to improve the reliability of UAV systems. Most efforts have focused on algorithms, such as improving modeling accuracy (Leishman, 2002), enhancing the robustness of control algorithms (Lee et al., 2010), and reducing sensor errors (Marina et al., 2012). Réti et al. (2013) proposed a hardware solution that improves safety through a smart mini actuator integrating position and angular rate measurements with controlling microprocessors. Few studies so far have addressed bugs in the implementation of algorithms at the source code level. For a safety-critical real-time system like a UAV, this neglect could result in problems such as loss of synchronization (caused by irregular responses from external sensors) and high approximation errors (caused by floating-point computation) (Malecha et al., 2016). These problems are subtle but might degrade performance or even cause the drone to crash.
Formal verification is a technique for constructing a correctness proof of a program (or exposing a contradiction if the program contains errors) using precise, well-formed mathematical and logical constructs. It is used to prevent subtle errors in the source code of control systems (Ricketts et al., 2015; Malecha et al., 2016; Bohrer et al., 2018). Preventing such errors would increase the reliability and safety of drone systems. In 2015, foundational verification techniques in the theorem prover Coq were applied to a quadrotor system to verify the correctness of two shims (saturation blocks) that limit the velocity and height of the quadrotor (Ricketts et al., 2015). In 2016, the same research group verified a runtime monitor that provides strong guarantees about the maximum velocities and accelerations of a drone (Malecha et al., 2016). In 2018, Bohrer et al. designed a verified pipeline for generating concrete controller code from high-level models (Bohrer et al., 2018). However, these efforts to formally verify control systems are not sufficient for a hybrid real-time drone system.
A real-time operating system (RTOS) plays an important role in scheduling real-time processes and interacting with devices. Traditional RTOSes, including NuttX (Nutt, 2007) and FreeRTOS (Barry, 2003), perform well in real-time scheduling. Some of them also support memory protection to improve security (Wang, 2017). However, none of them provides a formal correctness proof of its source code.
A potential source of software failures lies in the implementation of device drivers. A driver has to rely on the behavior of its device, for instance, to tell when the device is ready to read or write data, or whether a previous write is complete. However, due to the complexity of modern hardware, it is difficult to consider all possible abnormal situations when implementing a device driver. For example, it is common for a driver to loop until some status bit on the device is set. If the device does not update this bit in time, the execution of the driver is delayed, and the whole system may be blocked if the driver runs in kernel mode and is not interruptible.
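The polling hazard described above can be illustrated with a minimal Python sketch. The simulated status register, the `READY` bit and the `max_polls` bound are hypothetical names chosen for illustration; this is not the driver code discussed in this paper.

```python
from typing import Callable, Optional

READY = 0x1  # hypothetical status bit meaning "data available"


def poll_unbounded(read_status: Callable[[], int]) -> int:
    """Loop until READY is set; never returns if the device is stuck."""
    polls = 0
    while not (read_status() & READY):
        polls += 1
    return polls


def poll_bounded(read_status: Callable[[], int], max_polls: int) -> Optional[int]:
    """Loop at most max_polls times; return None on timeout instead of blocking."""
    for polls in range(max_polls):
        if read_status() & READY:
            return polls
    return None


def make_device(ready_after: Optional[int]) -> Callable[[], int]:
    """Simulated status register: READY appears after `ready_after` reads,
    or never if `ready_after` is None (a stuck device)."""
    reads = 0

    def read_status() -> int:
        nonlocal reads
        reads += 1
        if ready_after is not None and reads > ready_after:
            return READY
        return 0

    return read_status
```

With a stuck device, `poll_bounded` returns `None` after `max_polls` reads, whereas `poll_unbounded` would spin forever, which is the blocking behavior a non-interruptible kernel-mode driver cannot afford.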
The main contribution of this paper is a new software architecture that improves the reliability and safety of drone systems at the source code level by introducing formal verification techniques. In particular, the proposed architecture is based on CertiKOS (Certified Kit Operating System) (Gu et al., 2015), which enjoys a formal functional correctness guarantee. We adopt its methodology and formally verify the device drivers of a drone control system layer by layer, and demonstrate that this indeed improves safety and reliability. The same architecture could be extended to autonomous cars, home service robots and other safety-critical systems.
This paper is organized as follows. Section 2 describes the proposed software architecture. Section 3 describes the formal verification of driver modules. Experiments and discussions are presented in Section 4.
2 Hierarchical Software Architecture
In order to improve the reliability and safety of the software stack of a drone system, a new hierarchical software architecture is proposed (as shown in Fig. 1). In this architecture, an operating system kernel, CertiKOS (Chen et al., 2016), plays the central role of managing devices such as motors and sensors, and scheduling user tasks such as the control loop and the sensor fusion program.
The drone is equipped with a Raspberry Pi 3 board as its main controller, which connects to multiple sensors and actuators through general purpose I/O (GPIO) pins. CertiKOS-ARM, the ARM port of CertiKOS, is installed on the board to manage these devices, either directly or through bus drivers, and to expose them to user space programs. During each control period, the sensor fusion algorithm reads from the sensors to generate a reliable attitude estimate. Then the controller decides its next movement and writes control signals to the corresponding motors. There is also an RC task which reads the receiver to get control signals from the remote controller. Consequently, the reliability of the drone system depends heavily on the correctness of its device drivers.
In the proposed architecture, all software modules including the kernel and device drivers should be formally verified in order to ensure the functional correctness of their source code. CertiKOS has been formally verified on x86 in previous work (Gu et al., 2015), and its implementation has been ported to the ARM architecture successfully. This paper mainly focuses on verification of device drivers for the drone system. It relies on the partially verified CertiKOS-ARM, which includes modules for memory management (verified), thread management (not verified), etc.
3 Driver Verification
In a typical drone control system, it is necessary and important to estimate the drone attitude accurately. Raw data for the drone attitude estimation are usually provided by three sensors: accelerometer, gyroscope and magnetometer. In our system, the accelerometer and gyroscope depend on the Serial Peripheral Interface (SPI) bus to transmit sensing signals, and the magnetometer uses the Inter-Integrated Circuit (I2C) bus.
Following the same methodology as presented in (Chen et al., 2016), the driver verification can be divided into three phases. Firstly, we build a bus model which abstracts machine registers and the physical memory into a state transition system. Afterwards, we define an abstract interface for reading and writing the bus, as shown in Fig. 2.
During the second phase, we divide the C code of the device driver into multiple layers according to their functionalities and dependencies, as shown in Fig. 2. We then convert the individual C functions into their corresponding Clight (Leroy, 2009) abstract syntax trees, so that we can reason about their behaviors using the Clight semantics (in fact, an extended semantics as detailed in (Gu et al., 2015)). The set of abstract syntax trees implementing a layer is called a module (see Fig. 2).
Next, we abstract each C function into a Coq function (called a specification or a primitive), while still capturing everything we want to know about the behavior of its source code. This is achieved by following the approach of deep specifications (Gu et al., 2015). Then we define invariants for each layer and prove that all primitives preserve these invariants, so that a higher layer only operates on valid states of its underlay.
It is not straightforward to prove the refinement between the module and its specification (Highspec) in one step. Hence, we follow (Gu et al., 2015) and introduce the Lowspec to bridge the gap. While Highspec focuses on the abstract states and high-level invariants, the Lowspec deals with the memory state and low-level invariants. The set of Highspecs constitutes the abstract layer of the corresponding module, which is relied upon by other modules. On the other hand, Lowspec is only used for simplifying the refinement proof and hidden from higher layers.
The final phase is the verification of each driver, based on the bus model and the abstract bus driver layers obtained from the first two phases. Two refinements have to be proved in each abstract layer (Gu et al., 2015): <1> the refinement from Lowspec to Highspec; <2> each module correctly implements its Lowspec. If either condition is not satisfied, we adjust either the Lowspec or the original C code until all modules are verified. Then the deep specification framework links the verification of all layers together to achieve a verified program. This guarantees the functional correctness of our adjusted C code of the device drivers, which in turn contributes to the reliability and safety of the drone system.
3.1 The bus model
The characteristics of the bus in an actual physical system depend on the I/O operations of the CPU and its interactions with the external sensor. Hence, the SPI/I2C bus could be modeled as finite state transition systems interacting with the CPU and external sensors. Different I/O operations or external sensor events lead to different corresponding changes to the state. Bus transitions (i.e. Trans in Fig. 2) therefore include these two types of interactions (Chen et al., 2016).
Notice that the CPU carries out read/write operations on bus registers through the I/O command. We model these operations on both the SPI and I2C bus as in Definition 1.
Definition 1 (CPU Operation on Bus)
The read operation denotes reading a value from the register whose address is n. The write operation means that the CPU writes a value v to the register at address n.
The following subsections describe definitions of the state machine for I2C and SPI, and how they are updated by CPU operations and external actions.
3.1.1 The I2C bus model
In order to formally define the I2C bus model, we first construct its abstract state.
Definition 2 (The I2C bus abstract state)
Although the physical bus hardware is sophisticated and contains many more states and operating modes, most of them, such as the 10-bit addressing mode and the high-speed mode, are irrelevant to the attached sensor. Therefore, we fix its operation to the 7-bit addressing mode, and only abstract 10 registers (Definition 2) to formalize the state of a physical I2C bus. These include the base address of the device interface state I2C_OA and the slave address state I2C_SA, which serve as identities when connecting to specific devices. We also model the data receiving buffer state I2C_RX_DATA and the data sending buffer state I2C_TX_DATA to describe the read/write buffers in a real I2C bus, among others.
Definition 3 (I2C state transition function based on CPU operation)
The transition function describes the interaction between the CPU and the I2C bus: it takes the CPU operation and the current state as arguments, and returns the resulting state after this operation. A read operation does not change the I2C state. A write operation updates the corresponding field in the abstract state to v.
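As an illustration only, the CPU operations of Definition 1 and the transition of Definition 3 can be mirrored in a few lines of Python. The class names `Read` and `Write`, the function name `step_cpu`, the register offsets, and the reduction of the abstract state to four of its 10 registers are assumptions made for this sketch, not the Coq definitions used in the verification.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Read:
    addr: int  # register address n


@dataclass(frozen=True)
class Write:
    addr: int   # register address n
    value: int  # value v written by the CPU


CpuOp = Union[Read, Write]

# Hypothetical register offsets; the real ones depend on the hardware.
I2C_OA, I2C_SA, I2C_RX_DATA, I2C_TX_DATA = 0x00, 0x04, 0x08, 0x0C

I2CState = Dict[int, int]  # abstract state: register address -> value


def step_cpu(op: CpuOp, state: I2CState) -> I2CState:
    """Transition on a CPU operation: a read keeps the state unchanged,
    a write returns a copy with the addressed field set to v."""
    if isinstance(op, Write):
        new_state = dict(state)
        new_state[op.addr] = op.value
        return new_state
    return state
```

Returning a fresh dictionary on a write keeps the function pure, which matches the style of a state transition function that maps an old state to a new one.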
As mentioned previously, besides I/O operations issued by the CPU, external sensor events may also affect the state of the I2C bus. There are three kinds of events for the I2C bus, as listed in Definition 4: the non-event, the acknowledgment responding event and the data receiving event.
Definition 4 (I2C external sensor event)
NullEvent represents a non-event in which the I2C bus is waiting for other functional events. ACKEvent represents the acknowledgment responding event in which the I2C bus receives an acknowledgment. RecvEvent denotes the data receiving event in which the I2C bus receives an integer data val.
Definition 5 (The I2C state transition function based on external sensor events)
The acknowledgment responding event and the non-event do not change the I2C state. For a data receiving event, the I2C bus receives an integer value val and copies it to the register I2C_RX_DATA.
Notice that in the I2C bus model, an external sensor event list is also constructed to determine the order in which events are processed by the CPU. At the same time, a local event log (see Fig. 2) is set up to record the events in the event list that have already been processed.
Once the state transition functions of the I2C bus model are defined, we connect transitions caused by CPU operations with transitions triggered by external events to model the overall effect of reading and writing the I2C bus. Together, they constitute the interface through which the device driver interacts with the I2C bus.
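The event-driven transition of Definition 5 can be sketched in Python as follows. The event names NullEvent, ACKEvent and RecvEvent come from Definition 4; the function name `step_event` and the register offset are hypothetical and serve only this illustration.

```python
from dataclasses import dataclass
from typing import Dict, Union

I2C_RX_DATA = 0x08  # hypothetical register offset


@dataclass(frozen=True)
class NullEvent:
    pass


@dataclass(frozen=True)
class ACKEvent:
    pass


@dataclass(frozen=True)
class RecvEvent:
    val: int  # integer data received from the sensor


Event = Union[NullEvent, ACKEvent, RecvEvent]


def step_event(event: Event, state: Dict[int, int]) -> Dict[int, int]:
    """Only a data receiving event changes the state: it copies val
    into I2C_RX_DATA. The other two events leave the state as is."""
    if isinstance(event, RecvEvent):
        new_state = dict(state)
        new_state[I2C_RX_DATA] = event.val
        return new_state
    return state
```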
Definition 6 (The I2C bus read semantics)
In Definition 6, we first find the next event e to handle by comparing the event list with the local event log. Then, we apply the event-based I2C state transition function to the event e and the current I2C state s to obtain the next I2C state. The next step is to obtain the value res from this abstract state and the register address n. Finally, we update the I2C state again through the state transition function. Together, these premises define the semantics of reading the I2C bus.
Similarly, the following defines the write semantics on the I2C bus.
Definition 7 (The I2C bus write semantics)
This concludes the definition of the I2C bus model, which is relied upon by the verification of device drivers explained in the next subsection.
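To make the order of the steps in the read and write semantics concrete, the following Python sketch chains event selection, the event transition and the CPU transition. It rests on several assumptions that are not taken from the definitions: the local log is treated as the already-processed prefix of the event list, a NullEvent is returned once the list is exhausted, events are encoded as tuples, and the names `next_event`, `bus_read` and `bus_write` are hypothetical.

```python
from typing import Dict, List, Tuple

I2C_RX_DATA, I2C_TX_DATA = 0x08, 0x0C  # hypothetical register offsets

State = Dict[int, int]
Event = Tuple  # ("null",), ("ack",) or ("recv", val)


def next_event(event_list: List[Event], log: List[Event]) -> Event:
    """Assumption: the log is the processed prefix of the event list."""
    if len(log) < len(event_list):
        return event_list[len(log)]
    return ("null",)


def step_event(event: Event, state: State) -> State:
    """Only a data receiving event changes the state (it fills I2C_RX_DATA)."""
    if event[0] == "recv":
        return {**state, I2C_RX_DATA: event[1]}
    return state


def bus_read(n: int, state: State, event_list: List[Event],
             log: List[Event]) -> Tuple[int, State, List[Event]]:
    """Handle the next event, then read register n from the resulting state."""
    event = next_event(event_list, log)
    after_event = step_event(event, state)
    res = after_event[n]
    # A CPU read leaves the state unchanged, so after_event is the final state.
    return res, after_event, log + [event]


def bus_write(n: int, v: int, state: State, event_list: List[Event],
              log: List[Event]) -> Tuple[State, List[Event]]:
    """Handle the next event, then apply the CPU write of v to register n."""
    event = next_event(event_list, log)
    after_event = step_event(event, state)
    return {**after_event, n: v}, log + [event]
```

A short usage example: writing to `I2C_TX_DATA` consumes the first event, and a subsequent read of `I2C_RX_DATA` consumes the next one and observes the received value.

```python
events = [("ack",), ("recv", 7)]
state, log = bus_write(I2C_TX_DATA, 0x3C, {I2C_RX_DATA: 0, I2C_TX_DATA: 0}, events, [])
value, state, log = bus_read(I2C_RX_DATA, state, events, log)
```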
3.1.2 The SPI bus model
The SPI bus is modeled by the same approach.
Definition 8 (The SPI bus abstract state)
In the SPI bus model, the integer fields SpiRx and SpiTx represent the data receive buffer and the data transmit buffer of the physical SPI bus, abstracted from the data receive register and the transmit register, respectively. The boolean field SpiEn abstracts the SPI enabling status. In total, the SPI bus abstract state contains 25 fields, all of which are used in our drone control system.
3.2 Layer structure of the driver code
As mentioned at the beginning of this section, we follow (Chen et al., 2016) and divide the bus driver code into layers according to their functionalities and dependencies to enable compositional verification. Three principles are followed during this process: <1> similar functions, such as reading and writing a register, should be put in the same layer; <2> one layer should not contain too many functions, to keep the proof manageable; <3> the layering should not change the overall behavior of the source code. We show the layer structure of the SPI bus driver; the layering of the I2C bus driver is similar.
In Fig. 3, each block represents one module in the layer. For example, in the layer DSpiInOut, the module RegRW contains two functions for reading and writing registers. The arrow between two modules indicates the calling relation between them, and a module is only allowed to call modules in the lower layer. For example, the module CH0EN points to (invokes) the module RegRW, and the module RegRW points to the read and write interface of the SPI bus. The blue block indicates that functions in this module depend on at least one function in another module. The white block represents the module which is passed through from a lower layer without any modification. For example, the module RegRW in DSpiEnChannel is passed through directly from the layer DSpiInOut. The module RegRW, which consists of the read and write interface of the SPI bus, is located at the bottom of the layer architecture.
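The calling discipline described above (a module may only call modules introduced in a lower layer) can be checked mechanically. The Python sketch below is a hypothetical illustration: the layer and module names are taken from the text, the call from CH0EN to RegRW is stated in the text, but the call from CH0SELECT to CH0EN and the function name `layering_violations` are assumptions.

```python
from typing import Dict, List, Tuple

# Layers from bottom to top, and the layer that introduces each module.
LAYERS: List[str] = ["DSpiInOut", "DSpiEnChannel", "DSpiSelChannel"]
INTRODUCED_IN: Dict[str, str] = {
    "RegRW": "DSpiInOut",
    "CH0EN": "DSpiEnChannel",
    "CH0SELECT": "DSpiSelChannel",
}
# Caller -> callees. CH0EN -> RegRW is stated in the text;
# CH0SELECT -> CH0EN is an assumption made for this example.
CALLS: Dict[str, List[str]] = {
    "RegRW": [],
    "CH0EN": ["RegRW"],
    "CH0SELECT": ["CH0EN"],
}


def layering_violations(calls: Dict[str, List[str]],
                        introduced_in: Dict[str, str],
                        layers: List[str]) -> List[Tuple[str, str]]:
    """Return (caller, callee) pairs whose callee is not introduced
    in a strictly lower layer than the caller."""
    rank = {name: i for i, name in enumerate(layers)}
    bad = []
    for caller, callees in calls.items():
        for callee in callees:
            if rank[introduced_in[callee]] >= rank[introduced_in[caller]]:
                bad.append((caller, callee))
    return bad
```

An empty result means the call graph respects the layering; an upward call such as RegRW invoking CH0EN would be reported as a violation.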
We discuss the verification of these modules in the next section.
3.3 Verification of the driver
In this subsection, we follow the methodology proposed in (Gu et al., 2015) to verify the SPI driver.
3.3.1 Functional correctness of the C code
We show the C code for enabling the channel and its corresponding Clight representation in Fig. 4. The main operation of the function is to write value ENABLE_CHANNEL to the address CH0CTRL in order to enable the SPI bus.
The workflow of proving the functional correctness of a module is elaborated in Fig. 5. Clightgen, provided by CompCert (Leroy, 2009), is used to translate the C code of the SPI driver into a Clight abstract syntax tree. Then we write the Highspec and Lowspec of the corresponding module in Coq (see Fig. 5) to establish the refinement relation.
The Highspec describes the desired functionality of this module. For example, the above function mcspi_enable_channel is abstracted as below.
Here, RData contains all states of the system, such as the page table, the process control block, etc. This function only updates spi, and abs.SpiState is an instance of spi. The enable bit (SpiEn) of the SPI bus state (SpiState) is changed from its previous value to Enable, which describes the behavior of the SPI enable operation in the original C code.
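The effect of this Highspec can be mirrored in a small Python sketch. The names RData, spi, SpiRx, SpiTx and SpiEn come from the text; the function name `mcspi_enable_channel_spec`, the encoding of Enable as `True`, and the reduction of RData to a single field and of the SPI state to three of its 25 fields are assumptions made for illustration. It is not the Coq specification itself.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class SpiState:
    SpiRx: int = 0       # data receive buffer
    SpiTx: int = 0       # data transmit buffer
    SpiEn: bool = False  # enable status; True stands for Enable


@dataclass(frozen=True)
class RData:
    # Other system state (page table, process control block, ...) is omitted.
    spi: SpiState = field(default_factory=SpiState)


def mcspi_enable_channel_spec(abs_data: RData) -> RData:
    """High-level behavior: only the enable bit of the SPI state changes."""
    return replace(abs_data, spi=replace(abs_data.spi, SpiEn=True))
```

The specification says nothing about register addresses or memory; it only describes how the abstract state evolves, which is what higher layers rely on.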
The Lowspec also abstracts the behavior of each function in this module, but is specified in a way that is closer to the concrete hardware. In the case of enabling the SPI bus, it looks very similar to the corresponding Highspec because only function invocation is involved. The following is the low specification of the function mcspi_enable_channel written in Coq:
Here, RData represents the abstract state and mem represents the memory state. Notice that the memory state, m0, does not change since this function does not involve any direct memory operation. Thus, the overall behavior is a transition that changes only the abstract state and leaves the memory state m0 unchanged.
Then we prove the refinement between the Highspec and the Lowspec defined as follows:
The Highspec and Lowspec may operate on different types of states, so we distinguish high-level states from low-level states. We establish a relation between the states on these two levels, which holds only if the high-level state is a valid abstraction of the low-level state. The refinement relation states that if the Highspec takes one step from a high-level state to a successor state, and its initial state is a proper abstraction of a low-level state, then the corresponding Lowspec must be able to step from that low-level state to a successor, such that the relation also holds between the two resulting states.
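For finite state spaces, a refinement condition of this shape can be checked by enumeration. The following Python sketch is only an illustration under strong assumptions: both specifications are deterministic functions, the high-level state is reduced to the enable flag, the low-level state is reduced to the contents of one control register, and the names `high_step`, `low_step`, `related`, `refines` and the value of `ENABLE_CHANNEL` are hypothetical.

```python
from itertools import product
from typing import Callable, Iterable

ENABLE_CHANNEL = 0x1  # hypothetical register value meaning "channel enabled"


def high_step(spi_en: bool) -> bool:
    """High-level spec: enabling the channel sets the abstract flag."""
    return True


def low_step(ch0ctrl: int) -> int:
    """Low-level spec: ENABLE_CHANNEL is written to the control register."""
    return ENABLE_CHANNEL


def related(spi_en: bool, ch0ctrl: int) -> bool:
    """The flag is a valid abstraction of the register contents."""
    return spi_en == (ch0ctrl == ENABLE_CHANNEL)


def refines(high: Callable[[bool], bool], low: Callable[[int], int],
            rel: Callable[[bool, int], bool],
            high_states: Iterable[bool], low_states: Iterable[int]) -> bool:
    """For every related pair of initial states, the successor states
    produced by the two specifications must be related again."""
    for h, l in product(list(high_states), list(low_states)):
        if rel(h, l) and not rel(high(h), low(l)):
            return False
    return True
```

In the Coq development the same obligation is discharged by proof rather than by enumeration, since the real state spaces are far too large to enumerate.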
Similarly, we prove the refinement relation between the Lowspec and the actual C code. Combined together, we get the refinement from the Highspec to the actual C code, which is exactly its functional correctness proof.
As shown in Fig. 5, it is possible during the verification process that we find certain refinement relations do not hold. This is either due to a flaw in the specification, which we need to revise and try again, or is indeed caused by a bug in the source code. In the latter case, we have to fix the bug so that the functional correctness of the source code could be verified.
3.3.2 Linking all layers together
The functional correctness proof of each layer assumes the functional correctness of the layer below it. Part of the layer architecture of the SPI bus driver is presented in Fig. 6 to illustrate how we build up the verification layer by layer.
The module CH0EN is first verified, meaning the behavior of its C code indeed follows its specification. It then serves as the interface of layer DSpiEnChannel, which is invoked by layer DSpiSelChannel. Similarly, the layer DSpiSelChannel also exposes the Highspec CH0SELECT as part of its interface, which could be used by upper layers.
The framework we use (Gu et al., 2015) enables us to link layers together and prove contextual refinement between layers. Assume that P is a program which uses the function CH0SELECT. As in Fig. 7, the behavior of linking P with the module CH0SELECT and running them on the layer DSpiEnChannel is equivalent to the behavior of running program P on the layer DSpiSelChannel. The contextual refinement relates these two executions.
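In the notation of the certified abstraction layer framework (Gu et al., 2015), a statement of this shape is usually written as below. The symbols $\oplus$ (linking), $[\![\cdot]\!]_{L}$ (behavior on layer $L$) and $\sqsubseteq$ (refinement) are borrowed from that framework as an assumption about the intended notation, and $M_{\mathrm{CH0SELECT}}$ is a hypothetical name for the module implementing CH0SELECT.

```latex
\forall P.\quad
[\![\, P \oplus M_{\mathrm{CH0SELECT}} \,]\!]_{\mathrm{DSpiEnChannel}}
\;\sqsubseteq\;
[\![\, P \,]\!]_{\mathrm{DSpiSelChannel}}
```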
Once this refinement is proved, the actual implementation of the function CH0SELECT is hidden under the layer DSpiSelChannel, while we are still able to reason about all behaviors of program P.
4 Experiments
4.1 Methods and procedures
A drone (Fig. 8) was built for all experiments. Three basic sensors (an accelerometer, a gyroscope and a magnetometer) were used to estimate the attitude of the drone. Their configurations are listed in Table 1. Radio telemetry was used to record the flight data. The experiments in this section were designed to simulate erroneous situations or bugs in bus drivers. We set up the system so that a bug occurs every 5-10 seconds, and its effect is to delay the execution of the driver code for as long as 0.2 seconds. This simulates the situation in which the driver keeps polling for new data without enforcing any timeout mechanism. In this case, an anomaly in the device may block the driver for a long time, which in turn blocks the execution of the whole system.
Table 1 Sensor configurations
|Sensor|Chip name|Measurement range|Sensitivity|Sampling rate|
|Accelerometer|MPU9250|±8 g|4096 LSB/g|200 Hz|
|Gyroscope|MPU9250|±1000 dps|32.8 LSB/dps|200 Hz|
|Magnetometer|HMC5883|±1.3 Gs|1090 LSB/Gs|75 Hz|
Values are taken from the datasheets of MPU9250 and HMC5883. g: standard gravity; dps: degrees per second; Gs: gauss; LSB: least significant bit
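To give a sense of scale for the injected faults, the sketch below computes how many sensor samples elapse during one blocking delay and generates a fault schedule with gaps of 5 to 10 seconds. The helper names `missed_samples` and `fault_times`, the uniform distribution of the gaps and the fixed random seed are assumptions; the text only states the 5-10 second interval and the 0.2 second delay.

```python
import random
from typing import List


def missed_samples(delay_ms: int, rate_hz: int) -> int:
    """Number of whole sample periods that elapse during a blocking delay."""
    return delay_ms * rate_hz // 1000


def fault_times(duration_s: float, seed: int = 0) -> List[float]:
    """Fault instants within [0, duration_s), with successive gaps drawn
    uniformly from 5 to 10 seconds (an assumed distribution)."""
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += rng.uniform(5.0, 10.0)
        if t >= duration_s:
            return times
        times.append(t)
```

For example, `missed_samples(200, 200)` evaluates to 40: a 0.2 second stall spans 40 sample periods of the 200 Hz accelerometer and gyroscope, and `missed_samples(200, 75)` gives 15 periods for the 75 Hz magnetometer.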
Two drone systems were tested in the field and their results compared. The first system has a verified SPI bus driver, as explained in the previous section. The second has an unverified SPI bus driver. Both systems are equipped with the verified I2C bus driver.
Ten trials have been carried out with different bugs randomly occurring in the SPI bus driver. We record and compare attitudes of the drone (roll, pitch and yaw) since they are the most critical metrics to its safety. The attitudes are computed by the same gradient descent method(Madgwick et al., 2011) using IMU data read from the SPI bus.
4.2 Results and discussions
Fig. 9 shows the roll angles of the unverified drone system. Solid lines in subfigure ‘Roll Angle’ represent the computed actual roll angles, while dashed lines represent the desired values commanded by the remote controller. The differences between the actual and desired values (errors) are shown in subfigure ‘Roll Angle Error’. Three error peaks are observed at 8.6 s, 16.8 s and 28.5 s. At these times, software bugs in the device drivers delay the processing of sensor data, which in turn blocks the controller’s execution for several subsequent control periods. Software bugs are also detected at 1.8 s and 23.2 s (subfigure ‘SPI Bus Bug’). However, these bugs have no obvious impact on the roll angle, because the attitude of the drone is relatively steady at those times. When a bug occurs, the input of each motor stays the same as in the previous period. If the current attitude of the drone does not differ much from the previous one, the drone remains stable with the same motor input. The same phenomenon is observed for the pitch angle, as shown in Fig. 10.
Fig. 11 shows the yaw angle, which does not exhibit the same variation under software faults. This is attributed to the sensor fusion algorithm, which uses data from both the IMU (connected to the SPI bus) and the magnetometer (connected to the I2C bus) to improve the accuracy of the estimated yaw angle.
Fig. 12 shows a comparison of attitude errors between the two drone systems. The existence of software bugs leads to significant differences between the desired and actual pitch and roll angles.
Fig. 13 shows a series of snapshots of different drone flights in chronological order. The drones in the first two rows run verified device drivers: they can hover, change attitude and fly forward. The third row shows the situation when there are bugs in the drone’s SPI bus driver. These pictures show greater variations in the drone’s attitude than the first and second rows, even though the drones are operated in the same manner. This demonstrates that bugs in the SPI bus driver indeed degrade the stability of a drone.
5 Conclusions
A new software architecture and development method targeting safety and reliability for drone systems is proposed in this paper. With the help of formal verification, several bus drivers that play critical roles in flight control are formally verified. Field tests show that the proposed system achieves improved reliability by eliminating subtle bugs that could be introduced during software development.
In future work, we plan to extend the proposed architecture with virtualization support. A hypervisor could be introduced to support third-party systems without compromising the inherited safety and security, by enforcing strong isolation and non-interference properties.
References
Barry R, 2003.
The FreeRTOS kernel.
Bohrer B, Tan YK, Mitsch S, et al., 2018.
Veriphy: Verified controller executables from verified cyber-physical
SIGPLAN Not, 53(4):617-630.
Chen H, Wu XN, Shao Z, et al., 2016.
Toward compositional verification of interruptible OS kernels and
Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, New York, NY, USA, p.431-447.
Gu R, Koenig J, Ramananandro T, et al., 2015.
Deep specifications and certified abstraction layers.
Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, New York, NY, USA, p.595-608.
Lee T, Leok M, McClamroch NH, 2010.
Geometric tracking control of a quadrotor UAV on SE(3).
49th IEEE Conference on Decision and Control (CDC), p.5420-5425.
Leishman J, 2002.
Principles of Helicopter Aerodynamics.
Cambridge University Press.
Leroy X, 2009.
Formal verification of a realistic compiler.
Commun ACM, 52(7):107-115.
Madgwick SOH, Harrison AJL, Vaidyanathan R, 2011.
Estimation of IMU and MARG orientation using a gradient descent
2011 IEEE International Conference on Rehabilitation Robotics, p.1-7.
Malecha G, Ricketts D, Alvarez MM, et al., 2016.
Towards foundational verification of cyber-physical systems.
2016 Science of Security for Cyber-Physical Systems Workshop
de Marina HG, Pereda FJ, Giron-Sierra JM, et al., 2012.
UAV attitude estimation using unscented Kalman filter and TRIAD.
IEEE Transactions on Industrial Electronics, 59(11):4465-4474.
Nutt G, 2007.
NuttX real-time operating system.
Réti I, Lukátsi M, Vanek B, et al., 2013.
Smart mini actuators for safety critical unmanned aerial vehicles.
2013 Conference on Control and Fault-Tolerant Systems (SysTol),
Ricketts D, Malecha G, Alvarez MM, et al., 2015.
Towards verification of hybrid systems in a foundational proof
2015 ACM/IEEE International Conference on Formal Methods and Models for Codesign (MEMOCODE), p.248-257.
Simpson A, Stoker J, 2006.
Safety challenges in flying UAVs (unmanned aerial vehicles) in non
IET Conference Proceedings, p.81-88.
Wang KC, 2017.
Embedded Real-Time Operating Systems.
In: Embedded and Real-Time Operating Systems.
Springer International Publishing, Cham.