Fault Tolerant Control for Networked Mobile Robots
Abstract
Teams of networked autonomous agents have been used in a number of applications, such as mobile sensor networks and intelligent transportation systems. However, in such systems, the effect of faults and errors in one or more of the subsystems can easily spread throughout the network, quickly degrading the performance of the entire system. In consensus-driven dynamics, the effects of faults are particularly relevant because of the presence of unconstrained rigid modes in the transfer function of the system. Here, we propose a two-stage technique for the identification and accommodation of an agent with biased measurements, in a network of mobile robots with time-invariant interaction topology. We assume these interactions to take place only in the form of relative position measurements. A fault identification filter deployed on a single observer agent is used to estimate a single fault occurring anywhere in the network. Once the fault is detected, an optimal leader-based accommodation strategy is initiated. Results are presented by means of numerical simulations and robot experiments.
I Introduction
Cooperation in multi-agent systems can lead to highly coordinated behaviors even when individual agents in the network have limited skills and information. For example, a team of cooperating robots can efficiently adapt to changes in the environment and perform sequential tasks, with potentially limited supervision. The degree of interaction depends on the sensor and communication architectures, and can be conveniently modeled using elements from graph theory [1], [2].
As an example, consensus theory constitutes a fundamental tool for the design of control and estimation protocols in multi-agent systems [3]. Despite the efficiency and versatility of consensus-based algorithms, they are vulnerable to drift, and therefore measurement errors and hardware faults can introduce disturbances which are difficult to correct [4].
This paper addresses this problem in the context of Fault Tolerant Control Systems (FTCS). The study of FTCS addresses the design of active control systems capable of automatically detecting a fault and performing the actions required to maintain acceptable performance [5]. FTCS design inherently involves a multi-stage process: first, a Fault Detection and Identification (FDI) system must provide precise information about the fault; then, during the Fault Accommodation (FA) (or mitigation) stage, an appropriate control compensates for the fault.
In this paper we describe a Fault Identification Filter (FIF) to be deployed on multi-agent robotic systems performing linear agreement (consensus) and formation control protocols. The proposed technique relies on a bank of linear observers deployed on board an observer agent in charge of detecting a single, time-invariant fault occurring in any of the nodes of the network, including itself. Once the fault is detected, a leader, not necessarily corresponding to the observer, compensates for it. We consider the underlying graph representing the interaction between the agents to be fixed in time and completely controllable with respect to the leader node; we also assume robot interactions to occur as relative distance measurements only.
I-A Related Work
Many different approaches have been proposed for the design of fault tolerant distributed Network Control Systems (NCS) (see for example [6] and references therein). In the context of distributed networked agents equipped with sensors, both noise and faults make measurements unreliable, a problem which has been approached using distributed Kalman filtering [7], Bayesian [8] and Dempster-Shafer frameworks [9]. When the states of the agents directly depend on neighbors’ relative states, restoring the nominal pre-fault performance becomes more complex and depends on the underlying interaction graph between the nodes. To this end, the problem of networked control systems usually targets the selection of valid control nodes and the study of the interaction between topology and controllability [10], [11], [12].
In robotics, fault tolerant controls have been investigated in many different contexts, including, for example, flight control systems [13], manipulators [14], and quadrocopters subject to the loss of motors [15]. Resilience of multi-agent robotic systems to errors, faults, and adversarial attacks has also been investigated. In [16], fault resistance is achieved by assuming agents controlled by independent probabilistic processes. However, when agents’ controllers depend on neighbors’ state, the spreading of faults and errors becomes difficult to control. In [17], a filter composed of a bank of linear observers, where each observer can detect faults in one of its nearest neighbors, is proposed. After fault occurrence, the faulty node is assumed to be removed from the network. Heterogeneous multi-agent systems are considered in [18], where an LMI-based solution to the distributed FDI problem is proposed. Multiple and simultaneous faults occurring in each agent and its nearest neighbors were detected considering environmental noise and disturbances. In [19], misbehaving nodes are studied in the context of linear consensus dynamics, where both genuine random faults and malicious messages are considered. The fault identification was then studied in terms of the connectivity properties of the network.
The formulation used in this paper builds on the techniques described in [20] and [21], where a linear observer is designed such that the filter residuals possess some desired directional properties. The original idea was extended to linear networked control systems with communication delays in [22] and to discrete-time switched linear systems in [23].
In Section II we describe the dynamics of the robots and the fault detection filter. In Section III we discuss the leader optimal FA strategy, presenting numerical simulations for a multi-agent robotic system. In Section IV we extend the results to a formation control protocol, which was tested on real robots. Final remarks are reported in Section V.
II Fault Identification Filter Under Pure Consensus Dynamics
II-A System Dynamics and Fault Modeling
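Consider a collection of $N$ mobile robots, located in a planar, connected, and compact domain $\mathcal{D} \subset \mathbb{R}^2$. Let $x_i \in \mathcal{D}$, for $i = 1, \dots, N$, denote the position of agent $i$. We assume each agent interacts with a nonempty set of other robots. Generally, inter-agent interactions can be conveniently described by an undirected graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V} = \{1, \dots, N\}$ is the set of nodes representing the agents, and $\mathcal{E}$ is an unordered list of node pairs corresponding to interacting robots. We assume undirected networks, i.e. $(i, j) \in \mathcal{E} \Leftrightarrow (j, i) \in \mathcal{E}$, and we let $\mathcal{G}$ be connected and constant at all times.
The neighborhood set of agent $i$, denoted by $\mathcal{N}_i$, $\mathcal{N}_i = \{ j \in \mathcal{V} : (i, j) \in \mathcal{E} \}$, is the set of all vertices connected to node $i$. Also, $d_i$ is the degree of vertex $i$, defined as the number of vertices connected to node $i$, i.e. $d_i = |\mathcal{N}_i|$.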
To start the development, we initially consider a team of robots performing a consensus protocol, which represents a general starting point for other, more complex behaviors. Assuming a discrete-time model with positive step size $\epsilon$, temporally indexed by $k$, the update equation for agent $i$ is
$$x_i(k+1) = x_i(k) + \epsilon \sum_{j \in \mathcal{N}_i} \big( x_j(k) - x_i(k) \big). \qquad (1)$$
The behavior emerging from the dynamics in (1) is a pure consensus dynamics, leading to the rendezvous of the agents at the centroid of their initial configuration (see for example [2] and references therein).
We write the complete state of the system in the compact form $x(k) = [x_1(k)^T, \dots, x_N(k)^T]^T$, where $x(k) \in \mathbb{R}^{2N}$. It is possible to represent the complete update equation as:
$$x(k+1) = (P \otimes I_2)\, x(k), \qquad (2)$$
where $P = I_N - \epsilon L$, $\otimes$ is the Kronecker product, $I_n$ is the identity matrix of size $n$, and $L$ is the Laplacian of $\mathcal{G}$.
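As a minimal sketch of the consensus update, the snippet below iterates the dynamics on a small path graph and checks that the agents rendezvous at the centroid of their initial configuration. It assumes that the matrix in (2) has the usual form $(I - \epsilon L) \otimes I_2$; the graph, the step size, and the helper names `path_laplacian` and `consensus_step` are illustrative choices, not taken from the paper. Positions are stored as an $N \times 2$ array, which is equivalent to the Kronecker form.

```python
import numpy as np

def path_laplacian(n):
    """Graph Laplacian L = D - A of an undirected path graph on n nodes."""
    A = np.zeros((n, n))
    for i in range(n - 1):
        A[i, i + 1] = A[i + 1, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

def consensus_step(x, L, eps):
    """One consensus step; x is an (N, 2) array of planar positions.

    Equivalent to ((I - eps * L) kron I_2) applied to the stacked state.
    """
    return x - eps * (L @ x)

rng = np.random.default_rng(0)
N, eps = 5, 0.2                      # assumed values; eps < 1 / max degree = 0.5
L = path_laplacian(N)
x0 = rng.uniform(-1.0, 1.0, size=(N, 2))

x = x0.copy()
for _ in range(500):
    x = consensus_step(x, L, eps)

centroid0 = x0.mean(axis=0)          # centroid of the initial configuration
spread = np.abs(x - centroid0).max() # largest distance from that centroid
```

Because the columns of the Laplacian of an undirected graph sum to zero, the centroid is preserved at every step, which is why the rendezvous point coincides with the initial centroid.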
Now, assume that at time step , an a priori unknown agent, indexed by , experiences a fault. Without loss of generality, we model this fault as an exogenous velocity input acting on the system; therefore, including the fault in the state update equation (2) leads to the switched system:
(3)  
(4) 
where is the fault distribution matrix corresponding to the unknown faulty agent, and defined as:
where is the canonical vector of appropriate size.
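The effect of such a fault on the team can be illustrated with a short simulation. This is only a sketch under stated assumptions: the fault is taken to be a constant planar vector added to the update of the faulty agent from the fault time onward (without any step-size scaling), and the ring topology, the numerical values, and the names `ring_laplacian` and `simulate` are hypothetical. Under these assumptions the centroid, which is invariant in the fault-free case, drifts by the fault vector divided by the number of agents at every step.

```python
import numpy as np

def ring_laplacian(n):
    """Graph Laplacian of an undirected ring on n nodes."""
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

def simulate(x0, L, eps, steps, faulty=None, fault=None, k_f=0):
    """Run consensus; from step k_f on, add `fault` to the faulty agent's update."""
    x = x0.copy()
    centroids = [x.mean(axis=0)]
    for k in range(steps):
        x = x - eps * (L @ x)              # nominal consensus update
        if faulty is not None and k >= k_f:
            x[faulty] += fault             # exogenous input on one agent only
        centroids.append(x.mean(axis=0))
    return x, np.array(centroids)

rng = np.random.default_rng(0)
N, eps, steps, k_f = 6, 0.2, 50, 20        # assumed values
L = ring_laplacian(N)
x0 = rng.uniform(-1.0, 1.0, size=(N, 2))
fault = np.array([0.05, -0.02])

_, c_nominal = simulate(x0, L, eps, steps)
_, c_faulty = simulate(x0, L, eps, steps, faulty=3, fault=fault, k_f=k_f)

# The fault acts for (steps - k_f) updates, each moving the centroid by fault / N.
expected_drift = (steps - k_f) * fault / N
```

The drift is independent of which agent is faulty, so the centroid alone reveals that a fault is present but not where it entered the network.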
A possible interpretation for the fault can be given as follows. The interaction between the agents occurs in the form of relative distance measurements; therefore, the measurements for agent , , , are:
(5) 
where the indexes indicate the neighbors of agent and is the corresponding measure matrix. The state update equation can then be written in terms of the measurements as:
(6) 
where is a vector having all its elements equal to . At time step , agent experiences a fault in one or more of its sensors. Denoting by the degree of node , we assume each of the post-fault measurements to be biased by a nonhomogeneous term , with . Then, the vector of measurements for the faulty agent is:
(7) 
where , is the compact representation of all faults corresponding to each neighbor in . By applying (7) to (6), by inspection we note that the fault acting on the system in (3) corresponds in this case to:
II-B Fault Observer Design
We now turn our discussion to the design of the filters deployed on the one agent of the team in charge of detecting and estimating a fault in any robot of the team, including itself. We refer to this agent as the observer agent, and we denote quantities relative to it with a subscript ; e.g. represents the observer agent’s measurements matrix.
Remark II.1
The purpose of the observer is to:
1. find the index corresponding to the faulty agent;
2. estimate the fault vector ;
3. find the time of fault occurrence .
With reference to Fig. 1, consider a bank of filters, each designed to detect a fault in a specific agent of the team. Note that by defining filters, we allow the observer itself to be the faulty agent and to detect its own fault. The common input for each filter in the bank is the observer measurement , while each filter outputs a set of signals called residuals. When residuals are sensitive to only a single fault, the design belongs to the category of Fault Isolation Filters (FIF). We denote quantities associated with each filter of the bank by the subscript .
Assuming a linear state observer for the dynamics in (2), the update equation for the state estimate in the filter is:
(8) 
Following the approach described in [21], we derive a formulation for the gain such that the filter outputs have some desired directional properties. To this end, the estimation error in filter at time is , and under the effect of the fault we have:
(9) 
where is the residual relative to the filter . In a similar way, denoting with and filter’s estimate error and the residual for the system not subject to the fault respectively, we have:
(10) 
We now introduce the concept of fault detectability index required for the definition of the FIF.
Definition II.2
If is connected, the system has finite fault detectability index. Before proving this result we introduce the following lemma, by slightly reformulating the result in [24].
Lemma II.3
If is connected and , for , then the matrix in (2) is stochastic.
The matrix is row stochastic if its entries satisfy , for all and , for all . From the definition of the Laplacian of , , for all . Then, since , we note that , for all .
Again, by definition of the graph Laplacian, we have (where since is connected), if , and otherwise. Then, since the only nonzero elements of are , for , and , for , is stochastic if conditions:
(12) 
are satisfied. Finally, since and , both (12) hold if and only if .
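The statement of the lemma can be checked numerically on a small example. The sketch below assumes that the matrix in question is $P = I - \epsilon L$ and uses the standard fact that its rows always sum to one, while its diagonal entries $1 - \epsilon d_i$ remain nonnegative only when $\epsilon$ does not exceed the inverse of the maximum degree. The exact bound used in the lemma is not restated here, so the threshold in the code is an assumption, as are the star graph and the name `is_row_stochastic`.

```python
import numpy as np

def is_row_stochastic(P, tol=1e-12):
    """True if all entries are nonnegative and every row sums to one."""
    return bool(np.all(P >= -tol) and np.allclose(P.sum(axis=1), 1.0))

# Star graph on 4 nodes: node 0 is connected to nodes 1, 2, 3 (max degree 3).
A = np.array([[0, 1, 1, 1],
              [1, 0, 0, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
d_max = A.sum(axis=1).max()

# Step size below 1 / d_max: diagonal entry of the hub is 1 - 0.9 = 0.1.
ok = is_row_stochastic(np.eye(4) - (0.9 / d_max) * L)

# Step size above 1 / d_max: diagonal entry of the hub is 1 - 1.5 = -0.5.
bad = is_row_stochastic(np.eye(4) - (1.5 / d_max) * L)
```

In both cases the rows sum to one, so the only way stochasticity can fail is through a negative diagonal entry at the highest-degree node.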
For the graph , the geodesic distance between a pair of nodes is the length of the shortest path connecting them. We introduce the geodesic function , where is the geodesic distance between nodes and .
Theorem II.4
Under the hypothesis of Lemma II.3, the fault detectability index corresponds to the geodesic distance, i.e. .
Consider a discrete time random walk on governed by the stochastic transition matrix . Denoting with the initial probability distribution of two independent processes over , we note that represents the probability distribution of the walks at time step . Then, by definition of , the process described by , can also be interpreted as the probability distributions of two identical walks at step , both started at node .
After steps, the probability that the walk (both walks are identical) reached node is zero, and therefore . Similarly, after the same number of steps, it is easy to verify that:
(13) 
where is the index of the walks. From the definition of the measurements matrix in (5), for walk we write:
where, similarly to (5), indexes correspond to the neighbors of agent . Finally, since and (13) we conclude that:
(14) 
and from Definition II.2, it follows that .
In other words, the fault detectability index can be viewed as the number of steps required for the fault to affect one of the observer’s neighbors and therefore become visible to the observer itself. To this end, fault detectability in a network can be studied similarly to its controllability [25].
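The relation between detectability index and geodesic distance can be verified numerically. The sketch below assumes the definition commonly used in the fault isolation filter literature, namely that the index is the smallest $s$ such that $C_o P^{s-1} e_f \neq 0$, where $C_o$ stacks the relative measurements of the observer and $e_f$ is the canonical vector of the faulty agent. It works with scalar states, since the planar case only adds a Kronecker product with $I_2$. The graph and the function names are hypothetical, and only the case in which the faulty agent differs from the observer is checked.

```python
from collections import deque
import numpy as np

def adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix of an undirected graph."""
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return A

def geodesic_distances(A, src):
    """Breadth-first search: hop distance from src to every node."""
    dist = [-1] * A.shape[0]
    dist[src] = 0
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in np.flatnonzero(A[u]):
            v = int(v)
            if dist[v] < 0:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def detectability_index(A, eps, observer, faulty):
    """Smallest s with C_o P^(s-1) e_f != 0 (assumed definition)."""
    n = A.shape[0]
    P = np.eye(n) - eps * (np.diag(A.sum(axis=1)) - A)
    nbrs = np.flatnonzero(A[observer])
    C = np.zeros((nbrs.size, n))       # one row per measurement x_j - x_o
    C[np.arange(nbrs.size), nbrs] = 1.0
    C[:, observer] = -1.0
    v = np.zeros(n)
    v[faulty] = 1.0                    # v equals P^(s-1) e_f at the top of each pass
    for s in range(1, n + 1):
        if np.any(np.abs(C @ v) > 1e-12):
            return s
        v = P @ v
    return None

edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (1, 4)]
A = adjacency(6, edges)
observer, eps = 0, 0.2                 # eps < 1 / max degree = 1/3
dist = geodesic_distances(A, observer)
index = {f: detectability_index(A, eps, observer, f)
         for f in range(6) if f != observer}
```

For this graph the computed indices coincide with the hop distances from the observer, in agreement with the random-walk argument used in the proof.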
Corollary II.5
Under the hypothesis of Lemma II.3, for every choice of observer and faulty agent, a finite fault detectability index always exists.
For all connected graphs there exists a finite geodesic distance between each pair of nodes. Thus, this result directly follows from Theorem II.4.
Definition II.6
The fault detectability matrix for the filter , namely , is defined as , where .
The following is a main result from [21].
Theorem II.7
Assume the following parametrization:
(15) 
with , where is the pseudoinverse (or Moore-Penrose inverse) of the matrix , , where , and is an arbitrary matrix chosen such that the matrix has full row rank; then, the following constraint is always satisfied:
and we can write the residual at time as:
(16) 
We refer the reader to [21] for the details of the proof. Thanks to the particular parametrization introduced in Theorem II.7, for the system affected by the fault, the residual in (16) is given by the sum of two terms. The first term is the residual of the fault-free system (10), while the second term depends on the fault vector delayed by a quantity dependent on the fault detectability index .
Following the results from Theorem II.7, substituting (15) in (8) leads to the following final expression for the fault identification filter :
(17) 
where:
(18)  
(19) 
Finally, substituting (16) in (18) and (19) leads to:
where we used that and .
The last two equations verify the desired directional properties for the output residuals, and . In fact, we first note that is decoupled from the fault and its convergence to zero is guaranteed by the stability properties of the fault-free filter (10), even with a nonzero initial state estimation error [21]. Moreover, since is independent of the fault, we have . Finally, as approaches zero, converges to the fault .
The filter (17)-(19) is replicated on board the observer agent times, providing the values of and for , at all time steps . In order to guarantee the correct detection of the fault occurring on the system, three conditions must be satisfied. First, by denoting with the Euclidean norm of the fault-free residual, trustworthiness of the filter is verified when , where is a small positive tolerance. In addition, by denoting with , with , two positive fault detection thresholds, uniqueness of the fault is guaranteed when there exists only one residual above the threshold , while all other residuals are below the threshold , i.e.:
(20) 
We refer to the condition in (20) as fault detection condition.
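The decision logic behind the fault detection condition can be sketched as follows. This is an illustrative reading of the three conditions rather than the authors' implementation: the function name `detect_fault`, the argument names, and the threshold values are hypothetical, and the trustworthiness test is applied only to the filter that raises the alarm, which is an assumption.

```python
import numpy as np

def detect_fault(fault_residuals, free_residuals, tol, low, high):
    """Return the index of the unique filter that flags a fault, or None.

    fault_residuals: (N, m) array, fault-sensitive residual of each filter.
    free_residuals:  (N, m) array, fault-free residual of each filter.
    tol:             tolerance on the norm of the fault-free residual.
    low, high:       detection thresholds, with 0 < low < high.
    """
    if not 0 < low < high:
        raise ValueError("thresholds must satisfy 0 < low < high")
    fault_norms = np.linalg.norm(fault_residuals, axis=1)
    free_norms = np.linalg.norm(free_residuals, axis=1)

    above = np.flatnonzero(fault_norms > high)
    if above.size != 1:
        return None                    # no alarm, or more than one candidate
    j = int(above[0])
    if np.any(np.delete(fault_norms, j) >= low):
        return None                    # another residual is not clearly quiet
    if free_norms[j] >= tol:
        return None                    # state estimate not yet trustworthy
    return j

free = np.full((3, 2), 1e-4)
clear = np.array([[0.01, 0.0], [0.3, 0.4], [0.0, 0.02]])    # norms 0.01, 0.5, 0.02
ambiguous = np.array([[0.1, 0.0], [0.3, 0.4], [0.0, 0.02]]) # norms 0.1, 0.5, 0.02

hit = detect_fault(clear, free, tol=0.01, low=0.05, high=0.2)
miss = detect_fault(ambiguous, free, tol=0.01, low=0.05, high=0.2)
```

Using two separate thresholds leaves a band in which no decision is taken, which avoids declaring a fault while another residual is still rising or decaying.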
So far nothing has been said about the choice of . The directional properties of the residuals are not affected by the particular choice of ; however, it represents an additional degree of freedom in the FIF design. In [21], is chosen to minimize the trace of the estimation error covariance matrix. In [22], was designed with respect to the unknown disturbance.
II-C Numerical Simulations for the Consensus Dynamics
We apply the results of the FIF introduced to a team of 9 mobile robots, with interaction topology as in Fig. 2.
Starting from random initial positions at time , with , the robots run the consensus dynamics (1). Here, agent acts as the observer agent, and agent experiences a fault , at .
In Fig. 3 we show the two residuals (top figure) and (bottom figure) over time. We observe the components of the fault being correctly estimated in magnitude (top) and the state estimation error approaching zero from its initial nonzero value (bottom). This confirms the convergence of the state estimate to the real state.
III Optimal Fault Accommodation
In the previous section, we discussed the design of the filter bank used by the observer to detect a fault occurring in a node of the network. A generic fault was modeled as an exogenous disturbance introduced in the system. In this section, we turn our attention to the leader’s accommodating input. In particular, we present an optimal accommodation strategy to be employed by the leader in order to control the robots’ centroid, i.e. move the centroid to a predefined recovery position or maintain it at its pre-fault position. Without loss of generality, denoting with the centroid of the robots at time step , where , and with the desired final position for , the leader’s objective is to provide the control required such that, under the effect of the fault,
(21) 
Note that, since the only information required by the leader is the fault vector , the faulty agent index , and the state estimate , it is reasonable to assume that the leader coincides with the observer agent. However, if this information can be communicated, the two agents need not coincide.
III-A Control of the Fault and Leader Estimation Filter
Given the discrete nature of a fault occurring on the system at time step , the controlled dynamics can be represented by the following switched controlled consensus:
(22)  
where is the leader accommodation control at time . Similarly to what was discussed in the previous section, using (22), the complete post-fault dynamics are:
(23) 
where and is the control matrix defined by the choice of the leader agent, i.e. . From the post-fault dynamics (23), we can define the leader state estimation filter by introducing the control in the filter dynamics (8). Therefore, letting be the state estimate for the leader agent, we have:
(24) 
In order for the leader to achieve the accommodation objective in (21), the position of the system’s centroid is required. If the state of the system at the initial time is known, from the invariance of under the consensus dynamics, for all :
(25) 
where the definition of is clear by inspection of (25).
Conversely, if the leader does not know the state of the system at the initial time, the position of the robots’ centroid is also unknown. However, from (24) we know that the leader’s state estimate also converges to the centroid of its initial value. Assuming the leader measures its own position , it is possible to correct the system state estimate by the difference between the leader’s own estimated position, namely , and its measured one. Thus, the position of the centroid at the time of the fault is:
III-B Optimal Accommodation Control
We compute the accommodation control by solving a closed-form receding horizon optimal control problem, where we assume the system to be completely controllable via the agent [26]. At each time step , the solution of the optimal control problem provides a sequence of control inputs , with being the length of the prediction horizon. The cost to be minimized by the leader is:
(26) 
subject to the system dynamics and the desired :
(27)  
(28) 
At the end of the horizon, under the control sequence :
which we rearrange as:
(29) 
where .
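A simplified version of this receding horizon scheme can be sketched in a few lines. The sketch makes several assumptions that are not taken from the paper: the cost is replaced by the sum of squared leader inputs over the horizon, the fault and the leader input both enter the update additively without step-size scaling, the fault estimate is exact, and accommodation starts at a hypothetical step `k_a` after the fault time `k_f`. Under these assumptions the centroid obeys $c(k+1) = c(k) + (u(k) + f)/N$, the terminal constraint fixes only the sum of the inputs over the horizon, and the minimum-energy solution spreads that sum evenly across the horizon.

```python
import numpy as np

def ring_laplacian(n):
    """Graph Laplacian of an undirected ring on n nodes."""
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

def accommodation_input(centroid, target, fault, n_agents, horizon):
    """First input of the minimum-energy sequence meeting the terminal constraint.

    Constraint: centroid + sum_t (u_t + fault) / n_agents == target after
    `horizon` steps, so the inputs must sum to `total`; equal inputs
    minimize the sum of squared norms.
    """
    total = n_agents * (target - centroid) - horizon * fault
    return total / horizon

rng = np.random.default_rng(1)
N, eps, T = 6, 0.2, 10                 # assumed team size, step size, horizon
leader, faulty = 0, 3
fault = np.array([0.05, -0.02])
k_f, k_a = 10, 15                      # fault time and (assumed) accommodation start
L = ring_laplacian(N)

x = rng.uniform(-1.0, 1.0, size=(N, 2))
target = x.mean(axis=0)                # keep the centroid at its pre-fault position
for k in range(200):
    c = x.mean(axis=0)
    x_next = x - eps * (L @ x)         # nominal consensus update
    if k >= k_f:
        x_next[faulty] += fault        # constant fault on one agent
    if k >= k_a:
        # Re-solve at every step and apply only the first input (receding horizon).
        x_next[leader] += accommodation_input(c, target, fault, N, T)
    x = x_next

error = np.linalg.norm(x.mean(axis=0) - target)
```

With this choice the centroid error contracts by the factor $1 - 1/T$ at every step, so a shorter horizon gives faster recovery at the price of larger leader inputs.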
III-C Results
The optimal accommodation strategy is applied to the multi-agent system used in Section II-C. A fault is applied to agent at time step . Once the leader, agent , detects the fault, it initiates the accommodation maneuver. Fig. 5 shows the norm of the centroid position with and without fault accommodation (top), and the leader’s input components (bottom). As a result of the accommodation strategy, the centroid position remains practically unchanged.
IV Fault Identification and Accommodation Under Formation Control
In this section, we extend the dynamics considered previously to more general scenarios. In particular, we assume that the team of robots runs a consensus-based formation control protocol. Denoting by , with , the desired relative displacement between pairs of neighboring robots, we encode the desired formation in the vector , where .
Adding to the update equation in (2), we write the formation control protocol as , and the dynamics of the system subject to the fault follow:
(31)  
(32) 
where all quantities are the same as in Section II.
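The fault-free part of the formation protocol can be sketched as a consensus update with a constant offset term. The sketch assumes that the desired displacement between neighbors $i$ and $j$ is $z_j - z_i$ for a set of reference positions $z$, so that the update reads $x(k+1) = x(k) - \epsilon L (x(k) - z)$; this sign convention, the hexagonal target, and the names `ring_laplacian` and `formation_step` are assumptions made for illustration.

```python
import numpy as np

def ring_laplacian(n):
    """Graph Laplacian of an undirected ring on n nodes."""
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

def formation_step(x, L, eps, z):
    """Consensus on x - z: each agent corrects x_j - x_i toward z_j - z_i."""
    return x - eps * (L @ (x - z))

N, eps = 6, 0.2                         # assumed values; eps < 1 / max degree = 0.5
L = ring_laplacian(N)
angles = 2.0 * np.pi * np.arange(N) / N
z = np.column_stack([np.cos(angles), np.sin(angles)])   # regular hexagon

rng = np.random.default_rng(2)
x0 = rng.uniform(-1.0, 1.0, size=(N, 2))
x = x0.copy()
for _ in range(300):
    x = formation_step(x, L, eps, z)

# Shape error: positions relative to the centroid versus the reference shape.
shape_error = np.abs((x - x.mean(axis=0)) - (z - z.mean(axis=0))).max()
centroid_shift = np.linalg.norm(x.mean(axis=0) - x0.mean(axis=0))
```

Since the offset term does not depend on the state, it leaves the centroid invariant, which is consistent with reusing the same observer gain and the same accommodation strategy as in the pure consensus case.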
Similarly, it is possible to rewrite the linear filter in (8) for the formation control problem as
(33) 
Since the formation term does not depend on the state of the system, it can be easily shown that given (33), both (9) and (10) remain unchanged, and consequently, the same estimation gain matrix computed in (15) still guarantees the desired directional properties for the residuals and . By substituting the dynamics of the filters with (31), the observer applies the same fault detection condition defined in (20).
Finally, adding the formation term to the controlled dynamics (23)
(34) 
the final constraint can be rewritten similarly to (29) as:
(35) 
where now .
IV-A Robot Experiments
Experiments have been performed on the remotely accessible Robotarium [27] platform, with a team of 9 agents performing a formation control protocol. At time step , a fault vector is introduced. In this case we assume a recovery position for the centroid, denoted with a black ring in Fig. 6 (pictures are taken from an overhead camera). After the fault is detected, the leader compensates for the presence of the fault, and drives the centroid of the team (represented by the black X) to the desired recovery point. In Fig. 7 we observe the norm of the centroid moving from the pre-fault position to the desired post-fault value.
V Conclusion
Consensus-based protocols in multi-agent systems are highly vulnerable to exogenous disturbances, such as faults. In this paper, a fault identification and accommodation strategy for a static networked multi-agent robotic system is proposed. Under linear agreement and formation control protocols, the proposed filter identifies a faulty agent anywhere in the network and estimates the magnitude of the disturbance introduced. After the fault is detected, an optimal accommodation strategy is employed by a leader in order to control the robots’ centroid, and move it to an arbitrary position.
References
 [1] R. Olfati-Saber, “Flocking for multi-agent dynamic systems: Algorithms and theory,” IEEE Transactions on Automatic Control, vol. 51, no. 3, pp. 401–420, 2006.
 [2] M. Mesbahi and M. Egerstedt, Graph theoretic methods in multiagent networks. Princeton University Press, 2010.
 [3] F. Garin and L. Schenato, “A survey on distributed estimation and control applications using linear consensus algorithms,” in Networked Control Systems. Springer, 2010, pp. 75–107.
 [4] L. Xiao, S. Boyd, and S.-J. Kim, “Distributed average consensus with least-mean-square deviation,” Journal of Parallel and Distributed Computing, vol. 67, no. 1, pp. 33–46, 2007.
 [5] Y. Zhang and J. Jiang, “Bibliographical review on reconfigurable fault-tolerant control systems,” Annual Reviews in Control, vol. 32, no. 2, pp. 229–252, 2008.
 [6] R. J. Patton, C. Kambhampati, A. Casavola, P. Zhang, S. Ding, and D. Sauter, “A generic strategy for fault-tolerance in control systems distributed over a network,” European Journal of Control, vol. 13, no. 2-3, pp. 280–296, 2007.
 [7] R. Olfati-Saber, “Distributed Kalman filtering for sensor networks,” in Decision and Control, 2007 46th IEEE Conference on. IEEE, 2007, pp. 5492–5498.
 [8] X. Luo, M. Dong, and Y. Huang, “On distributed fault-tolerant detection in wireless sensor networks,” IEEE Transactions on Computers, vol. 55, no. 1, pp. 58–70, 2006.
 [9] K. Premaratne, M. N. Murthi, J. Zhang, M. Scheutz, and P. H. Bauer, “A Dempster-Shafer theoretic conditional approach to evidence updating for fusion of hard and soft data,” in Information Fusion, 2009. FUSION’09. 12th International Conference on. IEEE, 2009, pp. 2122–2129.
 [10] Y.-Y. Liu, J.-J. Slotine, and A.-L. Barabási, “Controllability of complex networks,” Nature, vol. 473, no. 7346, pp. 167–173, 2011.
 [11] F. Pasqualetti, S. Zampieri, and F. Bullo, “Controllability metrics, limitations and algorithms for complex networks,” IEEE Transactions on Control of Network Systems, vol. 1, no. 1, pp. 40–52, 2014.
 [12] A. Chapman and M. Mesbahi, “Semi-autonomous consensus: network measures and adaptive trees,” IEEE Transactions on Automatic Control, vol. 58, no. 1, pp. 19–31, 2013.
 [13] M. Steinberg, “Historical overview of research in reconfigurable flight control,” Proceedings of the Institution of Mechanical Engineers, Part G: Journal of Aerospace Engineering, vol. 219, no. 4, pp. 263–275, 2005.
 [14] M. L. Visinsky, J. R. Cavallaro, and I. D. Walker, “A dynamic fault tolerance framework for remote robots,” IEEE Transactions on Robotics and Automation, vol. 11, no. 4, pp. 477–490, 1995.
 [15] M. W. Mueller and R. D’Andrea, “Stability and control of a quadrocopter despite the complete loss of one, two, or three propellers,” in Robotics and Automation (ICRA), 2014 IEEE International Conference on. IEEE, 2014, pp. 45–52.
 [16] S. Bandyopadhyay, S.-J. Chung, and F. Y. Hadaegh, “Probabilistic and distributed control of a large-scale swarm of autonomous agents,” IEEE Transactions on Robotics, vol. 33, no. 5, pp. 1103–1123, 2017.
 [17] I. Shames, A. M. Teixeira, H. Sandberg, and K. H. Johansson, “Distributed fault detection for interconnected second-order systems,” Automatica, vol. 47, no. 12, pp. 2757–2764, 2011.
 [18] M. R. Davoodi, K. Khorasani, H. A. Talebi, and H. R. Momeni, “Distributed fault detection and isolation filter design for a network of heterogeneous multiagent systems,” IEEE Transactions on Control Systems Technology, vol. 22, no. 3, pp. 1061–1069, 2014.
 [19] F. Pasqualetti, A. Bicchi, and F. Bullo, “Consensus computation in unreliable networks: A system theoretic approach,” IEEE Transactions on Automatic Control, vol. 57, no. 1, pp. 90–104, 2012.
 [20] B. Liu and J. Si, “Fault isolation filter design for linear time-invariant systems,” IEEE Transactions on Automatic Control, vol. 42, no. 5, pp. 704–707, 1997.
 [21] J.-Y. Keller, “Fault isolation filter design for linear stochastic systems,” Automatica, vol. 35, no. 10, pp. 1701–1706, 1999.
 [22] D. Sauter, S. Li, and C. Aubrun, “Robust fault diagnosis of networked control systems,” International Journal of Adaptive Control and Signal Processing, vol. 23, no. 8, pp. 722–736, 2009.
 [23] M. Rodrigues, D. Theilliol, and D. Sauter, “Fault tolerant control design for switched systems,” IFAC Proceedings Volumes, vol. 39, no. 5, pp. 223–228, 2006.
 [24] R. Olfati-Saber, J. A. Fax, and R. M. Murray, “Consensus and cooperation in networked multi-agent systems,” Proceedings of the IEEE, vol. 95, no. 1, pp. 215–233, 2007.
 [25] A. Yazıcıoğlu, W. Abbas, and M. Egerstedt, “Graph distances and controllability of networks,” IEEE Transactions on Automatic Control, vol. 61, no. 12, pp. 4125–4130, 2016.
 [26] R. Chipalkatty, G. Droge, and M. B. Egerstedt, “Less is more: Mixed-initiative model-predictive control with human inputs,” IEEE Transactions on Robotics, vol. 29, no. 3, pp. 695–703, 2013.
 [27] D. Pickem, P. Glotfelter, L. Wang, M. Mote, A. Ames, E. Feron, and M. Egerstedt, “The robotarium: A remotely accessible swarm robotics research testbed,” in Robotics and Automation (ICRA), 2017 IEEE International Conference on. IEEE, 2017, pp. 1699–1706.