
# Parameter Identification for Multirobot Systems Using Optimization-based Controllers

## Abstract

This paper considers the problem of parameter identification for a multirobot system. We wish to understand when it is feasible for an adversarial observer to reverse-engineer the parameters of tasks being performed by a team of robots by simply observing their positions. We address this question using the concept of persistency of excitation from system identification. Each robot in the team uses optimization-based controllers to mediate between task satisfaction and collision avoidance. These controllers exhibit an implicit dependence on the task's parameters, which poses a hurdle for deriving necessary conditions for parameter identification, since such conditions usually require an explicit relation. We address this bottleneck using duality theory and the SVD of the active collision avoidance constraints, and derive an explicit relation between each robot's task parameters and its control inputs. This allows us to derive the main necessary conditions for successful identification, which agree with our intuition. We demonstrate the importance of these conditions through numerical simulations using (a) an adaptive observer and (b) an unscented Kalman filter for goal estimation in various geometric settings. These simulations show that under circumstances where parameter inference is infeasible per our conditions, both estimators fail, and likewise, when it is feasible, both converge to the true parameters. Videos of these results are available at https://bit.ly/3kQYj5J.

## 1 Introduction

There has been significant research on task-based motion planning and control synthesis for multiple robots, for applications involving search and rescue [11], sensor coverage [6] and environmental exploration [4]. Global behaviors result from executing local controllers on individual robots interacting with their neighbors [14], [15]. The inverse problem for task based control is task inference [5], which is the subject of this paper. Our focus is on multirobot task identification by an adversary observing robots performing some tasks. Specifically, we want to understand how easy it is for the observer to infer parameters of the tasks being performed by the robots, using just their positions. Such inference can provide an avenue to the adversary to impede task execution.

Consequently, from the perspective of the robots, the question is how they can coordinate their motions so as to obfuscate their parameters. On the other hand, for the observer, the question is when this model-fitting problem is well-posed, i.e., when the positions of the robots are "rich enough" to reveal the parameters of their underlying tasks. In this paper, we take the view of the observer monitoring a multirobot system in which each robot is tasked with reaching a goal position while avoiding collisions with other robots. The robots use optimization-based controllers, which have demonstrated great success for the goal stabilization and avoidance control problem that we are considering [17, 8]. The observer's problem is to estimate the goal locations and controller gains of all robots using their positions over some finite time. We use goal and gain estimation as an example to illustrate the mechanics of our approach; it can be easily adapted to identify parameters of a different task as well.

The development of provably convergent parameter estimation algorithms for nonlinear systems has been studied extensively [12], [2], [9]. These algorithms have been used for identifying parameters of manipulators [20] and quadrotors [21]. We consider these robots "monolithic" since they do not interact with other agents during the parameter identification phase. Moreover, these algorithms require explicit relations between plant parameters and dynamics to derive conditions under which the estimators converge. While deriving such identifiability conditions is our goal as well, the plant that we focus on has some unique features: (a) it is composed of mutually interacting robots, and (b) each robot uses optimization-based controllers. Interactions among robots pose a challenge to identification because at any given time, the dynamics of an ego robot are governed by the set of robots currently neighboring the ego, which is time-varying. Secondly, having an optimization generate the control input for each robot causes the robot's dynamics to depend implicitly on task parameters through the optimization's objective, which prohibits straightforward application of the identifiability conditions derived in these works.

To address these challenges, we first provide some background on parameter identification in section 2. Using persistency of excitation [2, 10], we derive a new necessary condition for successful identification in lemma 1. In section 3, we review multirobot avoidance control using optimization-based controllers and formalize the identification problem for this system in section 4. The main contributions begin from section 5, where we derive the KKT conditions of the control-synthesis optimization. By focusing on the set of active interactions (i.e., active constraints) of a robot with other robots, we pose an equality-constrained optimization (EQP), which is the first step for deriving a relation between parameters and the control. In section 6, we classify each robot's dynamics based on the number of constraints in this EQP and the linear independence relations amongst these constraints. Taking the SVD of these constraints allows us to derive the explicit relation between the parameters and dynamics of each robot that we wanted. Finally, using these relations in conjunction with the persistency of excitation requirement and the result derived in lemma 1, we provide the main necessary conditions for successful task identification of the multirobot system (theorems 2-6). The message that these theorems convey is that as the number of robots that an ego robot interacts with increases, estimation of the ego's parameters becomes difficult. This confirms our intuition, because with more interactions, the ego robot's motion (that the observer measures) is expended in satisfying collision avoidance constraints, which it achieves by sacrificing task performance. We demonstrate this numerically in section 7, where we use an adaptive observer and a UKF to infer robots' goals using their positions under various geometric settings. We conclude in section 8 by summarizing and providing directions for future work.

## 2 Observer-based Parameter Identification

While there exist several parameter estimation algorithms that the observer can potentially leverage (such as RLS [7], UKF [16]), we focus on observer-based methods, borrowing ideas from [1, 2, 12], because they provide conditions under which exponential convergence (and even finite-time convergence) to the true parameters is achieved. Consider a nonlinear parameter-affine system

 $\dot{x} = G(x)\theta + f(x),$  (1)

where $x$ is the measurable state, $\theta \in \mathbb{R}^p$ is the unknown parameter, and $G(x)$, $f(x)$ are known functions. In our context, $x$ will correspond to the position of the ego robot under observation and $\theta$ denotes the task parameters of its controller that we wish to infer. We assume that the observer runs several parallel estimators synchronously, one for estimating the parameters of each robot, so the focus here is on the ego robot. The observer's problem is to design an estimation law that guarantees convergence of the estimate $\hat{\theta}$ to $\theta$ by using the measurements $x(t)$ over some interval $[0, T]$, where $T$ is large enough. Consider a state predictor defined analogously to (1)

 $\dot{\hat{x}} = G(x)\theta_0 + f(x) + k_w(x - \hat{x}), \qquad \hat{x}(0) = x(0),$  (2)

where $\theta_0$ is a nominal initial estimate of $\theta$ and $k_w > 0$. Define an auxiliary variable $\eta$ as follows

 $\eta = x - \hat{x} - W(\theta - \theta_0),$  (3)

where $W$ is generated according to

 $\dot{W} = -k_w W + G(x), \qquad W(0) = 0.$  (4)

Here $W$ is a low-pass filtered version of $G(x)$. While $\eta$ as defined in (3) is not measurable because it depends on $\theta$, which is unknown, defining $W$ as in (4) lets us generate $\eta$ using

 $\dot{\eta} = -k_w \eta, \qquad \eta(0) = x(0) - \hat{x}(0).$  (5)

Based on (1)-(5), let $Q$ and $C$ be generated according to the following dynamics

 $\dot{Q} = W^T W, \qquad Q(0) = 0_{p \times p},$  (6)
 $\dot{C} = W^T(W\theta_0 + x - \hat{x} - \eta), \qquad C(0) = 0_{p \times 1},$  (7)

and let $t_c$ be the time at which $Q(t_c) \succ 0$; then the following parameter update law

 $\dot{\hat{\theta}} = \Gamma(C - Q\hat{\theta}), \qquad \hat{\theta}(0) = \theta_0,$  (8)

for $\Gamma \succ 0$ guarantees that $\|\hat{\theta} - \theta\|$ is non-increasing for $t < t_c$ and exponentially converges to 0 for $t \ge t_c$. Thus, as long as there exists $t_c$ at which $Q(t_c) \succ 0$, convergence of the estimate $\hat{\theta}$ to the true parameter $\theta$ is guaranteed.
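To make the update laws concrete, the following is a minimal numerical sketch of (1)-(8) for goal estimation of a goal-seeking single integrator with known gain $k$, so that $G(x) = kI$, $f(x) = -kx$ and $\theta = x^*$. The gains $k_w$, $\Gamma$, the step size, and all variable names are our own illustrative choices, not the paper's implementation:

```python
import numpy as np

# Sketch of the estimator (1)-(8), assuming xdot = G(x) theta + f(x)
# with G(x) = k*I, f(x) = -k*x (goal estimation with known gain k).
k, kw, Gamma, dt = 1.0, 5.0, 20.0, 1e-3
goal = np.array([2.0, -1.0])            # true parameter theta = x*
x = np.array([0.0, 0.0])                # measured robot state
theta0 = np.zeros(2)                    # nominal initial estimate
xhat = x.copy()                         # state predictor (2)
W = np.zeros((2, 2))                    # filtered regressor (4)
Q, C = np.zeros((2, 2)), np.zeros(2)    # integrals (6) and (7)
theta_hat = theta0.copy()

G = lambda x: k * np.eye(2)
f = lambda x: -k * x

for _ in range(8000):                   # 8 seconds of data
    u = k * (goal - x)                                            # (1)
    xhat = xhat + dt * (G(x) @ theta0 + f(x) + kw * (x - xhat))   # (2)
    W = W + dt * (-kw * W + G(x))                                 # (4)
    eta = np.zeros(2)        # (5): eta(0) = x(0) - xhat(0) = 0, stays 0
    Q = Q + dt * (W.T @ W)                                        # (6)
    C = C + dt * (W.T @ (W @ theta0 + x - xhat - eta))            # (7)
    theta_hat = theta_hat + dt * Gamma * (C - Q @ theta_hat)      # (8)
    x = x + dt * u
# theta_hat now approximates the true goal
```

Since $G^T G = k^2 I \succ 0$ here, $Q(t)$ becomes positive definite almost immediately and the estimate converges, matching the discussion above.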

###### Definition 1 (Persistency of Excitation [19]).

A bounded, locally square integrable matrix-valued function $G(\cdot)$ is said to be persistently exciting (PE) if there exist constants $T, \epsilon > 0$ such that $\int_t^{t+T} G^T(s)G(s)\,ds \succeq \epsilon I$ for all $t \ge 0$.

###### Definition 2 (Interval Excitation [20]).

A bounded, locally square integrable matrix-valued function $G(\cdot)$ is said to be interval exciting (IE) if there exist constants $t_c, \epsilon > 0$ such that $\int_0^{t_c} G^T(s)G(s)\,ds \succeq \epsilon I$.

###### Remark.

The condition that $Q(t_c) \succ 0$ is equivalent to the IE condition presented in Def. 2 for $W$. Indeed, using (6), $Q(t_c) = \int_0^{t_c} W^T(s)W(s)\,ds$, so $W$ is IE iff there exists $t_c$ with $Q(t_c) \succ 0$.

Since $W$ is a low-pass filtered version of $G(x)$ (4), $W$ is IE if and only if $G(x)$ is IE [13]. Therefore, $G(x)$ being IE implies the existence of $t_c, \epsilon > 0$ such that $\int_0^{t_c} G^T(x(s))G(x(s))\,ds \succeq \epsilon I$, i.e.

 $\int_0^{t_c} G^T(x(s))G(x(s))\,ds \succeq \epsilon I \implies \exists\, t_c \mid Q(t_c) \succ 0.$  (9)

Next, we derive a new necessary condition which is required for $G(x)$ to be IE.

###### Lemma 1.

Let $\mathcal{N}(G(x(t)))$ denote the null space of $G(x(t))$; then a necessary condition for $G(x)$ to be IE is that $\mathcal{N}(G(x(t)))$ must not be time-invariant.

###### Proof.

We prove this lemma by contradiction. Let $v \in \mathcal{N}(G(x(t)))$ and assume that $\mathcal{N}(G(x(t)))$ is time-invariant, i.e., $G(x(t))v = 0$ for all $t$ for some constant non-zero vector $v$. Let $r = v^T\big(\int_0^T G^T(x(s))G(x(s))\,ds\big)v$; then we have that

 $r = v^T\left(\int_0^T G^T(x(s))G(x(s))\,ds\right)v = \int_0^T v^T G^T(x(s))G(x(s))v\,ds = 0 \quad \forall\, T > 0.$  (10)

Since we assumed that $v \ne 0$ and $T$ was arbitrary,

 $r = 0 \implies \int_0^T G^T(x(s))G(x(s))\,ds \not\succeq \epsilon I \quad \forall\, T, \epsilon > 0 \implies G(x(t)) \text{ is not IE}.$

Since existence of such a $v$ implies $G(x)$ is not IE, there exists no $t_c$ for which $Q(t_c) \succ 0$. ∎

Consequently, failure to obtain positive-definiteness of $Q$ prevents unique identification of $\theta$ using (8). What is the intuition for this result? Recall from (1) that the dynamics depend on the true parameter $\theta$ affinely through $G(x)$. A time-invariant vector $v \in \mathcal{N}(G(x(t)))$ qualitatively represents a pathological parameter direction that does not influence the dynamics, because $G(x(t))v = 0$, and by extension, also does not influence the measurements $x(t)$. Said another way, suppose $\theta$ is the true parameter of the system and let $\theta + \alpha v$ denote an arbitrary parameter for some $\alpha \in \mathbb{R}$. Then, the following calculation shows that either of these parameters results in the same observed dynamics for any $\alpha$ for a given initial condition $x(0)$, because $G(x)v = 0$:

 $\dot{x} = G(x)(\theta + \alpha v) + f(x) = G(x)\theta + \alpha G(x)v + f(x) = G(x)\theta + f(x).$

As a result, the observed measurements would be identical for either choice of parameters (i.e., $\theta$ or $\theta + \alpha v$). Therefore, unique identification of the true $\theta$ solely based on these measurements is not possible.
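Lemma 1 can also be checked numerically: if the rows of $G(x(t))$ are always orthogonal to one fixed vector $v$, then $Q = \int G^T G\,ds$ stays singular no matter how long we integrate. The construction below uses synthetic regressors and our own variable names; it is a sketch of that degenerate situation, not a simulation from the paper:

```python
import numpy as np

# Regressors G(t) whose rows are always orthogonal to a fixed unit
# vector v, i.e. v lies in a time-invariant null space N(G(t)).
v = np.array([1.0, -1.0]) / np.sqrt(2)
rng = np.random.default_rng(0)
Q = np.zeros((2, 2))
dt = 1e-2
for _ in range(1000):
    g = rng.normal(size=2)
    g = g - (g @ v) * v          # project out v, so G(t) v = 0 for all t
    G = g.reshape(1, 2)
    Q += dt * (G.T @ G)          # accumulate Q as in (6), with W = G
# v is an eigenvector of Q with eigenvalue 0, so Q is never positive definite
```

However long the loop runs, $Q$ keeps rank 1, so the update law (8) cannot distinguish $\theta$ from $\theta + \alpha v$.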

For our application involving parameter estimation for robots, the high-level task and the resulting dynamics of each robot govern the specific form of this condition. In this paper, the task for each robot is to navigate to a goal location while avoiding collisions with the other robots in the system. We use this task as an example to show the mechanics of our inference approach, which can be easily adapted to infer parameters of a different type of task as well. We assume that each robot uses optimization-based controllers to achieve this. Most optimization-based controllers ultimately use a Quadratic Program (QP) to compute the control [18]. As an example, our framework uses Control Barrier Function based QPs to demonstrate the approach, but it generalizes to any other QP-based controller as well.

## 3 Avoidance Control for Multirobot Systems

We refer the reader to [17] for a formulation of the CBF-QP for the multirobot goal-stabilization and collision avoidance problem; we present a brief summary here. Let there be a total of $M+1$ robots in the system. From the perspective of an ego robot, the remaining $M$ robots are "cooperative obstacles" who share the responsibility of avoiding collisions with the ego robot while navigating to their own goals. In the following, the focus is on the ego robot. This robot follows single-integrator dynamics, i.e.

 $\dot{x} = u,$  (11)

where $x \in \mathbb{R}^2$ is its position and $u \in \mathbb{R}^2$ is its velocity (i.e., the control input). Assume that at a given time, the other robots are located at positions $x_{o_1}, \ldots, x_{o_M}$ respectively. The ego robot must reach a goal position $x^*$ while avoiding collisions with $x_{o_1}, \ldots, x_{o_M}$. The ego robot can use a nominal proportional controller $\hat{u}(x) = k(x^* - x)$ that guarantees stabilization towards $x^*$. Here $k > 0$ is the controller gain. For collision avoidance, the ego robot must maintain a distance of at least $D_s$ from each $x_{o_j}$, i.e., their positions must satisfy $\|\Delta x_j\| \ge D_s$, where $\Delta x_j := x - x_{o_j}$ and $D_s > 0$ is a desired safety margin.

To combine the collision avoidance requirement with the goal stabilization objective, the ego robot solves a QP that computes a control input closest to the prescribed nominal control $\hat{u}(x)$ while satisfying the collision avoidance constraints:

 $u^* = \arg\min_u \|u - \hat{u}(x)\|^2 \quad \text{subject to} \quad A(x)u \le b(x).$  (12)

Here $A(x) \in \mathbb{R}^{M \times 2}$ and $b(x) \in \mathbb{R}^{M}$ are defined such that the $j^{\text{th}}$ row of $A(x)$ is $a_j^T(x)$ and the $j^{\text{th}}$ element of $b(x)$ is $b_j(x)$:

 $a_j^T(x) := -\Delta x_j^T = -(x - x_{o_j})^T, \qquad b_j(x) := \frac{\gamma}{2}\left(\|\Delta x_j\|^2 - D_s^2\right) \quad \forall\, j \in \{1, 2, \ldots, M\}.$  (13)

The ego robot locally solves this QP at every time step to determine its optimal control $u^*$, which ensures collision avoidance while encouraging motion towards the goal $x^*$.
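A minimal sketch of the controller (12)-(13) is given below, using SciPy's SLSQP solver as a generic stand-in QP solver; the function name and the default values of $k$, $\gamma$, $D_s$ are our own illustrative choices:

```python
import numpy as np
from scipy.optimize import minimize

def cbf_qp_control(x, goal, obstacles, k=1.0, gamma=1.0, Ds=0.5):
    """Solve (12): min ||u - u_hat||^2  s.t.  A(x) u <= b(x)."""
    u_hat = k * (goal - x)                            # nominal controller
    A = np.array([-(x - xo) for xo in obstacles])     # rows a_j^T, (13)
    b = np.array([gamma / 2 * (np.dot(x - xo, x - xo) - Ds**2)
                  for xo in obstacles])               # b_j, (13)
    res = minimize(lambda u: np.sum((u - u_hat) ** 2), u_hat,
                   method="SLSQP",
                   constraints=[{"type": "ineq", "fun": lambda u: b - A @ u}])
    return res.x

# far obstacle: no constraint binds, so the QP returns the nominal control
u_free = cbf_qp_control(np.zeros(2), np.array([2.0, 0.0]),
                        [np.array([10.0, 10.0])])
# near obstacle directly ahead: forward speed is clipped by the constraint
u_safe = cbf_qp_control(np.zeros(2), np.array([2.0, 0.0]),
                        [np.array([0.6, 0.0])])
```

In the second call, the constraint $0.6\,u_x \le 0.055$ is active, so the QP slows the robot down along the obstacle direction while the nominal controller alone would command $u = [2, 0]$.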

Aside from its position $x$, the ego robot's control depends on task parameters $\theta$, defined to be the position of the goal $x^*$ and the nominal controller's gain $k$. This is implicitly encoded through the cost function of (12) (recall $\hat{u}(x) = k(x^* - x)$). To highlight this dependence, we denote the control as $u^*_\theta$, where $\theta$ are the unknown parameters the observer aims to identify. Next, we formulate the problem that the observer seeks to solve.

## 4 Task Identification Problem Formulation

The problem for the observer is to monitor the positions of the ego robot and the "cooperative obstacles" over $[0, T]$ and infer $\theta$ based on these measurements. Here $T$ is assumed to be large enough. The observer will run parallel estimators to identify $\theta$ for each robot in the team. Even though (1)-(8) allow for simultaneous identification of $x^*$ and $k$, we focus on decoupled identification, i.e., we identify (a) $x^*$ assuming $k$ is known ($\theta = x^*$), and (b) $k$ assuming $x^*$ is known ($\theta = k$). This is just to keep the analysis simple; joint identification is a straightforward extension.

To estimate $\theta$ using the algorithm in section 2 ((1)-(8)) and to compute the condition in lemma 1, the observer requires explicit dynamics of the form in (1). That is, it must know $G(x)$ and $f(x)$. However, owing to the fact that the control $u^*_\theta$ is optimization-based (12), such explicit relations are not known. In the next section, we derive the KKT conditions of (12), which is the first step to derive these expressions. There are a few assumptions first:

###### Assumption 1.

The observer knows the form of the safety constraints in (12) and that the cost function is of the form $\|u - \hat{u}(x)\|^2$ with $\hat{u}(x) = k(x^* - x)$.

###### Assumption 2.

The observer can measure both the position $x$ and the velocity (i.e., the control $u^*_\theta$) of the ego robot.

Assumption 1 is needed since we are interested in deriving the necessary identifiability conditions and they require knowledge of the dynamics. Assumption 2 is not restrictive in practice because positions are easily measurable and velocities can be obtained through numerical differentiation.

## 5 Analysis using KKT conditions

To analyze the relation between the optimizer of (12), i.e., $u^*_\theta$, and the parameters $\theta$, we look at the KKT conditions of this QP. Since (12) is a convex QP, these are necessary and sufficient conditions satisfied by $u^*_\theta$. The Lagrangian for (12) is

 $\mathcal{L}(u, \mu) = \|u - \hat{u}\|_2^2 + \mu^T(Au - b).$

Let $(u^*_\theta, \mu^*_\theta)$ be the optimal primal-dual solution to (12). The KKT conditions are [3]:

1. Stationarity: $\nabla_u \mathcal{L}(u^*_\theta, \mu^*_\theta) = 0$,

 $\implies u^*_\theta = \hat{u} - \frac{1}{2}\sum_{j \in \{1,\cdots,M\}} \mu^*_{j\theta}\, a_j = \hat{u} - \frac{1}{2}A^T \mu^*_\theta.$  (14)
2. Primal Feasibility

 $A u^*_\theta \le b \iff a_j^T u^*_\theta \le b_j \quad \forall\, j \in \{1,\cdots,M\}.$  (15)
3. Dual Feasibility

 $\mu^*_{j\theta} \ge 0 \quad \forall\, j \in \{1,2,\cdots,M\}.$  (16)
4. Complementary Slackness

 $\mu^*_{j\theta} \cdot (a_j^T u^*_\theta - b_j) = 0 \quad \forall\, j \in \{1,2,\cdots,M\}.$  (17)

We define the set of active and inactive constraints as

 $\mathcal{A}(u^*_\theta) := \{j \in \{1,2,\cdots,M\} \mid a_j^T u^*_\theta = b_j\},$  (18)
 $\mathcal{IA}(u^*_\theta) := \{j \in \{1,2,\cdots,M\} \mid a_j^T u^*_\theta < b_j\}.$  (19)

The set of active constraints qualitatively represents those other robots that the ego robot "worries" about for collisions. From the perspective of the ego robot, we will simply refer to the "other robots" as obstacles. Let there be a total of $K$ active constraints, i.e., $|\mathcal{A}(u^*_\theta)| = K$, where $0 \le K \le M$. Using (16) and (17), we deduce

 $\mu^*_{j\theta} = 0 \quad \forall\, j \in \mathcal{IA}(u^*_\theta).$  (20)

Therefore, we can restrict the summation in (14) to the set of active constraints only, i.e.

 $u^*_\theta = \hat{u} - \frac{1}{2}\sum_{j \in \mathcal{A}(u^*_\theta)} \mu^*_{j\theta}\, a_j = \hat{u} - \frac{1}{2}A_{ac}^T \mu_{ac\theta},$  (21)

where $A_{ac}$ is the matrix formed using the rows of $A$ that are indexed by the active set $\mathcal{A}(u^*_\theta)$, and likewise $\mu_{ac\theta}$ from $\mu^*_\theta$. Similarly, let $b_{ac}$ denote the vector formed from the elements of $b$ indexed by $\mathcal{A}(u^*_\theta)$. By deleting all inactive constraints and retaining only the active constraints, we can pose another QP, consisting only of active constraints, whose solution is the same as that of (12). This equality-constrained program (EQP) is given by

 $u^* = \arg\min_u \|u - \hat{u}(x)\|^2 \quad \text{subject to} \quad A_{ac}(x)u = b_{ac}(x).$  (22)

Note that the system $A_{ac}(x)u = b_{ac}(x)$ is always consistent by construction because of (18), as long as a solution to (12) exists. Now why do we care about this EQP? Because it is easier to derive an expression for the solution of (22) than for the inequality-constrained problem (12). The only question is how to estimate the active set $\mathcal{A}(u^*_\theta)$ needed to form (22). This can be done as follows. From Assumption 2, recall that the observer can measure both the position and velocity of the ego robot. Using these, the observer can determine $\mathcal{A}(u^*_\theta)$ by comparing the residuals $|a_j^T(x)u^*_\theta - b_j(x)|$ against a small threshold $\epsilon > 0$, consistent with (18):

 $\mathcal{A}_{\text{observer}} := \{j \in \{1,2,\cdots,M\} \mid |a_j^T(x)u^*_\theta - b_j(x)| < \epsilon\}.$

Clearly, computing $\mathcal{A}_{\text{observer}}$ requires the observer to know $a_j(x)$ and $b_j(x)$, hence the need for Assumption 1. At any rate, for a small enough threshold $\epsilon$, it holds that $\mathcal{A}_{\text{observer}} = \mathcal{A}(u^*_\theta)$, consistent with (18). This allows the observer to determine the active set. In the next section, we work with (22) to derive an explicit expression for the control $u^*_\theta$ for various combinations of $K$ and $\text{rank}(A_{ac}(x))$.
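The residual test above can be sketched in a few lines; the function name, the defaults, and the synthetic measurement below are our own illustrative choices, with $a_j$, $b_j$ computed as in (13) under Assumptions 1 and 2:

```python
import numpy as np

def estimate_active_set(x, u_star, obstacles, gamma=1.0, Ds=0.5, eps=1e-6):
    """Observer's active-set estimate: residual test on each constraint."""
    active = set()
    for j, xo in enumerate(obstacles):
        a_j = -(x - xo)                                     # row of A(x), (13)
        b_j = gamma / 2 * (np.dot(x - xo, x - xo) - Ds**2)  # b_j, (13)
        if abs(a_j @ u_star - b_j) < eps:                   # residual test
            active.add(j)
    return active

# synthetic check: u* chosen so that obstacle 0 is active, obstacle 1 is not
x = np.zeros(2)
obstacles = [np.array([0.6, 0.0]), np.array([10.0, 10.0])]
u_star = np.array([0.055 / 0.6, 0.0])   # satisfies a_0^T u* = b_0 exactly
```

Only the nearby obstacle produces a near-zero residual, so the observer recovers the active set $\{0\}$.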

## 6 SVD-based Analysis of $A_{ac}(x)u = b_{ac}(x)$

The aim of this section is to derive the relations between $u^*_\theta$ and $\theta$ needed for identifying these parameters. We will show that the dependence of $u^*_\theta$ on these parameters hinges on $K$ and $\text{rank}(A_{ac}(x))$. Theorems 2, 3 and 5 roughly state that whenever there is no obstacle, or only one obstacle, for the ego robot to actively avoid, the control exhibits a well-defined dependence on these parameters, making their inference using the estimation algorithm in section 2, i.e., equations (1)-(8), possible. On the other hand, theorem 6 states that whenever too many obstacles are active, the robot is consumed by collision avoidance constraints, so it "gives up" on optimizing the objective. Therefore, $u^*_\theta$ does not depend on $\theta$, making inference using (1)-(8) impossible.

### 6.1 No active constraints, i.e., $K = 0$

When no constraint is active, we have $\mu^*_\theta = 0$, so from (14) we get $u^*_\theta = \hat{u}(x) = k(x^* - x)$. Intuitively, this means that the robot does not worry about collisions with any obstacle, so it is free to use $\hat{u}$ itself. We write this in the parameter-affine form $\dot{x} = G(x)\theta + f(x)$ and derive conditions under which estimation is possible.

###### Theorem 2.

If $K = 0$, i.e., no constraint is active, then the observer can always estimate the goal $x^*$ using (1)-(8), assuming the gain $k$ is known. Likewise, the observer can always estimate the gain $k$ assuming the goal $x^*$ is known, as long as the robot is not already at its goal.

###### Proof.
1. If the observer wants to estimate the goal, i.e., $\theta = x^*$, then defining $G(x) = kI$ and $f(x) = -kx$ gives $\dot{x} = G(x)\theta + f(x)$. For this case, goal estimation is always possible because $G^T(x)G(x) = k^2 I \succ 0$, i.e., $G(x)$ is PE (and IE (9)) and has no null space (Lemma 1).

2. If the observer wants to estimate the gain, i.e., $\theta = k$, defining $G(x) = x^* - x$ and $f(x) = 0$ gives $\dot{x} = G(x)\theta$. Gain estimation is only possible when $x \ne x^*$, i.e., when the robot is not at its goal. This is expected, because if the robot is already at its goal, then it will stay there forever, so there is no information in its positions about $k$, hence the result.

### 6.2 Exactly one active constraint, i.e., $K = 1$

When one constraint is active, there is one obstacle that the ego robot "worries" about for collision. Since there are two degrees of freedom in the control and one obstacle to avoid, the ego robot can avoid this obstacle and additionally minimize $\|u - \hat{u}\|$ with the remaining degree of freedom. This causes $u^*_\theta$ to exhibit a well-defined dependence on $\hat{u}$ and, by extension, on the parameters $\theta$. This makes their inference using the estimation algorithm in section 2, i.e., equations (1)-(8), feasible.

Let $i$ denote the index of the active constraint, meaning that it is the obstacle located at $x_{o_i}$ that should be "actively" avoided. Thus, from (18), we have $a_i^T(x)u^*_\theta = b_i(x)$, where $a_i$ and $b_i$ are defined in (13). Since $a_i^T(x) \in \mathbb{R}^{1\times 2}$, from the rank-nullity theorem it follows that $a_i^T(x)$ has a non-trivial null space of dimension one. The null space gives a degree of freedom to the control to minimize $\|u - \hat{u}\|$ while satisfying the constraint. We illustrate this by computing the SVD $a_i^T(x) = U(x)\Sigma(x)V^T(x)$. Defining

 $U(x) := 1, \qquad \Sigma(x) := [\Sigma_m \;\; 0], \;\; \Sigma_m = \|a_i(x)\|, \qquad V(x) := [V_1, V_2], \;\; V_1 = \frac{a_i(x)}{\|a_i(x)\|}, \;\; V_2 = R_{\pi/2}\frac{a_i(x)}{\|a_i(x)\|},$  (23)

where $R_{\pi/2}$ denotes the rotation by $\pi/2$.

Since $\{V_1, V_2\}$ forms an orthonormal basis for $\mathbb{R}^2$, any $u \in \mathbb{R}^2$ can be expressed as a linear combination of $V_1$ and $V_2$:

 $u = V_1\tilde{u}_1 + V_2\tilde{u}_2 \implies a_i^T u - b_i = U\Sigma_m\tilde{u}_1 + 0\cdot\tilde{u}_2 - b_i = 0.$  (24)

Choosing $\tilde{u}_1 = \Sigma_m^{-1}U^T b_i$ and leaving $\tilde{u}_2 = \psi$ free, we find that

 $u = V_1\Sigma_m^{-1}U^T b_i + V_2\psi$  (25)

satisfies $a_i^T u = b_i$. Recall from the properties of the SVD that $V_2$ forms a basis for $\mathcal{N}(a_i^T)$. We tune $\psi$ to minimize $\|u - \hat{u}\|$ by solving the following unconstrained minimization problem

 $\psi^* = \arg\min_\psi \|u - \hat{u}\|^2 = \arg\min_\psi \left\|V_1\Sigma_m^{-1}U^T b_i + V_2\psi - \hat{u}\right\|^2,$  (26)

which gives $\psi^* = V_2^T\hat{u}$. Substituting this in (25) gives

 $u^*_\theta = V_1\Sigma_m^{-1}U^T b_i + V_2 V_2^T \hat{u}.$  (27)

This equation is the solution to (22) and, by extension, to (12). Note that it is the second term that depends on the parameters $\theta$, because $\hat{u} = k(x^* - x)$. Using this relation (27), we are ready to state the conditions under which inference of the parameters using the algorithm ((1)-(8)) and Lemma 1 in section 2 is possible.
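The closed form (27) is easy to verify numerically. The sketch below builds $V_1$, $V_2$, $\Sigma_m$ as in (23) (with our own names and parameter defaults) and checks that the resulting control satisfies the active constraint while its deviation from $\hat{u}$ stays orthogonal to the null-space direction $V_2$, which is exactly the optimality property derived in (26):

```python
import numpy as np

def control_one_active(x, goal, xo, k=1.0, gamma=1.0, Ds=0.5):
    """Closed form (27) for K = 1, built from the SVD pieces in (23)."""
    u_hat = k * (goal - x)
    a_i = -(x - xo)                                       # active row, (13)
    b_i = gamma / 2 * (np.dot(x - xo, x - xo) - Ds**2)
    Sm = np.linalg.norm(a_i)                              # Sigma_m
    V1 = a_i / Sm
    V2 = np.array([-V1[1], V1[0]])                        # V1 rotated by pi/2
    return V1 * (b_i / Sm) + V2 * (V2 @ u_hat)            # (27), with U = 1

x, goal, xo = np.zeros(2), np.array([2.0, 1.0]), np.array([0.6, 0.0])
u = control_one_active(x, goal, xo)
u_hat = 1.0 * (goal - x)
a_i = -(x - xo)
b_i = 0.5 * (np.dot(x - xo, x - xo) - 0.25)
V2 = np.array([-a_i[1], a_i[0]]) / np.linalg.norm(a_i)
```

Here the component of $\hat{u}$ along the null space ($V_2$) survives unchanged, while the component along $a_i$ is replaced by the value forced by the active constraint.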

###### Theorem 3.

If $K = 1$, i.e., exactly one constraint is active, then the observer can estimate the goal (gain) using (1)-(8), assuming the gain (goal) is known, as long as the orientation of the null space $\mathcal{N}(G(x))$ is not time-invariant and $V_2^T(x^* - x) \not\equiv 0$.

###### Proof.
1. If the observer wants to estimate the goal, i.e., $\theta = x^*$, then define $G(x) = kV_2V_2^T$ and $f(x) = V_1\Sigma_m^{-1}U^T b_i - kV_2V_2^T x$ using (27), so that $\dot{x} = G(x)\theta + f(x)$. The IE condition (Def. 2, (9)) requires that $\int_0^{t_c} G^T(x(s))G(x(s))\,ds \succ 0$. The situation when positive-definiteness is not attained is when $\mathcal{N}(G(x(t)))$ is time-invariant, which follows from Lemma 1. Note that $\mathcal{N}(G(x)) = \text{span}(V_1)$, which follows from the properties of the SVD. Since $V_1$ is always a unit vector, it can only change through its orientation. If its orientation does not change over $[0, t_c]$, then $V_1$ is a time-invariant vector in $\mathcal{N}(G(x(t)))$, and hence goal estimation will not be possible, per Lemma 1.

2. If $\theta = k$, then $G(x) = V_2V_2^T(x^* - x)$ and $f(x) = V_1\Sigma_m^{-1}U^T b_i$, so that $\dot{x} = G(x)\theta + f(x)$. If $V_2^T(x^* - x) = 0$, then $G(x) = 0$. So if $V_2^T(x^* - x) \equiv 0$, then the IE condition (Def. 2) for gain identification will not be satisfied.

The video at https://youtu.be/WoUSej79ZGM shows an example where invariance of the orientation of the null space results in failure to identify the goal using the estimation algorithm in section 2 ((1)-(8)) and also a UKF.

### 6.3 $2 \le K \le M$ and $\text{rank}(A_{ac}(x)) = 1$

Now we consider the more general case in which more than one constraint is active, but all of them are linearly dependent on one constraint among them. This means that effectively there is only one "representative constraint", or obstacle, for the ego robot to worry about. Consequently, this case is similar to the case with just one active obstacle. We formally demonstrate this now. Let $i_1, i_2, \ldots, i_K$ be the indices of the active constraints, which satisfy

 $\begin{bmatrix} a_{i_1}^T(x) \\ a_{i_2}^T(x) \\ \vdots \\ a_{i_K}^T(x) \end{bmatrix} u^*_\theta = \begin{bmatrix} b_{i_1}(x) \\ b_{i_2}(x) \\ \vdots \\ b_{i_K}(x) \end{bmatrix} \quad \text{or} \quad A_{ac}(x)u^*_\theta = b_{ac}(x),$  (28)

where $a_{i_j}, b_{i_j}$ are defined using (13). Since $\text{rank}(A_{ac}(x)) = 1$, WLOG we have $a_{i_1} = \lambda_j a_{i_j}$, where $\lambda_j \ne 0$ for all $j \in \{2,\ldots,K\}$. Let's first see the geometric arrangements of the robot and the obstacles when this case arises in practice.

###### Lemma 4.

The case with more than one constraint active and all linearly dependent can only arise in practice for $\lambda_j \in \{1, -1\}$. $\lambda_j = 1$ means that the obstacles indexed $i_1$ and $i_j$ are coinciding and $b_{i_1} = b_{i_j}$. $\lambda_j = -1$ means that the robot is located exactly in the middle of obstacles $i_1$ and $i_j$ and $b_{i_1} = b_{i_j} = 0$. Furthermore, even if there is just one $j$ for which $\lambda_j = -1$, then $b_{ac}(x) = 0$.

###### Proof.

Since $i_1, \ldots, i_K$ index active constraints,

 $a_{i_1}^T(x) u^*_\theta = b_{i_1}(x),$  (29)
 $a_{i_j}^T(x) u^*_\theta = b_{i_j}(x) \quad \forall\, j \in \{2,3,\cdots,K\}.$  (30)

Substituting $a_{i_1} = \lambda_j a_{i_j}$ in (29), we get

 $\lambda_j a_{i_j}^T(x) u^*_\theta = b_{i_1}(x) \implies \lambda_j b_{i_j}(x) = b_{i_1}(x) \quad \forall\, j \in \{2,3,\cdots,K\}.$  (31)

Recalling that $b_{i_j} = \frac{\gamma}{2}\left(\|a_{i_j}\|^2 - D_s^2\right)$ from (13), we get

 $\lambda_j\left(\|a_{i_j}\|^2 - D_s^2\right) = \|\lambda_j a_{i_j}\|^2 - D_s^2 \implies \lambda_j^2\|a_{i_j}\|^2 - \lambda_j\left(\|a_{i_j}\|^2 - D_s^2\right) - D_s^2 = 0.$  (32)

This equation, quadratic in $\lambda_j$, has two roots: $\lambda_j = 1$ and $\lambda_j = -D_s^2/\|a_{i_j}\|^2$.

1. $\lambda_j = 1$ gives $a_{i_1} = a_{i_j}$, i.e., $x_{o_{i_1}} = x_{o_{i_j}}$ and $b_{i_1} = b_{i_j}$. This means obstacle $i_1$ is coinciding with obstacle $i_j$. This is a trivial yet expected result.

2. $\lambda_j = -D_s^2/\|a_{i_j}\|^2 < 0$ implies that $a_{i_1}$ and $a_{i_j}$ are anti-parallel. However, when $\|a_{i_j}\| > D_s$, we have $\|a_{i_1}\| = |\lambda_j|\|a_{i_j}\| = D_s^2/\|a_{i_j}\| < D_s$. This means that if the robot is strictly safe with respect to obstacle $i_j$, then it is colliding with obstacle $i_1$. This means that the control at the previous time step caused this collision, which is not possible. This conflict can only be resolved when we relax strict safety to $\|a_{i_j}\| = D_s$, meaning that the robot and obstacle $i_j$ are touching each other. This gives $\|a_{i_1}\| = D_s$, implying that the robot and obstacle $i_1$ are also touching each other. In this case $\lambda_j = -1$, which implies $a_{i_1} = -a_{i_j}$, i.e., $x = \frac{1}{2}(x_{o_{i_1}} + x_{o_{i_j}})$, and $b_{i_1} = b_{i_j} = 0$. Furthermore, even if there is just one $j$ for which $\lambda_j = -1$, then from (31), $b_{i_1} = 0$ and hence $b_{i_j} = 0$ for all $j$, i.e., $b_{ac}(x) = 0$. ∎
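The $\lambda_j = -1$ geometry can be sanity-checked numerically: place the robot midway between two obstacles, each at exactly the safety distance $D_s$, and the two active rows become anti-parallel with both right-hand sides zero. The names and values below are our own illustrative choices, with $a_j$, $b_j$ as in (13):

```python
import numpy as np

# Lemma 4's lambda = -1 case: ego robot exactly midway between two
# obstacles, each at the safety distance Ds.
Ds, gamma = 0.5, 1.0
x = np.zeros(2)                                  # ego robot position
xo1, xo2 = np.array([Ds, 0.0]), np.array([-Ds, 0.0])
a1, a2 = -(x - xo1), -(x - xo2)                  # active rows of A_ac
b1 = gamma / 2 * (np.dot(x - xo1, x - xo1) - Ds**2)
b2 = gamma / 2 * (np.dot(x - xo2, x - xo2) - Ds**2)
A_ac = np.vstack([a1, a2])                       # rank 1: a1 = -a2
```

As the lemma predicts, $a_1 = -a_2$ (so $\lambda = -1$), both right-hand sides vanish, and $A_{ac}$ has rank one.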

Next, we derive an analytical expression for $u^*_\theta$ using (28). Note that $A_{ac}(x) = U\Sigma V^T$, where

 U Σ \coloneqq[Σr01×10K−1×10K−1×1] , Σr=√1+ΣKj=2λ2j∥