
# Resilient Monotone Sequential Maximization

## Abstract

Applications in machine learning, optimization, and control require the sequential selection of a few system elements, such as sensors, data, or actuators, to optimize the system performance across multiple time steps. However, in failure-prone and adversarial environments, sensors get attacked, data get deleted, and actuators fail. Hence, traditional sequential design paradigms become insufficient and, in contrast, resilient sequential designs that adapt against system-wide attacks, deletions, or failures become important. In general, resilient sequential design problems are computationally hard. Also, even though they often involve objective functions that are monotone and (possibly) submodular, no scalable approximation algorithms are known for their solution. In this paper, we provide the first scalable algorithm that achieves the following characteristics: system-wide resiliency, i.e., the algorithm is valid for any number of denial-of-service attacks, deletions, or failures; adaptiveness, i.e., at each time step, the algorithm selects system elements based on the history of inflicted attacks, deletions, or failures; and provable approximation performance, i.e., for monotone objective functions, the algorithm guarantees a solution close to the optimal. We quantify the algorithm’s approximation performance using a notion of curvature for monotone (not necessarily submodular) set functions. Finally, we support our theoretical analyses with simulated experiments, by considering a control-aware sensor scheduling scenario, namely, sensing-constrained robot navigation.


## 1 Introduction

Problems in machine learning, optimization, and control [1, 2, 3, 4, 5, 6, 7, 8, 9] require the design of systems in applications such as:

• Car-congestion prediction: Given a flood of driving data, collected from the drivers’ smart-phones, which few drivers’ data should we process at each time of the day to enable the accurate prediction of car traffic? [1]

• Adversarial-target tracking: In a flying robot that uses on-board sensors to navigate, which few sensors should we activate at each time step, both to maximize the robot’s battery life and to ensure its ability to track targets moving in a cluttered environment? [2]

• Hazardous environmental-monitoring: In a team of mobile robots, which few robots should we choose at each time step as actuators (leaders) to guarantee the team’s capability to monitor the radiation around a nuclear reactor despite inter-robot communication noise? [3]

In particular, all the aforementioned applications [3, 1, 2, 4, 5, 6, 7, 8, 9] motivate the sequential selection of a few system elements, such as sensors, data, or actuators, to optimize the system performance across multiple time steps, subject to a resource constraint, such as limited battery for sensor activation. More formally, each of the above applications motivates the solution to a sequential optimization problem of the form:

$$\max_{\mathcal{A}_1\subseteq\mathcal{V}_1}\cdots\max_{\mathcal{A}_T\subseteq\mathcal{V}_T}\; f(\mathcal{A}_1,\ldots,\mathcal{A}_T),\quad\text{such that: } |\mathcal{A}_t|=\alpha_t,\ \text{for all } t=1,\ldots,T, \tag{1}$$

where $T$ represents the number of design steps in time; the objective function $f$ is monotone and (possibly) submodular (submodularity is a diminishing returns property); and the cardinality bound $\alpha_t$ captures a resource constraint at time $t$. The problem in eq. (1) is combinatorial, and, specifically, it is NP-hard [10]; notwithstanding, several approximation algorithms have been proposed for its solution, such as the greedy algorithm [11].

But in all the above critical applications, sensors can get cyber-attacked [12]; data can get deleted [13]; and actuators can fail [14]. Hence, in such failure-prone and adversarial scenarios, resilient sequential designs that adapt against denial-of-service attacks, deletions, or failures become important.

In this paper, we formalize for the first time a problem of resilient monotone sequential maximization, that goes beyond the traditional objective of the problem in eq. (1), and guards adaptively against real-time attacks, deletions, or failures. In particular, we introduce the following resilient re-formulation of the problem in eq. (1):

$$\max_{\mathcal{A}_1\subseteq\mathcal{V}_1}\min_{\mathcal{B}_1\subseteq\mathcal{A}_1}\cdots\max_{\mathcal{A}_T\subseteq\mathcal{V}_T}\min_{\mathcal{B}_T\subseteq\mathcal{A}_T}\; f(\mathcal{A}_1\setminus\mathcal{B}_1,\ldots,\mathcal{A}_T\setminus\mathcal{B}_T),\quad\text{such that: } |\mathcal{A}_t|=\alpha_t \text{ and } |\mathcal{B}_t|\le\beta_t,\ \text{for all } t=1,\ldots,T, \tag{2}$$

where the number $\beta_t$ represents the number of possible attacks, deletions, or failures at time $t$ (in general, it is $\beta_t\le\alpha_t$). Overall, the problem in eq. (2) maximizes the function $f$ despite real-time worst-case failures that compromise the consecutive maximization steps in eq. (1). Therefore, the problem formulation in eq. (2) is suitable in scenarios where there is no prior on the removal mechanism, as well as in scenarios where protection against worst-case failures is essential, such as in expensive experiment designs, or missions of adversarial-target tracking.

In more detail, the problem in eq. (2) may be interpreted as a $T$-stage perfect information sequential game between two players [15, Chapter 4], namely, a “maximization” player (designer), and a “minimization” player (attacker), who play sequentially, both observing all past actions of all players, and with the designer starting the game. That is, at each time $t=1,\ldots,T$, both the designer and the attacker adapt their set selections to the history of all the players’ selections so far, and, in particular, the attacker adapts its selection also to the current ($t$-th) selection of the designer (since at each step $t$, the attacker plays after it observes the selection $\mathcal{A}_t$ of the designer).

In sum, the problem in eq. (2) goes beyond traditional (non-resilient) optimization [16, 17, 18, 19, 20] by proposing resilient optimization; beyond single-step resilient optimization [21, 22, 23] by proposing multi-step (sequential) resilient optimization; beyond memoryless resilient optimization [24] by proposing adaptive resilient optimization; and beyond protection against non-adversarial attacks [13, 25] by proposing protection against worst-case attacks. Hence, the problem in eq. (2) aims to protect the system performance over extended periods of time against real-time denial-of-service attacks or failures, which is vital in critical applications, such as multi-target surveillance with teams of mobile robots [9].

Contributions. In this paper, we make the following contributions:

• (Problem definition) We formalize the problem of resilient monotone sequential maximization against denial-of-service removals, per eq. (2). This is the first work to formalize, address, and motivate this problem.

• (Solution) We develop the first algorithm for the problem of resilient monotone sequential maximization in eq. (2), and prove it has the following properties:

• system-wide resiliency: the algorithm is valid for any number of removals;

• adaptiveness: the algorithm adapts the solution to each of the maximization steps in eq. (2) to the history of realized (inflicted) removals;

• minimal running time: the algorithm terminates with the same running time as state-of-the-art algorithms for (non-resilient) set function optimization, per eq. (1);

• provable approximation performance: the algorithm ensures, for all $\beta_t\le\alpha_t$ and for objective functions that are monotone and (possibly) submodular (as holds true in all aforementioned applications [3, 1, 2, 4, 5, 6, 7, 8, 9]), a solution finitely close to the optimal.

To quantify the algorithm’s approximation performance, we use a notion of curvature for monotone (not necessarily submodular) set functions.

• (Simulations) We conduct simulations in a variety of sensor scheduling scenarios for autonomous robot navigation, varying the number of sensor failures. Our simulations validate the benefits of our approach.

Overall, the proposed algorithm herein enables the resilient reformulation and solution of all above applications [3, 1, 2, 4, 5, 6, 7, 8, 9] against worst-case attacks, deletions, or failures, over multiple design steps, and with provable approximation guarantees.

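Notation. Calligraphic fonts denote sets (e.g., $\mathcal{A}$). Given a set $\mathcal{A}$, then $2^{\mathcal{A}}$ denotes the power set of $\mathcal{A}$; in addition, $|\mathcal{A}|$ denotes $\mathcal{A}$’s cardinality; given also a set $\mathcal{B}$, then $\mathcal{A}\setminus\mathcal{B}$ denotes the set of elements in $\mathcal{A}$ that are not in $\mathcal{B}$. Given a ground set $\mathcal{V}$, a set function $f:2^{\mathcal{V}}\mapsto\mathbb{R}$, and an element $x\in\mathcal{V}$, $f(x)$ denotes $f(\{x\})$, and $f(\mathcal{A},x)$ denotes $f(\mathcal{A}\cup\{x\})$.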

## 2 Resilient Monotone Sequential Maximization

We formally define resilient monotone sequential maximization. We start with the basic definition of monotonicity.

###### Definition (Monotonicity)

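Consider any finite ground set $\mathcal{V}$. The set function $f:2^{\mathcal{V}}\mapsto\mathbb{R}$ is non-decreasing if and only if for any sets $\mathcal{A}\subseteq\mathcal{A}'\subseteq\mathcal{V}$, it holds $f(\mathcal{A})\le f(\mathcal{A}')$.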
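For small ground sets, the definition can be checked exhaustively. The Python sketch below is illustrative only: the helper names `subsets` and `is_non_decreasing` and the coverage example are hypothetical, and the check enumerates all pairs of subsets, so it is practical only for a handful of elements.

```python
from itertools import combinations


def subsets(ground):
    """Return all subsets of a finite ground set as frozensets."""
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_non_decreasing(f, ground):
    """Exhaustively test f(A) <= f(A') for every pair A ⊆ A' ⊆ ground."""
    subs = subsets(ground)
    return all(f(a) <= f(b) for a in subs for b in subs if a <= b)


# Hypothetical example: a coverage function counts the items covered by a set.
cover = {1: {"x", "y"}, 2: {"y"}, 3: {"z"}}


def coverage(A):
    return len(set().union(*(cover[v] for v in A)))


print(is_non_decreasing(coverage, cover))           # True
print(is_non_decreasing(lambda A: -len(A), cover))  # False
```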

We define next the main problem in this paper.

###### Problem

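(Resilient Approximately-Submodular Sequential Maximization) Consider the parameters: an integer $T$; finite ground sets $\mathcal{V}_1,\ldots,\mathcal{V}_T$; a non-decreasing set function $f:2^{\mathcal{V}_1}\times\cdots\times2^{\mathcal{V}_T}\mapsto\mathbb{R}$ such that, without loss of generality, it holds $f(\emptyset,\ldots,\emptyset)=0$ and $f$ is non-negative; finally, integers $\alpha_t$ and $\beta_t$ such that $0\le\beta_t\le\alpha_t\le|\mathcal{V}_t|$, for all $t=1,\ldots,T$.

The problem of resilient approximately-submodular sequential maximization is to maximize the objective function $f$ through a sequence of $T$ maximization steps, despite compromises to the solutions of each of the maximization steps; in particular, at each maximization step $t=1,\ldots,T$, a set $\mathcal{A}_t\subseteq\mathcal{V}_t$ of cardinality $\alpha_t$ is selected, and is compromised by a worst-case set removal $\mathcal{B}_t$ from $\mathcal{A}_t$ of cardinality at most $\beta_t$. Formally:

$$\max_{\mathcal{A}_1\subseteq\mathcal{V}_1}\min_{\mathcal{B}_1\subseteq\mathcal{A}_1}\cdots\max_{\mathcal{A}_T\subseteq\mathcal{V}_T}\min_{\mathcal{B}_T\subseteq\mathcal{A}_T}\; f(\mathcal{A}_1\setminus\mathcal{B}_1,\ldots,\mathcal{A}_T\setminus\mathcal{B}_T),\quad\text{such that: } |\mathcal{A}_t|=\alpha_t \text{ and } |\mathcal{B}_t|\le\beta_t,\ \text{for all } t=1,\ldots,T. \tag{3}$$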

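As we mentioned in this paper’s Introduction, Problem 2 may be interpreted as a $T$-stage perfect information sequential game between two players [15, Chapter 4], a “maximization” player, and a “minimization” player, who play sequentially, both observing all past actions of all players, and with the “maximization” player starting the game. In the following paragraphs, we describe this game in more detail:

• 1st round of the game in Problem 2: the “maximization” player selects the set $\mathcal{A}_1$; then, the “minimization” player observes $\mathcal{A}_1$, and selects the set $\mathcal{B}_1$ against $\mathcal{A}_1$;

• 2nd round of the game in Problem 2: the “maximization” player, who already knows $\mathcal{A}_1$, observes $\mathcal{B}_1$, and selects the set $\mathcal{A}_2$, given $\mathcal{A}_1$ and $\mathcal{B}_1$; then, the “minimization” player, who already knows $\mathcal{A}_1$ and $\mathcal{B}_1$, observes $\mathcal{A}_2$, and selects the set $\mathcal{B}_2$ against $\mathcal{A}_2$, given $\mathcal{A}_1$ and $\mathcal{B}_1$.

• $T$-th round of the game in Problem 2: the “maximization” player, who already knows the history of selections $\mathcal{A}_1,\ldots,\mathcal{A}_{T-1}$, as well as the removals $\mathcal{B}_1,\ldots,\mathcal{B}_{T-1}$, selects the set $\mathcal{A}_T$, given $\mathcal{A}_1,\ldots,\mathcal{A}_{T-1}$ and $\mathcal{B}_1,\ldots,\mathcal{B}_{T-1}$; then, the “minimization” player, who also already knows the history of selections $\mathcal{A}_1,\ldots,\mathcal{A}_{T-1}$, as well as the removals $\mathcal{B}_1,\ldots,\mathcal{B}_{T-1}$, observes $\mathcal{A}_T$, and selects the set $\mathcal{B}_T$ against $\mathcal{A}_T$, given $\mathcal{A}_1,\ldots,\mathcal{A}_{T-1}$ and $\mathcal{B}_1,\ldots,\mathcal{B}_{T-1}$.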
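When the ground sets are tiny, the game above can be evaluated exactly by exhaustive search, which gives an optimal reference value. The following Python sketch assumes, for illustration, that `f` accepts a tuple of surviving sets $(\mathcal{A}_1\setminus\mathcal{B}_1,\ldots)$; the function name `game_value` and the modular example weights are hypothetical. Its running time is exponential, so it is only a checking tool.

```python
from itertools import combinations


def game_value(f, grounds, alphas, betas, history=()):
    """Brute-force value of the T-round max-min game of Problem 2.

    `history` holds the surviving sets A_1 \\ B_1, ..., A_{t-1} \\ B_{t-1}.
    """
    t = len(history)
    if t == len(grounds):
        return f(history)
    best = float("-inf")
    for A in combinations(sorted(grounds[t]), alphas[t]):  # designer picks A_t
        worst = float("inf")
        for r in range(betas[t] + 1):                       # attacker picks |B_t| <= beta_t
            for B in combinations(A, r):
                survivors = frozenset(A) - frozenset(B)
                worst = min(worst, game_value(f, grounds, alphas, betas,
                                              history + (survivors,)))
        best = max(best, worst)
    return best


# Hypothetical modular example: f sums element weights over all surviving sets.
w = {1: 3, 2: 2, 3: 1, 4: 5, 5: 4, 6: 1}
f = lambda seq: sum(w[v] for S in seq for v in S)
print(game_value(f, [{1, 2, 3}, {4, 5, 6}], [2, 2], [1, 1]))  # 6
```

With one removal per step, the designer keeps the two heaviest elements and the attacker deletes the heavier of the two, so the surviving weight is $2+4=6$.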

## 3 Adaptive Algorithm for Problem 2

We present the first algorithm for Problem 2, describe the intuition behind it, and, finally, show it is adaptive. The pseudo-code of the algorithm is described in Algorithm 1.

### 3.1 Intuition behind Algorithm 1

The goal of Problem 2 is to ensure a maximal value for an objective function $f$ through a sequence of $T$ maximization steps, despite compromises to the solutions of each of the maximization steps. In particular, at each maximization step $t=1,\ldots,T$, Problem 2 aims to select a set $\mathcal{A}_t$ towards a maximal value of $f$, despite that each $\mathcal{A}_t$ is compromised by a worst-case set removal $\mathcal{B}_t$ from $\mathcal{A}_t$, resulting in $f$ being finally evaluated at the sequence of sets $\mathcal{A}_1\setminus\mathcal{B}_1,\ldots,\mathcal{A}_T\setminus\mathcal{B}_T$ instead of the sequence of sets $\mathcal{A}_1,\ldots,\mathcal{A}_T$. In this context, Algorithm 1 aims to fulfil the goal of Problem 2 by constructing each set $\mathcal{A}_t$ as the union of the sets $\mathcal{S}_{t,1}$ and $\mathcal{S}_{t,2}$ (line 11 of Algorithm 1), whose roles we describe in more detail below:

#### Set St,1 approximates worst-case set removal from At

With the set $\mathcal{S}_{t,1}$, Algorithm 1 aims to capture the worst-case removal of $\beta_t$ elements among the $\alpha_t$ elements that Algorithm 1 is going to select in $\mathcal{A}_t$; equivalently, the set $\mathcal{S}_{t,1}$ is meant to act as a “bait” to an attacker that selects to remove the best $\beta_t$ elements from $\mathcal{A}_t$ (best with respect to the elements’ contribution towards the goal of Problem 2). However, the problem of selecting the best $\beta_t$ elements in $\mathcal{V}_t$ is combinatorial and, in general, intractable [10]. For this reason, Algorithm 1 approximates the best $\beta_t$ elements in $\mathcal{V}_t$ by letting $\mathcal{S}_{t,1}$ be the set of $\beta_t$ elements with the largest marginal contributions to the value of the objective function $f$ (lines 5-6 of Algorithm 1).

#### Set St,2 is such that St,1∪St,2 approximates optimal set solution to t-th maximization step of Problem 2

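Assuming that $\mathcal{S}_{t,1}$ is the set of elements that are going to be removed from Algorithm 1’s set selection $\mathcal{A}_t$, Algorithm 1 needs next to select a set $\mathcal{S}_{t,2}$ of $\alpha_t-\beta_t$ elements to complete the construction of $\mathcal{A}_t$, since it is $|\mathcal{A}_t|=\alpha_t$ per Problem 2. In particular, for $\mathcal{A}_t=\mathcal{S}_{t,1}\cup\mathcal{S}_{t,2}$ to be an optimal solution to the $t$-th maximization step of Problem 2 (assuming the removal of $\mathcal{S}_{t,1}$ from $\mathcal{A}_t$), Algorithm 1 needs to select $\mathcal{S}_{t,2}$ as a best set of $\alpha_t-\beta_t$ elements from $\mathcal{V}_t\setminus\mathcal{S}_{t,1}$. Nevertheless, the problem of selecting a best set of $\alpha_t-\beta_t$ elements from $\mathcal{V}_t\setminus\mathcal{S}_{t,1}$ is combinatorial and, in general, intractable [10]. As a result, Algorithm 1 approximates such a best set using the greedy procedure in lines 7-10 of Algorithm 1.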

Overall, Algorithm 1 constructs the sets $\mathcal{S}_{t,1}$ and $\mathcal{S}_{t,2}$ so that their union approximates an optimal solution to the $t$-th maximization step of Problem 2 (line 11 of Algorithm 1).

We describe next the steps in Algorithm 1 in more detail.

### 3.2 Description of steps in Algorithm 1

Algorithm 1 executes four steps for each $t=1,\ldots,T$, where $T$ is the number of maximization steps in Problem 2:

#### Initialization (line 4 of Algorithm 1)

Algorithm 1 defines two auxiliary sets, namely, $\mathcal{S}_{t,1}$ and $\mathcal{S}_{t,2}$, and initializes each of them with the empty set (line 4 of Algorithm 1). The purpose of $\mathcal{S}_{t,1}$ and $\mathcal{S}_{t,2}$ is to construct the set $\mathcal{A}_t$, which is the set Algorithm 1 selects as a solution to Problem 2’s $t$-th maximization step; in particular, the union of $\mathcal{S}_{t,1}$ and $\mathcal{S}_{t,2}$ constructs $\mathcal{A}_t$ at the end of the $t$-th execution of the algorithm’s “for loop” (lines 3-12 of Algorithm 1).

#### Construction of set St,1 (lines 5-6 of Algorithm 1)

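Algorithm 1 constructs the set $\mathcal{S}_{t,1}$ such that $\mathcal{S}_{t,1}$ contains $\beta_t$ elements from the ground set $\mathcal{V}_t$ and, for any element $s\in\mathcal{S}_{t,1}$ and any element $s'\in\mathcal{V}_t\setminus\mathcal{S}_{t,1}$, the marginal value of $s$ is at least that of $s'$; that is, among all elements in $\mathcal{V}_t$, the set $\mathcal{S}_{t,1}$ contains a collection of $\beta_t$ elements that correspond to the highest marginal values of $f$. In detail, Algorithm 1 constructs $\mathcal{S}_{t,1}$ by first sorting and indexing all elements in $\mathcal{V}_t$ such that $\mathcal{V}_t=\{v_{t,1},\ldots,v_{t,|\mathcal{V}_t|}\}$ and $f(\mathcal{A}_1\setminus\mathcal{B}_1,\ldots,\mathcal{A}_{t-1}\setminus\mathcal{B}_{t-1},\{v_{t,1}\})\ge\cdots\ge f(\mathcal{A}_1\setminus\mathcal{B}_1,\ldots,\mathcal{A}_{t-1}\setminus\mathcal{B}_{t-1},\{v_{t,|\mathcal{V}_t|}\})$ (line 5 of Algorithm 1), and, then, by including in $\mathcal{S}_{t,1}$ the first $\beta_t$ elements among $v_{t,1},\ldots,v_{t,|\mathcal{V}_t|}$ (line 6 of Algorithm 1).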
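Lines 5-6 amount to ranking the ground set by the history-conditioned value of each single element and keeping the top $\beta_t$. A minimal Python sketch, assuming (hypothetically) that `f` takes a tuple of sets whose last entry is the current candidate set and whose missing trailing entries are treated as empty; `bait_set` is an illustrative name, not from the paper.

```python
def bait_set(f, history, ground, beta):
    """Return S_{t,1}: the beta elements v of V_t with the largest f(history, {v})."""
    # Sort once by element for determinism, then by value in descending order.
    ranked = sorted(sorted(ground),
                    key=lambda v: f(history + (frozenset({v}),)),
                    reverse=True)
    return frozenset(ranked[:beta])


# Hypothetical modular objective: f sums weights over all sets in the tuple.
w = {1: 3, 2: 2, 3: 1, 4: 5, 5: 4, 6: 1}
f = lambda seq: sum(w[v] for S in seq for v in S)
print(sorted(bait_set(f, (frozenset({2}),), {4, 5, 6}, 1)))  # [4]
```

Since $f(\mathcal{A}_1\setminus\mathcal{B}_1,\ldots,\mathcal{A}_{t-1}\setminus\mathcal{B}_{t-1},\emptyset)$ is the same for every candidate, ranking by $f(\ldots,\{v\})$ is equivalent to ranking by marginal value.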

#### Construction of set St,2 (lines 7-10 of Algorithm 1)

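Algorithm 1 constructs the set $\mathcal{S}_{t,2}$ by greedily picking $\alpha_t-\beta_t$ elements from the set $\mathcal{V}_t\setminus\mathcal{S}_{t,1}$, and by accounting for the effect that the history of set selections and removals ($\mathcal{A}_1\setminus\mathcal{B}_1,\ldots,\mathcal{A}_{t-1}\setminus\mathcal{B}_{t-1}$) has on the objective function of Problem 2. Specifically, the greedy procedure in Algorithm 1’s “while loop” (lines 7-10 of Algorithm 1) adds an element $x\in\mathcal{V}_t\setminus(\mathcal{S}_{t,1}\cup\mathcal{S}_{t,2})$ to $\mathcal{S}_{t,2}$ only if $x$ maximizes the value of $f(\mathcal{A}_1\setminus\mathcal{B}_1,\ldots,\mathcal{A}_{t-1}\setminus\mathcal{B}_{t-1},\mathcal{S}_{t,2}\cup\{x\})$.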
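The greedy loop of lines 7-10 can be sketched in Python as below. This is an illustrative sketch under the same assumption as before (a hypothetical `f` over a tuple of sets); `greedy_complement` and the coverage example are not from the paper. Note that $\mathcal{S}_{t,1}$ is excluded from the score, since it is the part assumed to be removed.

```python
def greedy_complement(f, history, ground, bait, k):
    """Return S_{t,2}: k = alpha_t - beta_t elements of V_t \\ S_{t,1}, chosen greedily.

    Each candidate y is scored by f(history, S_{t,2} ∪ {y}); S_{t,1} is left out
    of the score because it is the set assumed to be removed by the attacker.
    """
    chosen = frozenset()
    while len(chosen) < k:
        candidates = sorted(frozenset(ground) - bait - chosen)
        x = max(candidates, key=lambda y: f(history + (chosen | {y},)))
        chosen = chosen | {x}
    return chosen


# Hypothetical coverage objective: count distinct items covered by all sets in the tuple.
cover = {1: {"a", "b"}, 2: {"a", "b"}, 3: {"c"}, 4: {"a"}}
f = lambda seq: len(set().union(*(cover[v] for S in seq for v in S)))
print(sorted(greedy_complement(f, (), {1, 2, 3, 4}, frozenset({1}), 2)))  # [2, 3]
```

Element 2 duplicates the coverage of the bait element 1, yet it is picked: if element 1 is removed, element 2 still covers its items.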

#### Construction of set At (line 11 of Algorithm 1)

Algorithm 1 proposes the set $\mathcal{A}_t$ as a solution to Problem 2’s $t$-th maximization step. To this end, Algorithm 1 constructs $\mathcal{A}_t$ as the union of the previously constructed sets $\mathcal{S}_{t,1}$ and $\mathcal{S}_{t,2}$.

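In sum, Algorithm 1 enables an adaptive solution of Problem 2: for each $t=1,\ldots,T$, Algorithm 1 constructs a solution set $\mathcal{A}_t$ to the $t$-th maximization step of Problem 2 based on both the history of selected solutions up to step $t-1$, namely, the sets $\mathcal{A}_1,\ldots,\mathcal{A}_{t-1}$, and the corresponding history of set removals from $\mathcal{A}_1,\ldots,\mathcal{A}_{t-1}$, namely, the sets $\mathcal{B}_1,\ldots,\mathcal{B}_{t-1}$.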
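Putting the four steps together, the adaptive loop can be sketched as follows. This is a sketch under stated assumptions, not the authors’ code: `f` takes a tuple of surviving sets, and the attacker is modelled by a hypothetical callback `attack` that returns $\mathcal{B}_t$ only after $\mathcal{A}_t$ has been committed. The example attacker `myopic_attack` is also hypothetical; it minimizes only the current value, which can be weaker than the worst case of Problem 2.

```python
from itertools import combinations


def algorithm1(f, grounds, alphas, betas, attack):
    """Sketch of Algorithm 1; returns the selections A_t and the final value of f."""
    history, selections = (), []
    for V, a, b in zip(grounds, alphas, betas):
        V = frozenset(V)
        # Lines 5-6: S_{t,1} = top-b elements by f(history, {v}).
        ranked = sorted(sorted(V), key=lambda v: f(history + (frozenset({v}),)),
                        reverse=True)
        s1 = frozenset(ranked[:b])
        # Lines 7-10: greedy S_{t,2} of size a - b, scored without S_{t,1}.
        s2 = frozenset()
        while len(s2) < a - b:
            x = max(sorted(V - s1 - s2), key=lambda y: f(history + (s2 | {y},)))
            s2 = s2 | {x}
        A = s1 | s2                      # line 11
        B = attack(f, history, A, b)     # removal revealed after A_t is chosen
        selections.append(A)
        history = history + (A - B,)     # adapt to the realized removal
    return selections, f(history)


def myopic_attack(f, history, A, b):
    """Hypothetical attacker: removes the b elements of A that hurt f(history, .) most."""
    return min((frozenset(B) for B in combinations(sorted(A), b)),
               key=lambda B: f(history + (A - B,)))


w = {1: 3, 2: 2, 3: 1, 4: 5, 5: 4, 6: 1}
f = lambda seq: sum(w[v] for S in seq for v in S)
sel, val = algorithm1(f, [{1, 2, 3}, {4, 5, 6}], [2, 2], [1, 1], myopic_attack)
print([sorted(A) for A in sel], val)  # [[1, 2], [4, 5]] 6
```

Here each step keeps the two heaviest elements and loses the heavier of the two, so the surviving weight is $2+4=6$.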

## 4 Performance Guarantees for Algorithm 1

We quantify Algorithm 1’s performance, by bounding its running time, and its approximation performance. To this end, we use the following two notions of curvature for set functions.

### 4.1 Curvature and total curvature of non-decreasing functions

We present the notions of curvature and of total curvature for non-decreasing set functions. We start by describing the notions of modularity and submodularity for set functions.

###### Definition (Modularity)

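Consider any finite set $\mathcal{V}$. The set function $g:2^{\mathcal{V}}\mapsto\mathbb{R}$ is modular if and only if for any set $\mathcal{A}\subseteq\mathcal{V}$, it holds $g(\mathcal{A})=\sum_{v\in\mathcal{A}}g(v)$.

In words, a set function $g:2^{\mathcal{V}}\mapsto\mathbb{R}$ is modular if, through $g$, the elements in $\mathcal{V}$ cannot substitute each other; in particular, the above definition of modularity implies that for any set $\mathcal{A}\subseteq\mathcal{V}$, and for any element $v\in\mathcal{V}\setminus\mathcal{A}$, it holds $g(\{v\}\cup\mathcal{A})-g(\mathcal{A})=g(v)$.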

###### Definition (Submodularity [26, Proposition 2.1])

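Consider any finite set $\mathcal{V}$. The set function $g:2^{\mathcal{V}}\mapsto\mathbb{R}$ is submodular if and only if for any sets $\mathcal{A}\subseteq\mathcal{A}'\subseteq\mathcal{V}$, and any element $v\in\mathcal{V}$, it holds $g(\mathcal{A}\cup\{v\})-g(\mathcal{A})\ge g(\mathcal{A}'\cup\{v\})-g(\mathcal{A}')$.

The above definition states that a set function $g$ is submodular if and only if it satisfies a diminishing returns property where, for any element $v\in\mathcal{V}$, the marginal gain $g(\mathcal{A}\cup\{v\})-g(\mathcal{A})$ is non-increasing in $\mathcal{A}$. In contrast to modularity, submodularity implies that the elements in $\mathcal{V}$ can substitute each other, since the definition of submodularity (applied to $\emptyset\subseteq\mathcal{A}$) implies the inequality $g(\{v\}\cup\mathcal{A})-g(\mathcal{A})\le g(v)-g(\emptyset)$; that is, in the presence of the set $\mathcal{A}$, the element $v$ may lose part of its contribution to the value of $g$.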
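The diminishing returns property can likewise be tested exhaustively on a small ground set. The Python sketch below is illustrative only; `is_submodular` is a hypothetical helper, and the check is exponential in $|\mathcal{V}|$.

```python
from itertools import combinations


def subsets(ground):
    """Return all subsets of a finite ground set as frozensets."""
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_submodular(g, ground):
    """Exhaustively test g(A ∪ {v}) - g(A) >= g(B ∪ {v}) - g(B) for all A ⊆ B, v."""
    subs = subsets(ground)
    return all(g(A | {v}) - g(A) >= g(B | {v}) - g(B)
               for A in subs for B in subs if A <= B
               for v in ground)


cover = {1: {"x", "y"}, 2: {"y"}, 3: {"z"}}
coverage = lambda A: len(set().union(*(cover[v] for v in A)))
print(is_submodular(coverage, cover))               # True: diminishing returns
print(is_submodular(lambda A: len(A) ** 2, cover))  # False: increasing returns
```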

###### Definition

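(Curvature of monotone submodular functions [20]) Consider a finite set $\mathcal{V}$, and a non-decreasing submodular set function $g:2^{\mathcal{V}}\mapsto\mathbb{R}$ such that (without loss of generality) for any element $v\in\mathcal{V}$, it is $g(v)\neq0$. The curvature of $g$ is defined as follows:

$$\kappa_g\triangleq1-\min_{v\in\mathcal{V}}\frac{g(\mathcal{V})-g(\mathcal{V}\setminus\{v\})}{g(v)}. \tag{4}$$

The above definition of curvature implies that for any non-decreasing submodular set function $g$, it holds $0\le\kappa_g\le1$. In particular, the value of $\kappa_g$ measures how far $g$ is from modularity, as we explain next: if $\kappa_g=0$, then for all elements $v\in\mathcal{V}$, it holds $g(\mathcal{V})-g(\mathcal{V}\setminus\{v\})=g(v)$, that is, $g$ is modular. In contrast, if $\kappa_g=1$, then there exists an element $v\in\mathcal{V}$ such that $g(\mathcal{V})=g(\mathcal{V}\setminus\{v\})$, that is, in the presence of $\mathcal{V}\setminus\{v\}$, $v$ loses all its contribution to the value of $g$.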
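Eq. (4) is straightforward to evaluate for a small ground set. The Python sketch below is illustrative; `curvature` is a hypothetical helper, and the weights are made up to contrast a modular function ($\kappa_g=0$) with a concave-over-modular one ($0<\kappa_g<1$).

```python
import math


def curvature(g, ground):
    """kappa_g per eq. (4); assumes g is non-decreasing, submodular, and g({v}) != 0."""
    V = frozenset(ground)
    return 1 - min((g(V) - g(V - {v})) / g(frozenset({v})) for v in V)


w = {1: 1.0, 2: 3.0}                                  # hypothetical weights
modular = lambda A: sum(w[v] for v in A)
concave = lambda A: math.sqrt(sum(w[v] for v in A))  # concave over modular
print(curvature(modular, w))   # 0.0
print(curvature(concave, w))   # sqrt(3) - 1, about 0.732
```

A function that saturates after one element, such as $g(\mathcal{A})=1$ for every non-empty $\mathcal{A}$, attains the other extreme $\kappa_g=1$.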

###### Definition

(Total curvature of non-decreasing functions [27, Section 8]) Consider a finite set $\mathcal{V}$, and a monotone set function $g:2^{\mathcal{V}}\mapsto\mathbb{R}$. The total curvature of $g$ is defined as follows:

$$c_g\triangleq1-\min_{v\in\mathcal{V}}\;\min_{\mathcal{A},\mathcal{B}\subseteq\mathcal{V}\setminus\{v\}}\frac{g(\{v\}\cup\mathcal{A})-g(\mathcal{A})}{g(\{v\}\cup\mathcal{B})-g(\mathcal{B})}. \tag{5}$$

The above definition of total curvature implies that for any non-decreasing set function $g$, it holds $0\le c_g\le1$. To connect the notion of total curvature with that of curvature, we note that when the function $g$ is non-decreasing and submodular (and $g(\emptyset)=0$), the two notions coincide, i.e., it holds $c_g=\kappa_g$; the reason is that if $g$ is non-decreasing and submodular, then the inner minimum in eq. (5) is attained for $\mathcal{A}=\mathcal{V}\setminus\{v\}$ and $\mathcal{B}=\emptyset$. In addition, to connect the above notion of total curvature with the notion of modularity, we note that if $c_g=0$, then $g$ is modular, since eq. (5) implies that for any element $v\in\mathcal{V}$, and for any sets $\mathcal{A},\mathcal{B}\subseteq\mathcal{V}\setminus\{v\}$, it holds:

$$(1-c_g)\left[g(\{v\}\cup\mathcal{B})-g(\mathcal{B})\right]\le g(\{v\}\cup\mathcal{A})-g(\mathcal{A}), \tag{6}$$

which for $c_g=0$ implies the modularity of $g$. Finally, to connect the above notion of total curvature with the notion of monotonicity, we mention that if $c_g=1$, then eq. (6) implies only that $g$ is non-decreasing (as is already assumed by the definition of total curvature).
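The following Python sketch evaluates eq. (5) by brute force, as a way to see both facts above on small examples: for a submodular function the total curvature coincides with the curvature of eq. (4), while a non-submodular function can still have $c_g<1$. The helper `total_curvature` and the example functions are hypothetical, and the sketch assumes every marginal gain in the denominator is positive.

```python
import math
from itertools import combinations


def subsets(ground):
    """Return all subsets of a finite ground set as frozensets."""
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def total_curvature(g, ground):
    """c_g per eq. (5); assumes every marginal g({v} ∪ B) - g(B) is positive."""
    V = frozenset(ground)
    ratios = []
    for v in V:
        rest = subsets(V - {v})
        for A in rest:
            for B in rest:
                ratios.append((g(A | {v}) - g(A)) / (g(B | {v}) - g(B)))
    return 1 - min(ratios)


w = {1: 1.0, 2: 3.0}
concave = lambda A: math.sqrt(sum(w[v] for v in A))        # submodular, g(∅) = 0
print(total_curvature(concave, w))                         # sqrt(3) - 1, equal to kappa_g
print(total_curvature(lambda A: len(A) ** 2, {1, 2, 3}))   # 0.8: not submodular, yet c_g < 1
```

For $g(\mathcal{A})=|\mathcal{A}|^2$ on three elements, the marginal gain grows from 1 to 5, so the smallest ratio is $1/5$ and $c_g=0.8$.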

###### Definition (Approximate submodularity)

Consider a finite set $\mathcal{V}$, and a non-decreasing set function $g:2^{\mathcal{V}}\mapsto\mathbb{R}$, whose total curvature $c_g$ is such that $c_g<1$. Then, we say that $g$ is approximately submodular.

### 4.2 Performance analysis for Algorithm 1

We quantify Algorithm 1’s approximation performance, as well as, its running time per maximization step in Problem 2.

###### Theorem (Performance of Algorithm 1)

Consider an instance of Problem 2, the notation therein, the notation in Algorithm 1, and the definitions:

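• let the number $f^\star$ be the (optimal) value to Problem 2;

• given sets $\mathcal{A}_1,\ldots,\mathcal{A}_T$ (in short, $\mathcal{A}_{1:T}$) as solutions to the maximization steps in Problem 2, let $\mathcal{B}^\star(\mathcal{A}_{1:T})$ be the collection of optimal (worst-case) set removals $\mathcal{B}_t$ from each of the $\mathcal{A}_t$, where $|\mathcal{B}_t|\le\beta_t$, per Problem 2, i.e.:

$$\mathcal{B}^\star(\mathcal{A}_{1:T})\in\arg\min_{\mathcal{B}_1\subseteq\mathcal{A}_1,\,|\mathcal{B}_1|\le\beta_1}\cdots\min_{\mathcal{B}_T\subseteq\mathcal{A}_T,\,|\mathcal{B}_T|\le\beta_T}\; f(\mathcal{A}_1\setminus\mathcal{B}_1,\ldots,\mathcal{A}_T\setminus\mathcal{B}_T);$$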

The performance of Algorithm 1 is bounded as follows:

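1. (Approximation performance) Algorithm 1 returns the sequence of sets $\mathcal{A}_1,\ldots,\mathcal{A}_T$ such that, for all $t=1,\ldots,T$, it holds $\mathcal{A}_t\subseteq\mathcal{V}_t$, $|\mathcal{A}_t|=\alpha_t$, and:

• if the objective function $f$ is non-decreasing and submodular, then:

$$\frac{f(\mathcal{A}_{1:T}\setminus\mathcal{B}^\star(\mathcal{A}_{1:T}))}{f^\star}\ge(1-\kappa_f)^4, \tag{7}$$

where $\kappa_f$ is the curvature of $f$ (Section 4.1).

• if the objective function $f$ is non-decreasing, then:

$$\frac{f(\mathcal{A}_{1:T}\setminus\mathcal{B}^\star(\mathcal{A}_{1:T}))}{f^\star}\ge(1-c_f)^5, \tag{8}$$

where $c_f$ is the total curvature of $f$ (Section 4.1).

2. (Running time) Algorithm 1 constructs each set $\mathcal{A}_t$, for each $t=1,\ldots,T$, to solve the $t$-th maximization step of Problem 2, with $O(|\mathcal{V}_t|\alpha_t)$ evaluations of $f$.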

Provable approximation performance. Theorem 4.2 implies the following on the approximation performance of Algorithm 1:

#### Near-optimality

Both for monotone submodular objective functions $f$ with curvature $\kappa_f<1$, and for merely monotone objective functions $f$ with total curvature $c_f<1$, Algorithm 1 guarantees a value for Problem 2 finitely close to the optimal. In particular, per ineq. (7) (case of submodular objective functions), the approximation factor of Algorithm 1 is bounded by $(1-\kappa_f)^4$, which is non-zero for any monotone submodular function $f$ with $\kappa_f<1$; similarly, per ineq. (8) (case of approximately-submodular functions), the approximation factor of Algorithm 1 is bounded by $(1-c_f)^5$, which is non-zero for any monotone function $f$ with $c_f<1$. Notably, although it is known for the problem of (non-resilient) set function maximization that the approximation bound $1-c_f$ is tight [27, Theorem 8.6], the tightness of the bound in ineq. (8) for Problem 2 is an open problem.

We discuss classes of functions with curvatures $\kappa_f<1$ or $c_f<1$, along with relevant applications, in the remark below.

###### Remark

(Classes of functions with $\kappa_f<1$ or $c_f<1$, and applications) Classes of functions with $\kappa_f<1$ include the concave over modular functions [17, Section 2.1], and the $\log\det$ of positive-definite matrices [28, 29]. Classes of functions with $c_f<1$ include support selection functions [30], and estimation error metrics such as the average minimum square error of the Kalman filter [2, Theorem 4].

The aforementioned classes of functions with or appear in applications of facility location, machine learning, and control, such as sparse approximation and feature selection [4, 5], sparse recovery and column subset selection [6, 7], and actuator and sensor scheduling [8, 2]; as a result, Problem 2 enables applications such as resilient experiment design, resilient actuator scheduling for minimal control effort, and resilient multi-robot navigation with minimal sensing and communication.

#### Approximation performance for low curvature

For both monotone submodular and merely monotone objective functions $f$, when the curvature $\kappa_f$ and the total curvature $c_f$, respectively, tend to zero, Algorithm 1 becomes exact, since for $\kappa_f\to0$ and $c_f\to0$ the terms $(1-\kappa_f)^4$ and $(1-c_f)^5$ in ineq. (7) and ineq. (8) tend to $1$. Overall, Algorithm 1’s curvature-dependent approximation bounds make a first step towards separating the classes of monotone submodular and merely monotone functions into functions for which Problem 2 can be approximated well (low curvature functions), and functions for which it cannot (high curvature functions).

A machine learning problem where Algorithm 1 guarantees an approximation performance close to the optimal is that of Gaussian process regression for processes with RBF kernels [31, 32]; this problem emerges in applications of sensor deployment and scheduling for temperature monitoring. The reason that in this class of regression problems Algorithm 1 performs almost optimally is that the involved objective function is the entropy of the selected sensor measurements, which for Gaussian processes with RBF kernels has curvature value close to zero [29, Theorem 5].

#### Approximation performance for no failures or attacks

Both for monotone submodular objective functions $f$, and for merely monotone objective functions $f$, when the number of attacks, deletions, and failures is zero ($\beta_t=0$, for all $t=1,\ldots,T$), Algorithm 1’s approximation performance is the same as that of the state-of-the-art algorithms for (non-resilient) set function maximization. In particular, when $\beta_t=0$ for all $t=1,\ldots,T$, Algorithm 1 is the same as the local greedy algorithm proposed in [11, Section 4] for (non-resilient) set function maximization, whose approximation performance is optimal [27, Theorem 8.6].

Minimal running time. Theorem 4.2 implies that Algorithm 1, even though it goes beyond the objective of (non-resilient) multi-step set function optimization by accounting for attacks, deletions, and failures, has the same order of running time as state-of-the-art algorithms for (non-resilient) multi-step set function optimization. In particular, such algorithms for (non-resilient) multi-step set function optimization [11, Section 4] [27, Section 8] terminate with $O(|\mathcal{V}_t|\alpha_t)$ evaluations of the objective function $f$ per maximization step $t=1,\ldots,T$, and Algorithm 1 also terminates with $O(|\mathcal{V}_t|\alpha_t)$ evaluations of the objective function $f$ per maximization step $t=1,\ldots,T$.
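One way to see the evaluation count is to instrument a single step of Algorithm 1. In the Python sketch below (hypothetical names `CountingF` and `one_step`; a modular objective chosen only for illustration), the sort costs $|\mathcal{V}_t|$ evaluations and each of the $\alpha_t-\beta_t$ greedy rounds costs fewer than $|\mathcal{V}_t|$, so a step uses at most $|\mathcal{V}_t|(\alpha_t-\beta_t+1)$ evaluations, which is $O(|\mathcal{V}_t|\alpha_t)$ when $\alpha_t\ge1$.

```python
class CountingF:
    """Wrap an objective so that every evaluation is counted."""

    def __init__(self, f):
        self.f = f
        self.calls = 0

    def __call__(self, seq):
        self.calls += 1
        return self.f(seq)


def one_step(f, history, V, a, b):
    """One pass of Algorithm 1's for-loop body (lines 4-11), as sketched here."""
    V = frozenset(V)
    ranked = sorted(sorted(V), key=lambda v: f(history + (frozenset({v}),)),
                    reverse=True)                       # |V| evaluations
    s1, s2 = frozenset(ranked[:b]), frozenset()
    while len(s2) < a - b:                              # < |V| evaluations per round
        x = max(sorted(V - s1 - s2), key=lambda y: f(history + (s2 | {y},)))
        s2 = s2 | {x}
    return s1 | s2


f = CountingF(lambda seq: sum(v for S in seq for v in S))  # hypothetical modular f
A = one_step(f, (), range(10), a=4, b=2)
print(sorted(A), f.calls, 10 * (4 - 2 + 1))  # [6, 7, 8, 9] 25 30
```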

Summary of theoretical results. In sum, Algorithm 1 is the first algorithm for Problem 2, and it enjoys:

• system-wide resiliency: Algorithm 1 is valid for any number of denial-of-service attacks, deletions, and failures;

• adaptiveness: Algorithm 1 adapts the solution to each of the maximization steps in Problem 2 to the history of inflicted denial-of-service attacks and failures;

• minimal running time: Algorithm 1 terminates with the same running time as state-of-the-art algorithms for (non-resilient) multi-step submodular function optimization;

• provable approximation performance: Algorithm 1 ensures, for all monotone objective functions $f$ that are either submodular or approximately submodular ($c_f<1$), and for all $\beta_t\le\alpha_t$, a solution finitely close to the optimal.

Notably, Algorithm 1 is also the first to guarantee, for any number of failures and for monotone approximately-submodular functions $f$, a provable approximation performance for the one-step version of Problem 2 where $T=1$.

## 5 Numerical Experiments

In this section, we demonstrate the near-optimal performance of Algorithm 1 via numerical experiments. In particular, we consider a control-aware sensor scheduling scenario, namely, sensing-constrained robot navigation. In this scenario, an unmanned aerial vehicle (UAV), which has limited remaining battery and measurement-processing power, has the objective to land; to this end, at each time step it activates only a subset of its on-board sensors, so as to localize itself and enable the generation of a control input for landing. Specifically, at each time step, the UAV generates its control input by implementing an LQG-optimal controller, given the measurements collected by the activated sensors up to the current time step [2, 33].

In more detail, in the following paragraphs we present a Monte Carlo analysis for an instance of the aforementioned sensing-constrained robot navigation scenario, in the presence of worst-case sensor failures, and observe that Algorithm 1 results in a near-optimal sensor selection schedule: in particular, the resulting navigation performance of the UAV closely matches the optimal in all tested instances for which the optimal selection could be computed via a brute-force approach.

Simulation setup. We adopt the same instance of the sensing-constrained robot navigation scenario adopted in [2, Section V.B]. Specifically, a UAV moves in a 3D space, starting from a randomly selected initial location. The objective of the UAV is to land at a prescribed target position with zero velocity. The UAV is modelled as a double-integrator with state $x_t=[p_t^\top\ v_t^\top]^\top\in\mathbb{R}^6$ at each time $t$ ($p_t$ is the 3D position of the UAV, and $v_t$ is its velocity), and can control its own acceleration $u_t\in\mathbb{R}^3$; the process noise has a fixed covariance. The UAV is equipped with multiple sensors, as follows: it has two on-board GPS receivers, measuring the UAV position $p_t$ with a fixed covariance, and an altimeter, measuring only the last component of $p_t$ (altitude) with a fixed standard deviation. Moreover, the UAV can use a stereo camera to measure the relative position of landmarks on the ground; we assume the location of each landmark to be known only approximately, and we associate to each landmark an uncertainty covariance, which is randomly generated at the beginning of each run. The UAV has limited on-board resources, hence it can only activate a subset of sensors (possibly different at each time step). For instance, the resource constraints may be due to the power consumption of the GPS and the altimeter, or due to computational constraints that prevent running object-detection algorithms to detect all landmarks on the ground.

Among the aforementioned possible sensor measurements available to the UAV at each time step, we assume that the UAV can use only $\alpha_t$ of them. In particular, the UAV chooses the sensors to activate at each time step so as to minimize an LQG cost with cost matrices $Q$ (which penalizes the state vector) and $R$ (which penalizes the control input vector), per the problem formulation in [2, Section II]. Note that the structure of the $Q$ used in this simulation setup (which penalizes the magnitude of the state vector) reflects the fact that during landing we are particularly interested in controlling the vertical direction and the vertical velocity (entries with larger weight in $Q$), while we are less interested in controlling accurately the horizontal position and velocity (assuming a sufficiently large landing site). Given a time horizon $T$ for landing, in [2] it is proven that the UAV selects an optimal sensor schedule and generates an optimal LQG control input with cost matrices $Q$ and $R$ if it selects the sensor set $\mathcal{S}_t$ to activate at each time $t$ by minimizing an objective function of the form:

$$\sum_{t=1}^{T}\operatorname{trace}\!\left[M_t\,\Sigma_{t|t}(\mathcal{S}_1,\ldots,\mathcal{S}_t)\right], \tag{9}$$

where $M_t$ is a positive semi-definite matrix that depends on the LQG cost matrices $Q$ and $R$, as well as on the UAV’s model dynamics; and $\Sigma_{t|t}(\mathcal{S}_1,\ldots,\mathcal{S}_t)$ is the error covariance of the Kalman filter given the sensor selections up to time $t$.
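To make eq. (9) concrete, the following Python sketch evaluates it in a deliberately simplified setting that is an assumption of this sketch, not the UAV model: a scalar state $x_{t+1}=a\,x_t+w_t$, scalar weights $M_t=m_t\ge0$, and sensors that each measure the state directly with independent noise of known variance. Under these assumptions the trace reduces to a scalar and the Kalman measurement update can be written in information form. All names and numbers (`schedule_cost`, the sensor variances) are hypothetical.

```python
def schedule_cost(a, q, prior_var, sensor_var, m, schedule):
    """Scalar version of eq. (9): sum_t m_t * Sigma_{t|t}(S_1, ..., S_t).

    a: state transition, q: process-noise variance, prior_var: Sigma_{1|0},
    sensor_var: dict sensor -> measurement-noise variance (each sensor observes x_t),
    m: list of weights m_t >= 0, schedule: list of sensor sets S_1, ..., S_T.
    """
    pred_var, cost = prior_var, 0.0
    for m_t, S in zip(m, schedule):
        # Measurement update in information form: independent sensors add information.
        info = 1.0 / pred_var + sum(1.0 / sensor_var[i] for i in S)
        filt_var = 1.0 / info                      # Sigma_{t|t}
        cost += m_t * filt_var
        pred_var = a * a * filt_var + q            # Sigma_{t+1|t}
    return cost


sensors = {"a": 2.0, "b": 2.0, "c": 0.5}           # hypothetical noise variances
T = 3
full = schedule_cost(1.0, 1.0, 1.0, sensors, [1.0] * T, [set(sensors)] * T)
one = schedule_cost(1.0, 1.0, 1.0, sensors, [1.0] * T, [{"a"}] * T)
none = schedule_cost(1.0, 1.0, 1.0, sensors, [1.0] * T, [set()] * T)
print(full < one < none)  # True: more sensors, lower cost
```

Because independent measurements only add information, enlarging any $\mathcal{S}_t$ can only shrink $\Sigma_{t|t}$ and every later covariance, so with $m_t\ge0$ this cost is non-increasing in the selected sets.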

In the remaining paragraphs, we present results averaged over 10 Monte Carlo runs of the above simulation setup, where in each run we randomize the covariances describing the landmark position uncertainty, and where we vary the number $\beta_t$ of sensor failures at each time step $t$ across the values reported in Fig. 1.

Compared algorithms. We compare four algorithms, which differ only in how they select the sensors to activate. The first algorithm is the optimal sensor selection algorithm, denoted as optimal, which attains the minimum of the cost function in eq. (9); this brute-force approach is viable since the number of available sensors is small. The second approach is a pseudo-random sensor selection, denoted as random, which selects one of the GPS measurements and a random subset of the landmark measurements; note that we do not consider a fully random selection since in practice this often leads to an unobservable system. The third approach, denoted as greedy, selects sensors to greedily minimize the cost function in eq. (9), ignoring the possibility of sensor failures, per the problem formulation in eq. (1). The fourth approach uses Algorithm 1 to solve the resilient reformulation of eq. (9), per Problem 2, and is denoted as resilience.

At each time step, from each of the sensor sets selected by any of the above four algorithms, we remove a worst-case subset of $\beta_t$ sensors, which we compute via brute force.

Results. The results of our numerical analysis are reported in Fig. 1. In particular, Fig. 1 shows the LQG cost over time for the case where the number $\alpha_t$ of selected sensors at each time step is fixed, while the number $\beta_t$ of sensor failures at each time step varies across the tested values. We make the following observations:

• (Near-optimality of Algorithm 1) Algorithm 1 (blue colour in Fig. 1) performs close to the optimal brute-force algorithm (green colour in Fig. 1); in particular, across all scenarios in Fig. 1, Algorithm 1 achieves an approximation performance of at least 97% of the optimal.

• (Performance of greedy algorithm) The greedy algorithm (red colour in Fig. 1) performs poorly as the number  of sensor failures increases, which was expected, given that this algorithm greedily minimizes the cost function in eq. (9) ignoring the possibility of sensor failures.

• (Performance of random algorithm) Expectedly, also the performance of the random algorithm (black colour in Fig. 1) is poor across all scenarios in Fig. 1.

Overall, in the above numerical experiments, Algorithm 1 demonstrates a near-optimal approximation performance, and the necessity for the resilient reformulation of the problem in eq. (1) per Problem 2 is exemplified.

## 6 Concluding remarks & Future work

We made the first step to ensure the success of critical missions in machine learning and control, that involve the optimization of systems across multiple time-steps, against persistent failures or denial-of-service attacks. In particular, we provided the first algorithm for Problem 2, which, with minimal running time, adapts to the history of the inflicted failures and attacks, and guarantees a close-to-optimal performance against system-wide failures and attacks. To quantify the algorithm’s approximation performance, we exploited a notion of curvature for monotone (not necessarily submodular) set functions, and contributed a first step towards characterizing the curvature’s effect on the approximability of resilient sequential maximization. Our curvature-dependent characterizations complement the current knowledge on the curvature’s effect on the approximability of simpler problems, such as of non-sequential resilient maximization [22, 23], and of non-resilient maximization [20, 17, 18]. Finally, we supported our theoretical analyses with simulated experiments.

This paper opens several avenues for future research, both in theory and in applications. Future work in theory includes the extension of our results to matroid constraints, to enable applications of set coverage and of network design [34, 35]. Future work in applications includes the experimental testing of the proposed algorithm in applications of motion-planning for multi-target covering with mobile vehicles [9], and in applications of control-aware sensor scheduling for multi-agent autonomous navigation [2], to enable resiliency in critical scenarios of surveillance, and of search and rescue.

## 7 Acknowledgements

We thank Luca Carlone for inspiring discussions.


## Appendix A Notation

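In the appendix, we use the following notation to support the proofs in this paper: given a finite ground set $\mathcal{V}$ and a set function $f: 2^{\mathcal{V}} \mapsto \mathbb{R}$, then, for any sets $\mathcal{X} \subseteq \mathcal{V}$ and $\mathcal{X}' \subseteq \mathcal{V}$:

$$f(\mathcal{X}|\mathcal{X}') \triangleq f(\mathcal{X}\cup\mathcal{X}') - f(\mathcal{X}'). \tag{10}$$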
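As a minimal illustration of the notation in eq. (10), the Python sketch below evaluates $f(\mathcal{X}|\mathcal{X}')$ for a toy coverage function, where each element covers a set of targets. The names `COVERS`, `coverage`, and `marginal` are hypothetical and not part of the paper.

```python
from typing import Callable, Dict, FrozenSet, Set

# Hypothetical toy instance: element -> set of targets it covers.
COVERS: Dict[str, Set[int]] = {"s1": {1, 2}, "s2": {2, 3}, "s3": {4}}

def coverage(S: FrozenSet[str]) -> int:
    """Number of targets covered by S (non-decreasing, with coverage of the empty set = 0)."""
    covered: Set[int] = set()
    for s in S:
        covered |= COVERS[s]
    return len(covered)

def marginal(f: Callable[[FrozenSet[str]], int],
             X: FrozenSet[str], Xp: FrozenSet[str]) -> int:
    """f(X | X') = f(X ∪ X') - f(X'), following eq. (10)."""
    return f(X | Xp) - f(Xp)

print(marginal(coverage, frozenset({"s2"}), frozenset({"s1"})))  # 1 (only target 3 is new)
print(marginal(coverage, frozenset({"s2"}), frozenset()))        # 2
```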

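Moreover, let the sets $\mathcal{A}^\star_{1:T}$ denote an (optimal) solution to Problem 2, i.e.:

$$\mathcal{A}^\star_{1:T} \in \arg\max_{\mathcal{A}_1\subseteq\mathcal{V}_1}\,\min_{\mathcal{B}_1\subseteq\mathcal{A}_1}\cdots\max_{\mathcal{A}_T\subseteq\mathcal{V}_T}\,\min_{\mathcal{B}_T\subseteq\mathcal{A}_T} f(\mathcal{A}_1\setminus\mathcal{B}_1,\ldots,\mathcal{A}_T\setminus\mathcal{B}_T), \quad \text{such that: } |\mathcal{A}_t| = \alpha_t \text{ and } |\mathcal{B}_t| \leq \beta_t, \text{ for all } t = 1,\ldots,T. \tag{11}$$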
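For very small instances, the nested max-min in eq. (11) can be evaluated by exhaustive search, in the spirit of the brute-force baseline used in the experiments (this is not Algorithm 1). The sketch below assumes, hypothetically, that $f$ counts the targets covered by the surviving elements $\mathcal{A}_1\setminus\mathcal{B}_1,\ldots,\mathcal{A}_T\setminus\mathcal{B}_T$; the function `resilient_value` and the toy instance are illustrative only.

```python
from itertools import combinations

# Hypothetical toy instance with T = 2 steps: element -> covered targets.
COVERS = {"a": {1, 2}, "b": {1, 2}, "c": {3}, "d": {1}, "e": {4}}

def coverage_of_union(kept):
    """Assumed objective f(A_1 - B_1, ..., A_T - B_T): targets covered by all survivors."""
    covered = set()
    for S in kept:
        for s in S:
            covered |= COVERS[s]
    return len(covered)

def resilient_value(f, V, alpha, beta, t=0, kept=()):
    """Exhaustive max over A_t, min over B_t, ..., as in eq. (11), from step t onward."""
    if t == len(V):
        return f(kept)
    best = float("-inf")
    for A in map(frozenset, combinations(sorted(V[t]), alpha[t])):
        # The attacker sees A_t and removes any B_t ⊆ A_t with |B_t| <= beta_t;
        # later steps are then chosen knowing which elements survived.
        worst = min(
            resilient_value(f, V, alpha, beta, t + 1, kept + (A - frozenset(B),))
            for r in range(min(beta[t], len(A)) + 1)
            for B in combinations(sorted(A), r)
        )
        best = max(best, worst)
    return best

V = (frozenset("abc"), frozenset("de"))  # candidate sets V_1, V_2
print(resilient_value(coverage_of_union, V, alpha=(2, 1), beta=(1, 0)))  # 3
print(resilient_value(coverage_of_union, V, alpha=(2, 1), beta=(0, 0)))  # 4
```

On this toy instance, the attack-free optimum picks the complementary pair {a, c} at the first step (value 4), whereas with one attack at the first step the best choice is the redundant pair {a, b} (value 3), because removing a or c from {a, c} leaves at most 2 coverable targets. The search is exponential in the sizes of the $\mathcal{V}_t$, so it only serves as a reference on tiny problems.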

## Appendix B Preliminary lemmas

We list lemmas that support the proof of Theorem 4.2.

###### Lemma

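Consider a finite ground set $\mathcal{V}$ and a non-decreasing submodular set function $f: 2^{\mathcal{V}} \mapsto \mathbb{R}$ such that $f$ is non-negative and $f(\emptyset) = 0$. Then, for any $\mathcal{A}\subseteq\mathcal{V}$, it holds:

$$f(\mathcal{A}) \geq (1-\kappa_f)\sum_{a\in\mathcal{A}} f(a).$$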

#### Proof of Lemma B

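Let $\mathcal{A} = \{a_1, a_2, \ldots, a_{|\mathcal{A}|}\}$. We prove the lemma by proving the following two inequalities:

$$f(\mathcal{A}) \geq \sum_{i=1}^{|\mathcal{A}|} f(a_i|\mathcal{V}\setminus\{a_i\}), \tag{12}$$

$$\sum_{i=1}^{|\mathcal{A}|} f(a_i|\mathcal{V}\setminus\{a_i\}) \geq (1-\kappa_f)\sum_{i=1}^{|\mathcal{A}|} f(a_i). \tag{13}$$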

We begin with the proof of ineq. (12):

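$$\begin{aligned}
f(\mathcal{A}) &= f(\mathcal{A}|\emptyset) && (14)\\
&\geq f(\mathcal{A}|\mathcal{V}\setminus\mathcal{A}) && (15)\\
&= \sum_{i=1}^{|\mathcal{A}|} f(a_i|\mathcal{V}\setminus\{a_i, a_{i+1}, \ldots, a_{|\mathcal{A}|}\}) && (16)\\
&\geq \sum_{i=1}^{|\mathcal{A}|} f(a_i|\mathcal{V}\setminus\{a_i\}), && (17)
\end{aligned}$$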

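where eqs. (14) to (17) hold for the following reasons: eq. (14) holds since $f(\emptyset) = 0$; ineq. (15) is implied by eq. (14) because $f$ is submodular and $\emptyset \subseteq \mathcal{V}\setminus\mathcal{A}$; eq. (16) holds since for any sets $\mathcal{X}\subseteq\mathcal{V}$ and $\mathcal{Y}\subseteq\mathcal{V}$ it is $f(\mathcal{X}|\mathcal{Y}) = f(\mathcal{X}\cup\mathcal{Y}) - f(\mathcal{Y})$, so the sum in eq. (16) telescopes to $f(\mathcal{V}) - f(\mathcal{V}\setminus\mathcal{A}) = f(\mathcal{A}|\mathcal{V}\setminus\mathcal{A})$, where for $i = |\mathcal{A}|$ the set $\{a_{i+1}, \ldots, a_{|\mathcal{A}|}\}$ denotes the empty set; and ineq. (17) holds since $f$ is submodular and $\mathcal{V}\setminus\{a_i, a_{i+1}, \ldots, a_{|\mathcal{A}|}\} \subseteq \mathcal{V}\setminus\{a_i\}$. These observations complete the proof of ineq. (12).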

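We now prove ineq. (13) using Definition 4.1 of $\kappa_f$, as follows: since $\kappa_f = 1 - \min_{v\in\mathcal{V}} f(v|\mathcal{V}\setminus\{v\})/f(v)$, it is implied that for all elements $v\in\mathcal{V}$ it is $f(v|\mathcal{V}\setminus\{v\}) \geq (1-\kappa_f)f(v)$. Therefore, summing the latter inequality over all elements $a_i \in \mathcal{A}$ completes the proof of ineq. (13).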
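The inequality of this lemma can be sanity-checked numerically on a tiny ground set. The sketch below assumes that Definition 4.1 takes the standard form $\kappa_f = 1 - \min_{v\in\mathcal{V}} f(v|\mathcal{V}\setminus\{v\})/f(v)$ (with $f(v) > 0$), and uses a hypothetical coverage function, which is monotone submodular with $f(\emptyset)=0$; it then checks $f(\mathcal{A}) \geq (1-\kappa_f)\sum_{a\in\mathcal{A}} f(a)$ for every subset $\mathcal{A}$.

```python
from itertools import combinations

# Hypothetical coverage function (monotone submodular, f(∅) = 0). Each element
# covers a target no other element covers (1, 5, 4), so kappa_f < 1.
COVERS = {"a": {1, 2}, "b": {2, 3, 5}, "c": {3, 4}}
V = frozenset(COVERS)

def f(S):
    covered = set()
    for s in S:
        covered |= COVERS[s]
    return len(covered)

def curvature(f, V):
    """Assumed form of kappa_f: 1 - min_v f(v | V - {v}) / f(v), with f(v) > 0."""
    return 1 - min((f(V) - f(V - {v})) / f(frozenset({v})) for v in V)

kappa = curvature(f, V)  # 1 - 1/3: element "b" adds only target 5 to V - {b}, yet f(b) = 3

# Check f(A) >= (1 - kappa_f) * sum_{a in A} f(a) on every subset A.
violations = [
    A
    for r in range(len(V) + 1)
    for A in map(frozenset, combinations(sorted(V), r))
    if f(A) < (1 - kappa) * sum(f(frozenset({a})) for a in A) - 1e-12
]
print(kappa, violations)
```

An empty `violations` list is consistent with the lemma on this instance; such a check illustrates the statement but of course does not prove it.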

###### Lemma

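Consider a finite ground set $\mathcal{V}$ and a monotone set function $f: 2^{\mathcal{V}} \mapsto \mathbb{R}$ such that $f$ is non-negative and $f(\emptyset) = 0$. Then, for any sets $\mathcal{A}\subseteq\mathcal{V}$ and $\mathcal{B}\subseteq\mathcal{V}$ such that $\mathcal{A}\cap\mathcal{B}=\emptyset$, it holds:

$$f(\mathcal{A}\cup\mathcal{B}) \geq (1-c_f)\big(f(\mathcal{A}) + f(\mathcal{B})\big).$$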

#### Proof of Lemma B

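Let $\mathcal{B} = \{b_1, b_2, \ldots, b_{|\mathcal{B}|}\}$. Then,

$$f(\mathcal{A}\cup\mathcal{B}) = f(\mathcal{A}) + \sum_{i=1}^{|\mathcal{B}|} f(b_i|\mathcal{A}\cup\{b_1, b_2, \ldots, b_{i-1}\}). \tag{18}$$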

The definition of total curvature in Definition 4.1 implies:

$$f(b_i|\mathcal{A}\cup\{b_1, b_2, \ldots, b_{i-1}\}) \geq (1-c_f)\, f(b_i|\{b_1, b_2, \ldots, b_{i-1}\}). \tag{19}$$

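The proof is completed by substituting ineq. (19) in eq. (18), using $\sum_{i=1}^{|\mathcal{B}|} f(b_i|\{b_1, \ldots, b_{i-1}\}) = f(\mathcal{B})$ (a telescoping sum, since $f(\emptyset) = 0$), and then taking into account that $f(\mathcal{A}) \geq (1-c_f)f(\mathcal{A})$, since $0 \leq c_f \leq 1$ and $f$ is non-negative.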
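The total curvature $c_f$ can be computed by brute force on a tiny ground set. The sketch below assumes that Definition 4.1 (not reproduced in this section) amounts to $c_f = 1 - \min_{v\in\mathcal{V}}\min_{\mathcal{A},\mathcal{B}\subseteq\mathcal{V}\setminus\{v\}} f(v|\mathcal{A})/f(v|\mathcal{B})$, i.e., the smallest constant for which inequalities of the form (19) hold. It then checks the lemma for every disjoint pair on the hypothetical function $f(\mathcal{X}) = |\mathcal{X}|^2$, which is monotone but supermodular, so no submodularity is used.

```python
from itertools import combinations

def subsets(S):
    """All subsets of S, as frozensets."""
    items = sorted(S)
    return [frozenset(combo) for r in range(len(items) + 1) for combo in combinations(items, r)]

def total_curvature(f, V):
    """Assumed form: smallest c with f(v|A) >= (1 - c) f(v|B) for all v and A, B ⊆ V - {v}."""
    ratios = []
    for v in V:
        rest = subsets(V - {v})
        for A in rest:
            for B in rest:
                den = f(B | {v}) - f(B)
                if den > 0:  # den == 0 imposes no constraint because f is non-decreasing
                    ratios.append((f(A | {v}) - f(A)) / den)
    return 1 - min(ratios)

def f(S):
    """Hypothetical monotone, supermodular (hence not submodular) function."""
    return len(S) ** 2

V = frozenset({0, 1, 2})
c = total_curvature(f, V)  # marginals 2|A| + 1 take values 1, 3, 5, so c = 1 - 1/5

# Check f(A ∪ B) >= (1 - c_f) (f(A) + f(B)) on every disjoint pair (A, B).
violations = [
    (A, B)
    for A in subsets(V)
    for B in subsets(V - A)
    if f(A | B) < (1 - c) * (f(A) + f(B)) - 1e-12
]
print(c, violations)
```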

###### Lemma

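Consider a finite ground set $\mathcal{V}$ and a non-decreasing set function $f: 2^{\mathcal{V}} \mapsto \mathbb{R}$ such that $f$ is non-negative and $f(\emptyset) = 0$. Then, for any set $\mathcal{A}\subseteq\mathcal{V}$ and any set $\mathcal{B}\subseteq\mathcal{V}$ such that $\mathcal{A}\cap\mathcal{B}=\emptyset$, it holds:

$$f(\mathcal{A}\cup\mathcal{B}) \geq (1-c_f)\Big(f(\mathcal{A}) + \sum_{b\in\mathcal{B}} f(b)\Big).$$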

#### Proof of Lemma B

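Let $\mathcal{B} = \{b_1, b_2, \ldots, b_{|\mathcal{B}|}\}$. Then,

$$f(\mathcal{A}\cup\mathcal{B}) = f(\mathcal{A}) + \sum_{i=1}^{|\mathcal{B}|} f(b_i|\mathcal{A}\cup\{b_1, b_2, \ldots, b_{i-1}\}). \tag{20}$$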

In addition, Definition 4.1 of total curvature implies:

$$\begin{aligned}
f(b_i|\mathcal{A}\cup\{b_1, b_2, \ldots, b_{i-1}\}) &\geq (1-c_f)\, f(b_i|\emptyset)\\
&= (1-c_f)\, f(b_i),
\end{aligned} \tag{21}$$

where the latter equation holds since $f(\emptyset) = 0$. The proof is completed by substituting (21) in (20) and then taking into account that $f(\mathcal{A}) \geq (1-c_f)f(\mathcal{A})$, since $0 \leq c_f \leq 1$.
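A similar brute-force check of this lemma is sketched below. It assumes the form $c_f = 1 - \min_{v\in\mathcal{V}}\min_{\mathcal{A},\mathcal{B}\subseteq\mathcal{V}\setminus\{v\}} f(v|\mathcal{A})/f(v|\mathcal{B})$ for Definition 4.1, and uses a hypothetical squared-coverage function, which is non-decreasing with $f(\emptyset)=0$ but neither submodular nor supermodular: the marginal of "a" grows from 4 to 12 when "c" is added, while the marginal of "b" drops from 12 to 9 when "c" is added to {a}.

```python
from itertools import combinations

COVERS = {"a": {1, 2}, "b": {2, 3, 5}, "c": {3, 4}}  # hypothetical toy instance
V = frozenset(COVERS)

def f(S):
    """Hypothetical squared coverage: non-decreasing, f(∅) = 0, neither sub- nor supermodular."""
    covered = set()
    for s in S:
        covered |= COVERS[s]
    return len(covered) ** 2

def subsets(S):
    items = sorted(S)
    return [frozenset(combo) for r in range(len(items) + 1) for combo in combinations(items, r)]

def total_curvature(f, V):
    """Assumed form: smallest c with f(v|A) >= (1 - c) f(v|B) for all v and A, B ⊆ V - {v}."""
    ratios = []
    for v in V:
        rest = subsets(V - {v})
        for A in rest:
            for B in rest:
                den = f(B | {v}) - f(B)
                if den > 0:  # den == 0 imposes no constraint because f is non-decreasing
                    ratios.append((f(A | {v}) - f(A)) / den)
    return 1 - min(ratios)

c = total_curvature(f, V)  # smallest ratio is 4/12 (marginal of "a" on ∅ vs. on {c}), so c = 2/3

# Check f(A ∪ B) >= (1 - c_f) (f(A) + sum_{b in B} f(b)) on every disjoint pair.
violations = [
    (A, B)
    for A in subsets(V)
    for B in subsets(V - A)
    if f(A | B) < (1 - c) * (f(A) + sum(f(frozenset({b})) for b in B)) - 1e-12
]
print(c, violations)
```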

###### Lemma

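Consider a finite ground set $\mathcal{V}$ and a non-decreasing set function $f: 2^{\mathcal{V}} \mapsto \mathbb{R}$ such that $f$ is non-negative and $f(\emptyset) = 0$. Then, for any set $\mathcal{A}\subseteq\mathcal{V}$ and any set $\mathcal{B}\subseteq\mathcal{V}$ such that $\mathcal{A}\setminus\mathcal{B}\neq\emptyset$, it holds:

$$f(\mathcal{A}) + (1-c_f)f(\mathcal{B}) \geq (1-c_f)f(\mathcal{A}\cup\mathcal{B}) + f(\mathcal{A}\cap\mathcal{B}).$$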

#### Proof of Lemma B

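Let $\mathcal{A}\setminus\mathcal{B} = \{i_1, i_2, \ldots, i_r\}$, where $r = |\mathcal{A}\setminus\mathcal{B}|$. From Definition 4.1 of total curvature $c_f$, for any $j \in \{1, \ldots, r\}$, it is $f(i_j|(\mathcal{A}\cap\mathcal{B})\cup\{i_1, \ldots, i_{j-1}\}) \geq (1-c_f)\, f(i_j|\mathcal{B}\cup\{i_1, \ldots, i_{j-1}\})$. Summing these inequalities over $j$, both sides telescope, since $(\mathcal{A}\cap\mathcal{B})\cup(\mathcal{A}\setminus\mathcal{B}) = \mathcal{A}$ and $\mathcal{B}\cup(\mathcal{A}\setminus\mathcal{B}) = \mathcal{A}\cup\mathcal{B}$, which gives

$$f(\mathcal{A}) - f(\mathcal{A}\cap\mathcal{B}) \geq (1-c_f)\big(f(\mathcal{A}\cup\mathcal{B}) - f(\mathcal{B})\big),$$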

which implies the lemma.
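This lemma can likewise be checked by brute force on a tiny ground set. The sketch below again assumes the form $c_f = 1 - \min_{v\in\mathcal{V}}\min_{\mathcal{A},\mathcal{B}\subseteq\mathcal{V}\setminus\{v\}} f(v|\mathcal{A})/f(v|\mathcal{B})$ for Definition 4.1, and uses the hypothetical function $f(\mathcal{X}) = \big(\sum_{x\in\mathcal{X}} w_x\big)^2$ with assumed weights; it checks every pair $(\mathcal{A}, \mathcal{B})$ with $\mathcal{A}\setminus\mathcal{B}\neq\emptyset$.

```python
from itertools import combinations

W = {"a": 1, "b": 2, "c": 3}  # hypothetical positive weights
V = frozenset(W)

def f(S):
    """Hypothetical squared total weight: non-decreasing, f(∅) = 0, supermodular."""
    return sum(W[s] for s in S) ** 2

def subsets(S):
    items = sorted(S)
    return [frozenset(combo) for r in range(len(items) + 1) for combo in combinations(items, r)]

def total_curvature(f, V):
    """Assumed form: smallest c with f(v|A) >= (1 - c) f(v|B) for all v and A, B ⊆ V - {v}."""
    ratios = []
    for v in V:
        rest = subsets(V - {v})
        for A in rest:
            for B in rest:
                den = f(B | {v}) - f(B)
                if den > 0:  # den == 0 imposes no constraint because f is non-decreasing
                    ratios.append((f(A | {v}) - f(A)) / den)
    return 1 - min(ratios)

c = total_curvature(f, V)  # smallest ratio: f(a|∅) = 1 vs. f(a|{b, c}) = 36 - 25 = 11, so c = 10/11

# Check f(A) + (1 - c_f) f(B) >= (1 - c_f) f(A ∪ B) + f(A ∩ B) whenever A - B is non-empty.
violations = [
    (A, B)
    for A in subsets(V)
    for B in subsets(V)
    if A - B and f(A) + (1 - c) * f(B) < (1 - c) * f(A | B) + f(A & B) - 1e-12
]
print(c, violations)
```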

###### Corollary

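Consider a finite ground set $\mathcal{V}$ and a non-decreasing set function $f: 2^{\mathcal{V}} \mapsto \mathbb{R}$ such that $f$ is non-negative and $f(\emptyset) = 0$. Then, for any set $\mathcal{A}\subseteq\mathcal{V}$ and any set $\mathcal{B}\subseteq\mathcal{V}$ such that $\mathcal{A}\cap\mathcal{B}=\emptyset$, it holds:

$$f(\mathcal{A}) + \sum_{b\in\mathcal{B}} f(b) \geq (1-c_f)f(\mathcal{A}\cup\mathcal{B}).$$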

#### Proof of Corollary B

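Let $\mathcal{B} = \{b_1, b_2, \ldots, b_{|\mathcal{B}|}\}$. Then,

$$\begin{aligned}
f(\mathcal{A}) + \sum_{i=1}^{|\mathcal{B}|} f(b_i) &\geq (1-c_f)f(\mathcal{A}) + \sum_{i=1}^{|\mathcal{B}|} f(b_i) && (22)\\
&\geq (1-c_f)f(\mathcal{A}\cup\{b_1\}) + \sum_{i=2}^{|\mathcal{B}|} f(b_i)\\
&\geq (1-c_f)f(\mathcal{A}\cup\{b_1, b_2\}) + \sum_{i=3}^{|\mathcal{B}|} f(b_i)\\
&\;\;\vdots\\
&\geq (1-c_f)f(\mathcal{A}\cup\mathcal{B}),
\end{aligned}$$

where (22) holds since $0 \leq c_f \leq 1$ and $f$ is non-negative, and the remaining inequalities hold due to the preceding lemma, applied for each $i$ with $\{b_i\}$ in place of $\mathcal{A}$ and $\mathcal{A}\cup\{b_1, \ldots, b_{i-1}\}$ in place of $\mathcal{B}$ (and using $f(\emptyset) = 0$), since $\mathcal{A}\cap\mathcal{B}=\emptyset$ implies $\{b_1\}\setminus\mathcal{A}\neq\emptyset$, $\{b_2\}\setminus(\mathcal{A}\cup\{b_1\})\neq\emptyset$, $\ldots$, $\{b_{|\mathcal{B}|}\}\setminus(\mathcal{A}\cup\{b_1, \ldots, b_{|\mathcal{B}|-1}\})\neq\emptyset$.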
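The corollary can be checked the same way. The sketch below assumes the form $c_f = 1 - \min_{v\in\mathcal{V}}\min_{\mathcal{A},\mathcal{B}\subseteq\mathcal{V}\setminus\{v\}} f(v|\mathcal{A})/f(v|\mathcal{B})$ for Definition 4.1, and uses a hypothetical function $f(\mathcal{X}) = 2|\mathcal{X}| + (\text{number of targets covered by } \mathcal{X})$; it checks $f(\mathcal{A}) + \sum_{b\in\mathcal{B}} f(b) \geq (1-c_f)f(\mathcal{A}\cup\mathcal{B})$ for every disjoint pair.

```python
from itertools import combinations

COVERS = {"a": {1, 2}, "b": {2, 3}, "c": {3}}  # hypothetical toy instance
V = frozenset(COVERS)

def f(S):
    """Hypothetical: 2|S| plus the number of covered targets; non-decreasing, f(∅) = 0."""
    covered = set()
    for s in S:
        covered |= COVERS[s]
    return 2 * len(S) + len(covered)

def subsets(S):
    items = sorted(S)
    return [frozenset(combo) for r in range(len(items) + 1) for combo in combinations(items, r)]

def total_curvature(f, V):
    """Assumed form: smallest c with f(v|A) >= (1 - c) f(v|B) for all v and A, B ⊆ V - {v}."""
    ratios = []
    for v in V:
        rest = subsets(V - {v})
        for A in rest:
            for B in rest:
                den = f(B | {v}) - f(B)
                if den > 0:  # den == 0 imposes no constraint because f is non-decreasing
                    ratios.append((f(A | {v}) - f(A)) / den)
    return 1 - min(ratios)

c = total_curvature(f, V)  # smallest ratio: f(b|{a, c}) = 2 vs. f(b|∅) = 4, so c = 1/2

# Check f(A) + sum_{b in B} f(b) >= (1 - c_f) f(A ∪ B) on every disjoint pair.
violations = [
    (A, B)
    for A in subsets(V)
    for B in subsets(V - A)
    if f(A) + sum(f(frozenset({b})) for b in B) < (1 - c) * f(A | B) - 1e-12
]
print(c, violations)
```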

###### Lemma

Recall the notation in Algorithm 1. Given the sets selected by Algorithm 1 (lines 5-6 of Algorithm 1), then, for each , let t