# Information Theoretic Secure Aggregation with User Dropouts

## Abstract

In the robust secure aggregation problem, a server wishes to learn and only learn the sum of the inputs of a number of users while some users may drop out (i.e., may not respond). The identity of the dropped users is not known a priori and the server needs to securely recover the sum of the inputs of the remaining surviving users. We consider the following minimal two-round model of secure aggregation. Over the first round, any set of no fewer than $U$ users out of $K$ users respond to the server and the server wants to learn the sum of the inputs of all responding users. The remaining users are viewed as dropped. Over the second round, any set of no fewer than $U$ users of the surviving users respond (i.e., dropouts are still possible over the second round) and from the information obtained from the surviving users over the two rounds, the server can decode the desired sum. The security constraint is that even if the server colludes with any $T$ users and the messages from the dropped users are received by the server (e.g., delayed packets), the server is not able to infer any additional information beyond the sum in the information theoretic sense. For this information theoretic secure aggregation problem, we characterize the optimal communication cost. When $U \leq T$, secure aggregation is not feasible, and when $U > T$, to securely compute one symbol of the sum, the minimum number of symbols sent from each user to the server is $1$ over the first round, and $1/(U-T)$ over the second round.

## 1 Introduction

The rapidly increasing volume of data available at massive distributed nodes enables powerful large-scale learning applications. For example, in federated learning [1, 2, 3], a large number of mobile users wish to collaboratively train a shared global model, coordinated by a central server. While the distributed users are willing to cooperate with the server to learn the shared model, they do not fully trust the server and do not want to reveal any information beyond what is necessary to train the desired model. Specifically, when the local models of the distributed users are aggregated (in the form of summation usually) at the server to produce the global model, each user does not want to reveal any additional information about its local data. Therefore, regarding security, the central technical problem is secure sum computation or secure aggregation [4, 5], i.e., how to compute, with as little communication as possible, the sum of the inputs of a number of users without exposing any information beyond the sum. A particular challenge in secure aggregation brought by federated learning is the phenomenon of user dropouts, i.e., some users whose identities are not known beforehand may drop from the learning procedure (due to unreliable communication connections or limited battery life) and the server needs to be able to robustly recover the sum of the inputs of the remaining surviving users while learning nothing else at the same time. The robustness to dropped users is a key requirement that calls for novel models and analysis. The main objective of this work is to understand the fundamental communication limits of information theoretic secure aggregation with user dropouts.

### Secure Aggregation with User Dropouts

The secure aggregation problem consists of one server and $K$ users. User $k$ holds an input $W_k$, which is a vector of $L$ elements from a field. In federated learning, the input $W_k$ may represent the local model, model update, gradient, loss, or parameters of User $k$, from one iteration of the iterative training optimization process and is typically high-dimensional, i.e., $L$ is large, which matches well with the Shannon theoretic formulation where $L$ is allowed to approach infinity. In this work, we focus on such inputs from one iteration as the secure aggregation problem remains the same for all iterations. A randomness variable $Z_k$, independent of all inputs, is generated offline (before the values of $W_k$ are known) and is available to User $k$ to assist with the secure aggregation task.

The server wishes to compute the element-wise sum of the vector inputs of all users. To do so, each user sends a message $X_k$, as a function of $W_k$ and $Z_k$, to the server. However, due to user dropouts, the server may not receive all messages; if only the messages from the set of users $\mathcal{U}_1$ arrive at the server and other messages are dropped, then the server wants to securely compute $\sum_{k \in \mathcal{U}_1} W_k$, i.e., the sum of the inputs of all responding users, from $(X_k)_{k \in \mathcal{U}_1}$. For example, suppose $K = 3$ and $\mathcal{U}_1 = \{1, 2\}$. Then the server sees only $X_1, X_2$ and wants to recover $W_1 + W_2$ while learning no other information, e.g., the server cannot infer $W_1$. We now observe an inherent deficiency of such a model, caused by the uncertainty of the identity of the dropped users. As it is not known a priori which users will drop, the sent messages cannot depend on the set of dropped users and must enable secure computation for all possible sets of responding users. For example, if $\mathcal{U}_1 = \{1\}$, then the server must be able to decode $W_1$ from $X_1$, which contradicts the security constraint for the case where $\mathcal{U}_1 = \{1, 2\}$, i.e., from $X_1, X_2$, the server can learn only $W_1 + W_2$. Therefore, for the above communication model, as the identity of the responding users is unknown beforehand, it is not feasible to learn only the sum of their inputs and nothing else.

The remedy is to include additional rounds of communication, and this solution has been taken in prior works on secure aggregation [4, 5, 6, 7, 8]. In this work, we consider the simplest model of two rounds. We refer to the round that is discussed above and parameterized by $\mathcal{U}_1$ as the first round. At the end of the first round, the server informs all responding users about the surviving user set $\mathcal{U}_1$, and the remaining users are viewed as dropped; thus no further communication with them is requested. One additional round of messages is requested from the surviving users in $\mathcal{U}_1$. This round is referred to as the second round and the message from User $k \in \mathcal{U}_1$ is denoted as $Y_k^{\mathcal{U}_1}$, where the superscript highlights that the identity of the surviving users over the first round is known when the user decides the second round message (also as a function of $W_k$ and $Z_k$). User dropouts are still possible over the second round and we denote the set of responding users over the second round by $\mathcal{U}_2$, which is a subset of $\mathcal{U}_1$. We assume that $|\mathcal{U}_2|$, the cardinality of $\mathcal{U}_2$, is at least $U$, a pre-determined threshold parameter. That is, the server will wait for at least $U$ users, e.g., by setting up a proper time deadline. As $\mathcal{U}_2 \subset \mathcal{U}_1$, we have $|\mathcal{U}_1| \geq |\mathcal{U}_2| \geq U$. The parameter $U$ is interpreted as the worst case estimate of the number of surviving users, and also makes the secure aggregation problem more interesting. See Figure 1 for an example.

After describing the communication model, we now proceed to state the two constraints of secure aggregation - correctness and security.

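• Correctness constraint: From only the messages received from the surviving users over the two rounds, the server can decode $\sum_{k \in \mathcal{U}_1} W_k$ with no error. For example, in Figure 1, it is required that the sum of the inputs of the first round survivors can be recovered from their first round messages and the second round messages of the second round survivors.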

• Information theoretic security constraint: From all the messages sent from the users over the two rounds (including those from dropped users as their packets may be merely delayed) and even if the server colludes with any set of at most $T$ users, the server cannot infer any additional information in the information theoretic sense about all inputs $(W_k)_{k \in [K]}$ beyond what is already known from the colluding user(s) and the desired sum. For example, suppose $T = 1$ and one of the users in Figure 1 colludes with the server; then it is required that from all the messages and the colluding user's input and randomness, no information about the inputs is revealed, except the desired sum and the colluding user's own input. Specifically, while the sum of the inputs of the other surviving users can be obtained, nothing more about their individual inputs can be learned.

Importantly, we emphasize that a feasible secure aggregation protocol must satisfy the correctness and security constraints for any first round responding user set $\mathcal{U}_1$ where $|\mathcal{U}_1| \geq U$, any second round responding user set $\mathcal{U}_2$ where $\mathcal{U}_2 \subset \mathcal{U}_1$ and $|\mathcal{U}_2| \geq U$, and any colluding user set $\mathcal{T}$ where $|\mathcal{T}| \leq T$. A secure aggregation protocol specifies a design of the messages and we are interested in characterizing the optimal communication efficiency, i.e., minimizing the number of symbols contained in the messages $X_k$ and $Y_k^{\mathcal{U}_1}$.

As a recap, our information theoretic secure aggregation formulation contains three parameters: $K$ (the number of users), $U$ (a threshold parameter on the minimum number of responding users), and $T$ (a threshold parameter on the maximum number of colluding users). We assume that $U \leq K-1$, so there may exist dropped users; otherwise $U = K$ and the problem becomes degenerate as all users must respond. We also assume that $T \leq K-2$; otherwise $T = K-1$ or $K$, and when the colluding user set contains at least $K-1$ users, there is nothing to hide, as from the desired sum and the inputs of the colluding users, the server can decode all inputs. For this model, our main goal is to answer the following question: to compute one symbol of the desired sum function securely, what is the minimum number of symbols that must be sent from the users over the first round and over the second round, as a function of $K, U, T$?

### Summary of Results

We obtain a complete answer to the above question, i.e., the exact characterization of the optimal communication efficiency of information theoretic secure aggregation. Specifically, we show that

• when $U \leq T$, secure aggregation is not feasible in the information theoretic sense;

• when $U > T$, the minimum number of symbols that each user needs to send is $1$ symbol over the first round, and $1/(U-T)$ symbols over the second round, per symbol of the desired sum.

The proofs of the above result are fairly standard. The protocol design uses and adapts elements that are frequently encountered in secure (sum) computation literature [9, 10, 11, 12, 13] (see Section 4). The entropy based proof of impossibility claims uses Shannon’s information theoretic security framework [14], which will be adapted to robust secure aggregation (see Section 5) and is conceptually similar to that in (symmetric) private information retrieval context [15, 16, 17, 18, 19, 20]. While the optimal communication efficiency is established, a number of relevant problems remain widely open, e.g., the minimum randomness consumption (see Section 6).

Let us conclude the introduction section by summarizing the major differences between our work and existing works on secure aggregation for federated learning, which has attracted tremendous recent attention [4, 5, 6, 7, 8, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]. First of all, to the best of our knowledge, our work is the only one that considers information theoretic security, i.e., unconditional security based on statistical independence; while all prior works focus on cryptographic security, i.e., conditional security against computationally bounded adversaries. Second, we first define the system parameters (e.g., allowed user dropouts and collusions), and then study the fundamental limits (i.e., the best possible protocols) given the specified parameters; while most existing works first propose a specific protocol and then analyze its performance (e.g., allowed user dropouts and collusions). Last but not least, we assume that the randomness variables of certain joint distribution are distributed to the users by a trusted third-party before the communication protocol starts (i.e., offline); while most prior works jointly consider randomness generation/distribution and message transmission (i.e., online). We view randomness generation as a separate problem to be studied in a future work, e.g., how to efficiently generate and distribute the required correlated randomness variables.

## 2 Problem Statement

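The secure aggregation problem involves a server and $K$ users, where $K \geq 2$ and User $k \in [K] \triangleq \{1, 2, \cdots, K\}$ holds an input vector $W_k$ and a randomness variable $Z_k$. The input vectors $(W_k)_{k \in [K]}$ are independent. Each $W_k$ is an $L \times 1$ column vector and the $L$ elements are i.i.d. uniform symbols from the finite field $\mathbb{F}_q$. $(W_k)_{k \in [K]}$ is independent of $(Z_k)_{k \in [K]}$.

$$H\big((W_k)_{k \in [K]}, (Z_k)_{k \in [K]}\big) = \sum_{k \in [K]} H(W_k) + H\big((Z_k)_{k \in [K]}\big), \tag{1}$$

$$H(W_k) = L \ \text{(in $q$-ary units)}, \quad \forall k \in [K]. \tag{2}$$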

The communication protocol between the server and the users has two rounds. Over the first round, User $k$ sends a message $X_k$ to the server. The message $X_k$ is a function of $W_k, Z_k$ and consists of $L_X$ symbols from $\mathbb{F}_q$.

$$H(X_k \mid W_k, Z_k) = 0, \quad \forall k \in [K]. \tag{3}$$

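Some users may drop and the set of surviving users after the first round is denoted as $\mathcal{U}_1$, which can be any set of at least $U$ users, i.e., $\mathcal{U}_1 \subset [K]$ and $|\mathcal{U}_1| \geq U$. The server receives the messages $(X_k)_{k \in \mathcal{U}_1}$ and wishes to securely compute $\sum_{k \in \mathcal{U}_1} W_k$, where the vector summation is defined as the element-wise addition over $\mathbb{F}_q$. To do so, the server informs all surviving users about $\mathcal{U}_1$ and requests a second round of messages from them. The second round message sent from User $k \in \mathcal{U}_1$ is denoted as $Y_k^{\mathcal{U}_1}$, which is a function of $W_k, Z_k$ and consists of $L_Y$ symbols from $\mathbb{F}_q$.

$$H\big(Y_k^{\mathcal{U}_1} \mid W_k, Z_k\big) = 0, \quad \forall k \in \mathcal{U}_1, \ \forall \mathcal{U}_1 \subset [K], \ |\mathcal{U}_1| \geq U. \tag{4}$$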

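Some users may drop and the set of surviving users after the second round is denoted as $\mathcal{U}_2$, where $\mathcal{U}_2 \subset \mathcal{U}_1$ and $|\mathcal{U}_2| \geq U$. Then the server receives the messages $(Y_k^{\mathcal{U}_1})_{k \in \mathcal{U}_2}$ over the second round.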

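From the messages received from surviving users, the server must be able to decode the desired sum $\sum_{k \in \mathcal{U}_1} W_k$ with no error, i.e., the following correctness constraint must be satisfied for any $\mathcal{U}_1, \mathcal{U}_2$, where $\mathcal{U}_2 \subset \mathcal{U}_1 \subset [K]$ and $|\mathcal{U}_1| \geq |\mathcal{U}_2| \geq U$.

$$\text{[Correctness]} \quad H\Big(\sum_{k \in \mathcal{U}_1} W_k \,\Big|\, (X_k)_{k \in \mathcal{U}_1}, \big(Y_k^{\mathcal{U}_1}\big)_{k \in \mathcal{U}_2}\Big) = 0. \tag{5}$$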

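We impose that security must be guaranteed even if the messages sent from all surviving and dropped users are received by the server and the server may collude with any set of at most $T$ users, where $T \leq K-2$. Specifically, security refers to the constraint that the server cannot infer any additional information about $(W_k)_{k \in [K]}$ beyond that contained in $\sum_{k \in \mathcal{U}_1} W_k$ and known from the colluding users. That is, the following security constraint must be satisfied for any $\mathcal{T}, \mathcal{U}_1$, where $\mathcal{T} \subset [K]$, $|\mathcal{T}| \leq T$, $\mathcal{U}_1 \subset [K]$, and $|\mathcal{U}_1| \geq U$.

$$\text{[Security]} \quad I\Big((W_k)_{k \in [K]}; (X_k)_{k \in [K]}, \big(Y_k^{\mathcal{U}_1}\big)_{k \in \mathcal{U}_1} \,\Big|\, \sum_{k \in \mathcal{U}_1} W_k, (W_k, Z_k)_{k \in \mathcal{T}}\Big) = 0. \tag{6}$$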

The communication rate characterizes how many symbols each message contains per input symbol, and is defined as follows.

$$R_1 \triangleq \frac{L_X}{L}, \quad R_2 \triangleq \frac{L_Y}{L} \tag{7}$$

where $R_1$ is the first round message rate and $R_2$ is the second round message rate.

A rate tuple $(R_1, R_2)$ is said to be achievable if there exists a secure aggregation scheme (i.e., a design of the correlated randomness variables $(Z_k)_{k \in [K]}$ and the messages $(X_k)_{k \in [K]}$, $(Y_k^{\mathcal{U}_1})_{k \in \mathcal{U}_1}$), for which the correctness and security constraints (5), (6) are satisfied, and the first round and second round message rates are smaller than or equal to $R_1$ and $R_2$, respectively. The closure of the set of all achievable rate tuples is called the optimal rate region, denoted as $\mathcal{R}^*$.

## 3 Main Result: Optimal Rate Region of Secure Aggregation

Theorem 1 states the main result.

###### Theorem 1

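For the information theoretic secure aggregation problem with $K$ users, at least $U$ responding users, and at most $T$ colluding users, where $U \leq K-1$ and $T \leq K-2$, the optimal rate region is

$$\mathcal{R}^* = \begin{cases} \emptyset & \text{when } U \leq T, \\ \left\{ (R_1, R_2) : R_1 \geq 1, \ R_2 \geq \dfrac{1}{U-T} \right\} & \text{when } U > T. \end{cases} \tag{8}$$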

From Theorem 1 and its proof (see Section 4 for achievability and Section 5 for converse), we have the following observations.

• When $U \leq T$, i.e., the minimum number of responding users is no greater than the maximum number of colluding users, the information theoretic secure aggregation problem is not feasible, i.e., it is not possible to simultaneously satisfy the correctness constraint (5) and the security constraint (6).

• When $U > T$, the optimal communication-wise strategy is such that each user sends $1$ symbol over the first round, and $1/(U-T)$ symbols over the second round, for each input symbol (i.e., to compute one symbol of the desired sum). Note that the optimal rate does not depend on the number of users $K$, and it depends on $U, T$ only through their difference $U-T$. In particular, when the difference between the two threshold parameters is larger, fewer symbols need to be sent. While the optimal communication cost (per user) does not depend on $K$, the minimum randomness consumption (i.e., the entropy of each $Z_k$ and the joint entropy of $(Z_k)_{k \in [K]}$) depends on $K$ (see Section 6).

• While the input length $L$ is allowed to approach infinity in the rate definition (7), the achievable scheme (presented in Section 4) only requires $L = U-T$ (or integer multiples of $U-T$) when the field size satisfies $q \geq K+U$, and for any field size, it suffices to have $L = B(U-T)$, where $B$ is any integer such that $q^B \geq K+U$.
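The corner point of the region in Theorem 1 is simple enough to tabulate. The following is a minimal sketch, not part of the paper: the function names `optimal_rates` and `is_achievable` are hypothetical, and the lower bounds $U \geq 1$, $T \geq 0$ in the parameter check are assumptions added for the illustration.

```python
from fractions import Fraction
from typing import Optional, Tuple


def optimal_rates(K: int, U: int, T: int) -> Optional[Tuple[Fraction, Fraction]]:
    """Corner point (R1, R2) of the region in (8), or None when the region is empty."""
    # Parameter ranges: U <= K-1 and T <= K-2 come from the problem statement;
    # U >= 1 and T >= 0 are assumed here for the sketch.
    if not (1 <= U <= K - 1 and 0 <= T <= K - 2):
        raise ValueError("expected 1 <= U <= K-1 and 0 <= T <= K-2")
    if U <= T:
        return None  # secure aggregation is not feasible
    return Fraction(1), Fraction(1, U - T)


def is_achievable(K: int, U: int, T: int, R1: Fraction, R2: Fraction) -> bool:
    """True if (R1, R2) lies in the optimal rate region of Theorem 1."""
    corner = optimal_rates(K, U, T)
    return corner is not None and R1 >= corner[0] and R2 >= corner[1]
```

For instance, `optimal_rates(3, 2, 0)` gives the rates of Example 1 below and `optimal_rates(3, 2, 1)` those of Example 2; the number of users `K` enters only through the validity check, in line with the observation that the optimal rates do not depend on $K$.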

## 4 Proof of Theorem 1: Achievability

Before presenting the general achievability proof, we first consider two examples to illustrate the idea, which is fairly straightforward and relies on generic vector linear codes.

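### 4.1 Example 1: K=3, U=2, T=0

Consider $K = 3$ users, where at least $U = 2$ users will respond, and no user will collude with the server ($T = 0$). Suppose the input length is $L = 2$, i.e., each $W_k$ consists of two symbols $W_k(1), W_k(2)$ from $\mathbb{F}_q$.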

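We first specify the randomness variables. Consider $3$ i.i.d. uniform $2 \times 1$ vectors over $\mathbb{F}_q$, denoted as $S_1, S_2, S_3$, where $S_k = (S_k(1); S_k(2))$, and form generic linear combinations of the sums over all subsets of $\{S_1, S_2, S_3\}$ with cardinality no fewer than $U = 2$.

$$\begin{aligned}
\begin{bmatrix} Z_1^{\{1,2\}} \\ Z_2^{\{1,2\}} \end{bmatrix} &\triangleq \mathrm{MDS}_{2 \times 2} \begin{bmatrix} S_1(1)+S_2(1) \\ S_1(2)+S_2(2) \end{bmatrix}, &
\begin{bmatrix} Z_1^{\{1,3\}} \\ Z_3^{\{1,3\}} \end{bmatrix} &\triangleq \mathrm{MDS}_{2 \times 2} \begin{bmatrix} S_1(1)+S_3(1) \\ S_1(2)+S_3(2) \end{bmatrix}, \\
\begin{bmatrix} Z_2^{\{2,3\}} \\ Z_3^{\{2,3\}} \end{bmatrix} &\triangleq \mathrm{MDS}_{2 \times 2} \begin{bmatrix} S_2(1)+S_3(1) \\ S_2(2)+S_3(2) \end{bmatrix}, &
\begin{bmatrix} Z_1^{\{1,2,3\}} \\ Z_2^{\{1,2,3\}} \\ Z_3^{\{1,2,3\}} \end{bmatrix} &\triangleq \mathrm{MDS}_{3 \times 2} \begin{bmatrix} S_1(1)+S_2(1)+S_3(1) \\ S_1(2)+S_2(2)+S_3(2) \end{bmatrix}
\end{aligned} \tag{9}$$

where $\mathrm{MDS}_{a \times b}$ denotes any MDS matrix of dimension $a \times b$. $\mathrm{MDS}_{2 \times 2}$ can be any full rank matrix, and it is presented using MDS matrices to facilitate generalizations to larger parameters. Note that the MDS matrices appearing in (9) exist over any finite field. When $T > 0$, we require slightly stronger properties on these matrices and will use Cauchy matrices (see the next section). Then we set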

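$$\begin{aligned}
Z_1 &= \big(S_1, Z_1^{\{1,2\}}, Z_1^{\{1,3\}}, Z_1^{\{1,2,3\}}\big), \\
Z_2 &= \big(S_2, Z_2^{\{1,2\}}, Z_2^{\{2,3\}}, Z_2^{\{1,2,3\}}\big), \\
Z_3 &= \big(S_3, Z_3^{\{1,3\}}, Z_3^{\{2,3\}}, Z_3^{\{1,2,3\}}\big).
\end{aligned} \tag{10}$$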

We have completed the design of the correlated randomness variables.

Next, we describe the design of the messages over two rounds. For the first round, we set

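$$X_1 = W_1 + S_1, \quad X_2 = W_2 + S_2, \quad X_3 = W_3 + S_3 \tag{11}$$

where '$+$' denotes element-wise addition over $\mathbb{F}_q$. For the second round, we set

$$\begin{aligned}
\mathcal{U}_1 = \{1,2\}&: \quad Y_1^{\{1,2\}} = Z_1^{\{1,2\}}, \ Y_2^{\{1,2\}} = Z_2^{\{1,2\}}, \\
\mathcal{U}_1 = \{1,3\}&: \quad Y_1^{\{1,3\}} = Z_1^{\{1,3\}}, \ Y_3^{\{1,3\}} = Z_3^{\{1,3\}}, \\
\mathcal{U}_1 = \{2,3\}&: \quad Y_2^{\{2,3\}} = Z_2^{\{2,3\}}, \ Y_3^{\{2,3\}} = Z_3^{\{2,3\}}, \\
\mathcal{U}_1 = \{1,2,3\}&: \quad Y_1^{\{1,2,3\}} = Z_1^{\{1,2,3\}}, \ Y_2^{\{1,2,3\}} = Z_2^{\{1,2,3\}}, \ Y_3^{\{1,2,3\}} = Z_3^{\{1,2,3\}}.
\end{aligned} \tag{12}$$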

Finally, we prove that the scheme is correct and secure, and the rate tuple achieves the extreme point of the optimal rate region.

Correctness: When $\mathcal{U}_1 = \{1, 2\}$, i.e., User 3 drops over the first round, we have $\mathcal{U}_2 = \{1, 2\}$ as $|\mathcal{U}_2| \geq U = 2$, i.e., at least $2$ users survive in the end, so no user drops over the second round. From the second round messages $Y_1^{\{1,2\}}, Y_2^{\{1,2\}}$, the server can recover $S_1 + S_2$ with no error, as the precoding matrices chosen in (9) are MDS. Combining with the sum of the received two first round messages, $X_1 + X_2 = W_1 + W_2 + S_1 + S_2$, the desired sum $W_1 + W_2$ can be decoded with no error. The correctness proof for other cases where $|\mathcal{U}_1| = 2$ follows similarly.

When $\mathcal{U}_1 = \{1, 2, 3\}$, i.e., no user drops over the first round, the server must recover $W_1 + W_2 + W_3$ when any one user drops over the second round. From any two second round messages $Y_k^{\{1,2,3\}}$, the server can recover $S_1 + S_2 + S_3$, due to the assignment using an $\mathrm{MDS}_{3 \times 2}$ matrix (see (9)). Then from the first round messages, the server can have $X_1 + X_2 + X_3 = W_1 + W_2 + W_3 + S_1 + S_2 + S_3$. Equipped with $S_1 + S_2 + S_3$, the desired sum $W_1 + W_2 + W_3$ can be decoded with no error.

Security: The intuition behind the security of the achievable scheme is that the first round messages are protected by independent randomness variables and the second round messages give just enough randomness information (and no more) to unlock the desired sum. We verify that the security constraint (6) is satisfied.

When $\mathcal{U}_1 = \{1, 2\}$, we have

\begin{align}
& I\big(W_1, W_2, W_3; X_1, X_2, X_3, Y_1^{\{1,2\}}, Y_2^{\{1,2\}} \,\big|\, W_1 + W_2\big) \tag{13} \\
&= H(W_1+S_1, W_2+S_2, W_3+S_3, S_1+S_2 \mid W_1+W_2) - H(S_1, S_2, S_3 \mid W_1, W_2, W_3) \tag{14} \\
&\overset{(1)}{=} H(W_1+S_1, W_2+S_2, W_3+S_3 \mid W_1+W_2) - H(S_1, S_2, S_3) \tag{15} \\
&\leq 6 - 6 = 0 \tag{16}
\end{align}

where in (14) we plug in the design of the randomness and message variables, and in (15) the first term follows from the fact that $S_1 + S_2$ can be obtained from $W_1 + S_1, W_2 + S_2, W_1 + W_2$. In the last step, for the first term we use the fact that $(W_1+S_1, W_2+S_2, W_3+S_3)$ contains at most $6$ symbols from $\mathbb{F}_q$ and uniform distribution maximizes entropy. As mutual information is non-negative, it must be exactly zero when it is smaller than or equal to zero. The security proof for other cases where $|\mathcal{U}_1| = 2$ follows similarly.

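When $\mathcal{U}_1 = \{1, 2, 3\}$, we have

\begin{align}
& I\big(W_1, W_2, W_3; X_1, X_2, X_3, Y_1^{\{1,2,3\}}, Y_2^{\{1,2,3\}}, Y_3^{\{1,2,3\}} \,\big|\, W_1 + W_2 + W_3\big) \tag{17} \\
&= H(W_1+S_1, W_2+S_2, W_3+S_3, S_1+S_2+S_3 \mid W_1+W_2+W_3) - H(S_1, S_2, S_3 \mid W_1, W_2, W_3) \notag \\
&\overset{(1)}{=} H(W_1+S_1, W_2+S_2, W_3+S_3 \mid W_1+W_2+W_3) - H(S_1, S_2, S_3) \tag{18} \\
&\leq 6 - 6 = 0 \tag{19}
\end{align}

where in the first equality following (17), we use the fact that $\big(Y_1^{\{1,2,3\}}, Y_2^{\{1,2,3\}}, Y_3^{\{1,2,3\}}\big)$ is invertible to $S_1 + S_2 + S_3$.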
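For a field this small, the security claim with $T = 0$ can also be checked by exhaustive enumeration rather than by entropy manipulation. The following sketch makes several assumptions that are not in the text: it uses $q = 2$, one concrete choice of MDS matrices over $\mathbb{F}_2$, users indexed $0, 1, 2$, and hypothetical helper names. It verifies that the distribution of everything the server may see (all first round messages and all second round messages for the given $\mathcal{U}_1$), for a fixed input tuple, depends on the inputs only through the desired sum, which is equivalent to the mutual information in (13) or (17) being zero.

```python
import itertools
from collections import Counter

Q = 2  # assumed field size for the exhaustive check
# Assumed MDS matrices over F_2: any two rows of the 3x2 matrix are invertible.
MDS = {2: [(1, 0), (0, 1)], 3: [(1, 0), (0, 1), (1, 1)]}


def view(W, S, U1):
    """All X_k (k = 0, 1, 2) and all Y_k^{U1} (k in U1) for given inputs W, randomness S."""
    X = tuple((W[k][i] + S[k][i]) % Q for k in range(3) for i in range(2))
    s_sum = [sum(S[k][i] for k in U1) % Q for i in range(2)]
    Y = tuple((r[0] * s_sum[0] + r[1] * s_sum[1]) % Q for r in MDS[len(U1)])
    return X + Y


def triples():
    """All assignments of three length-2 vectors over F_Q."""
    pairs = list(itertools.product(range(Q), repeat=2))
    return list(itertools.product(pairs, repeat=3))


def leaks_only_sum(U1):
    """True if the view distribution given W is the same for all W with equal sum over U1."""
    by_sum = {}
    for W in triples():
        dist = Counter(view(W, S, U1) for S in triples())  # S is uniform
        target = tuple(sum(W[k][i] for k in U1) % Q for i in range(2))
        key = frozenset(dist.items())
        if by_sum.setdefault(target, key) != key:
            return False
    return True
```

Calling `leaks_only_sum((0, 1))` corresponds to the case $\mathcal{U}_1 = \{1, 2\}$ and `leaks_only_sum((0, 1, 2))` to $\mathcal{U}_1 = \{1, 2, 3\}$. The check enumerates $64 \times 64$ pairs of inputs and randomness, so it is only practical for toy parameters.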

Rate: As the first round message contains $L_X = 2$ symbols each and the second round message contains $L_Y = 1$ symbol each, the rate achieved is $R_1 = L_X / L = 1$ and $R_2 = L_Y / L = 1/2$, which matches Theorem 1.

### 4.2 Example 2: K=3, U=2, T=1

Continuing from the above example, we increase $T$ from $0$ to $1$, i.e., the server could collude with any single user. The new element needed here is to inject additional noise in sharing the sum of randomness variables used in the first round messages. Note that while this coding idea is simple to describe, the security proof becomes more involved.

Suppose $L = 1$, i.e., each input $W_k$ is a single symbol from $\mathbb{F}_q$. The achievable scheme is described as follows.

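Randomness Assignment: Consider $7$ i.i.d. uniform symbols over $\mathbb{F}_q$, denoted as $S_1, S_2, S_3, N_1, N_2, N_3, N_4$, and form the following generic linear combinations of the sum of some subsets of $\{S_1, S_2, S_3\}$ and some additional noise variable $N_i$.

$$\begin{aligned}
\begin{bmatrix} Z_1^{\{1,2\}} \\ Z_2^{\{1,2\}} \end{bmatrix} &\triangleq \mathrm{Cauchy}_{2 \times 2} \begin{bmatrix} S_1+S_2 \\ N_1 \end{bmatrix}, &
\begin{bmatrix} Z_1^{\{1,3\}} \\ Z_3^{\{1,3\}} \end{bmatrix} &\triangleq \mathrm{Cauchy}_{2 \times 2} \begin{bmatrix} S_1+S_3 \\ N_2 \end{bmatrix}, \\
\begin{bmatrix} Z_2^{\{2,3\}} \\ Z_3^{\{2,3\}} \end{bmatrix} &\triangleq \mathrm{Cauchy}_{2 \times 2} \begin{bmatrix} S_2+S_3 \\ N_3 \end{bmatrix}, &
\begin{bmatrix} Z_1^{\{1,2,3\}} \\ Z_2^{\{1,2,3\}} \\ Z_3^{\{1,2,3\}} \end{bmatrix} &\triangleq \mathrm{Cauchy}_{3 \times 2} \begin{bmatrix} S_1+S_2+S_3 \\ N_4 \end{bmatrix}
\end{aligned} \tag{20}$$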

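where $\mathrm{Cauchy}_{a \times b}$ denotes a Cauchy matrix of dimension $a \times b$, i.e., the element in the $i$-th row and $j$-th column is set as

$$c_{ij} = \frac{1}{\alpha_i - \beta_j}, \quad \alpha_i, \beta_j, \ i \in [a], \ j \in [b] \ \text{are distinct over } \mathbb{F}_q. \tag{21}$$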

Note that the $3 \times 2$ Cauchy matrix uses $5$ distinct elements, so we need $q \geq 5$, in which case the distinct elements required above exist over $\mathbb{F}_q$. Intuitively, Cauchy matrices are used to ensure that the independent noise variables are fully mixed with the sum of the $S_k$ variables to avoid any unwanted leakage (see the proof below). Then we set

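$$Z_k = \Big(S_k, \big(Z_k^{\mathcal{U}_1}\big)_{\mathcal{U}_1 : k \in \mathcal{U}_1 \subset \{1,2,3\}, |\mathcal{U}_1| \geq 2}\Big), \quad \forall k \in \{1, 2, 3\}. \tag{22}$$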
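The property of Cauchy matrices that the scheme relies on (every square sub-matrix has full rank) can be checked directly for small parameters. The following is a minimal sketch assuming a prime field $\mathbb{F}_p$ and the particular choice $\alpha_i = i$, $\beta_j = a + j$ (zero-based indices), which is only one of many valid choices; the function names are hypothetical.

```python
import itertools


def cauchy(a, b, p):
    """a x b Cauchy matrix over F_p with alpha_i = i and beta_j = a + j (assumed choice)."""
    if a + b > p:
        raise ValueError("need a + b distinct field elements")
    # c_ij = 1 / (alpha_i - beta_j), computed with a modular inverse
    return [[pow((i - (a + j)) % p, -1, p) for j in range(b)] for i in range(a)]


def det_mod(M, p):
    """Determinant of a square matrix over F_p by Gaussian elimination."""
    M = [row[:] for row in M]
    n, det = len(M), 1
    for c in range(n):
        piv = next((r for r in range(c, n) if M[r][c] % p), None)
        if piv is None:
            return 0
        if piv != c:
            M[c], M[piv] = M[piv], M[c]
            det = -det
        det = det * M[c][c] % p
        inv = pow(M[c][c], -1, p)
        for r in range(c + 1, n):
            f = M[r][c] * inv % p
            M[r] = [(x - f * y) % p for x, y in zip(M[r], M[c])]
    return det % p


def all_square_submatrices_invertible(M, p):
    """Check that every square sub-matrix of M is non-singular over F_p."""
    a, b = len(M), len(M[0])
    for n in range(1, min(a, b) + 1):
        for rows in itertools.combinations(range(a), n):
            for cols in itertools.combinations(range(b), n):
                sub = [[M[r][c] for c in cols] for r in rows]
                if det_mod(sub, p) == 0:
                    return False
    return True
```

For the parameters of this example, `cauchy(3, 2, 5)` uses all five elements of $\mathbb{F}_5$, which is why a field of size at least $5$ is needed; `cauchy(3, 2, 3)` raises an error because too few distinct elements are available.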

Message Generation: For the first round, we set

$$X_1 = W_1 + S_1, \quad X_2 = W_2 + S_2, \quad X_3 = W_3 + S_3. \tag{23}$$

For the second round, we set

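$$\forall \mathcal{U}_1 \subset \{1, 2, 3\}, \ |\mathcal{U}_1| \geq 2: \quad Y_k^{\mathcal{U}_1} = Z_k^{\mathcal{U}_1}, \quad \forall k \in \mathcal{U}_1. \tag{24}$$

Proof of Correctness: For any $\mathcal{U}_1$ such that $|\mathcal{U}_1| \geq 2$, due to the randomness and message design (see (20) and (24)), the server can recover $\sum_{k \in \mathcal{U}_1} S_k$ from any set of second round messages $(Y_k^{\mathcal{U}_1})_{k \in \mathcal{U}_2}$ where $\mathcal{U}_2 \subset \mathcal{U}_1$ and $|\mathcal{U}_2| \geq 2$. Then from $\sum_{k \in \mathcal{U}_1} X_k$, the server can decode the desired sum $\sum_{k \in \mathcal{U}_1} W_k$ with no error.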

Proof of Security: We show that the injected noise variables help to guarantee the security constraint (6) under collusion.

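Suppose $\mathcal{U}_1 = \{1, 2\}$ and the colluding user set is $\mathcal{T} = \{1\}$, then we have

\begin{align}
& I\big(W_1, W_2, W_3; X_1, X_2, X_3, Y_1^{\{1,2\}}, Y_2^{\{1,2\}} \,\big|\, W_1+W_2, W_1, Z_1\big) \tag{26} \\
&= H\big(X_1, X_2, X_3, Y_1^{\{1,2\}}, Y_2^{\{1,2\}} \,\big|\, W_1+W_2, W_1, Z_1\big) - H\big(X_1, X_2, X_3, Y_1^{\{1,2\}}, Y_2^{\{1,2\}} \,\big|\, W_1, W_2, W_3, Z_1\big) \notag \\
&= H(W_1+S_1, W_2+S_2, W_3+S_3, S_1+S_2, N_1 \mid W_1+W_2, W_1, Z_1) - H(S_1, S_2, S_3, N_1 \mid W_1, W_2, W_3, Z_1) \tag{27} \\
&\overset{(1)}{=} H(S_1, W_2+S_2, W_3+S_3, N_1 \mid W_1+W_2, W_1, Z_1) - H(S_1, S_2, S_3, N_1 \mid Z_1) \tag{28} \\
&= H(W_2+S_2, W_3+S_3 \mid W_1+W_2, W_1, Z_1) + \underbrace{H(N_1 \mid W_2+S_2, W_3+S_3, W_1+W_2, W_1, Z_1)}_{=0} \notag \\
&\quad - H(S_2, S_3 \mid Z_1) - \underbrace{H(N_1 \mid Z_1, S_2, S_3)}_{=0} \notag \\
&\leq 2 - 2 = 0 \tag{29}
\end{align}

where in (27) we use the fact that $\big(Y_1^{\{1,2\}}, Y_2^{\{1,2\}}\big)$ is invertible to $(S_1+S_2, N_1)$ (see (20), (24)), and the equality following (28) is due to the fact that $S_1$ is contained in $Z_1$ (see (22)) and $N_1$ can be obtained from $Z_1^{\{1,2\}}$ (contained in $Z_1$) when $S_1 + S_2$ is known (refer to (20), (22)). In the last step, the first term follows from the property that uniform random variables are entropy maximizers, and the second term is due to the independence of $(S_2, S_3)$ and $Z_1$, whose proof will be presented in Lemma 1 when we give the general proof.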

The security proof for other cases of $\mathcal{U}_1$ and $\mathcal{T}$ is similar to that above, which is omitted here and deferred to the general proof presented in the next section.

Rate Calculation: As $L = L_X = L_Y = 1$ symbol, we have $R_1 = R_2 = 1$, as desired for this case.

### 4.3 General Proof for Arbitrary K, U, T

The achievability proof for arbitrary $K, U, T$ is an immediate generalization of that of the above two examples. We first consider the case where the field size $q$ is no smaller than $K+U$, and then show that the proof can be adapted with a minor change to cover all other field sizes. As $\mathcal{R}^* = \emptyset$ when $U \leq T$, we only need to consider settings where $U > T$.

#### Large fields: q≥K+U

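Suppose $L = U - T$, i.e., $L_X = U - T$ and $L_Y = 1$.

Randomness Assignment: Consider $K$ i.i.d. uniform $L \times 1$ vectors over $\mathbb{F}_q$, denoted as $(S_k)_{k \in [K]}$. Consider i.i.d. uniform $T \times 1$ vectors over $\mathbb{F}_q$, one for each $\mathcal{U}_1 \subset [K]$ with $|\mathcal{U}_1| \geq U$, denoted as $(N^{\mathcal{U}_1})_{\mathcal{U}_1 \subset [K], |\mathcal{U}_1| \geq U}$. $(S_k)_{k \in [K]}$ and $(N^{\mathcal{U}_1})_{\mathcal{U}_1}$ are independent. Then form generic linear combinations of the sum of some $S_k$ variables and some $N^{\mathcal{U}_1}$ variable as follows. For any $\mathcal{U}_1 \subset [K]$ such that $|\mathcal{U}_1| \geq U$, we set

$$\big(Z_k^{\mathcal{U}_1}\big)_{k \in \mathcal{U}_1} \triangleq \mathrm{Cauchy}_{|\mathcal{U}_1| \times U} \begin{bmatrix} \sum_{k \in \mathcal{U}_1} S_k \\ N^{\mathcal{U}_1} \end{bmatrix} \tag{30}$$

where the left-hand side is viewed as a $|\mathcal{U}_1| \times 1$ column vector and $\mathrm{Cauchy}_{a \times b}$ denotes a Cauchy matrix of dimension $a \times b$, i.e., the element in the $i$-th row and $j$-th column is $c_{ij} = 1/(\alpha_i - \beta_j)$, where $\alpha_i, \beta_j$, $i \in [a]$, $j \in [b]$ are distinct elements of $\mathbb{F}_q$ (see (21)). Note that $q \geq K + U \geq |\mathcal{U}_1| + U$, so the required distinct elements exist over $\mathbb{F}_q$. Then we set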

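$$Z_k = \Big(S_k, \big(Z_k^{\mathcal{U}_1}\big)_{\mathcal{U}_1 : k \in \mathcal{U}_1 \subset [K], |\mathcal{U}_1| \geq U}\Big), \quad \forall k \in [K]. \tag{31}$$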

To prepare for the security proof, we present some useful properties on the entropy of the randomness variables in the following lemma.

###### Lemma 1

For the random variables defined above, for any , any , and any , we have

• is uniform and is independent of .

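\begin{align}
H\big((Z_k)_{k \in \mathcal{T}}\big) &= H\big((Z_k)_{k \in \mathcal{T}} \,\big|\, (S_k)_{k \in \mathcal{T}'}\big) \tag{32} \\
&= |\mathcal{T}| \left( L + \sum_{u=U-1}^{K-1} \binom{K-1}{u} \right), \tag{33}
\end{align}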
• Given either or , contains linearly independent combinations of the i.i.d. symbols in .

 (34)

The detailed proof of Lemma 1 is deferred to Section 4.4.

Message Generation: We set

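$$X_k = W_k + S_k, \ \forall k \in [K], \qquad \forall \mathcal{U}_1 \subset [K], \ |\mathcal{U}_1| \geq U: \quad Y_k^{\mathcal{U}_1} = Z_k^{\mathcal{U}_1}, \ \forall k \in \mathcal{U}_1. \tag{35}$$

Proof of Correctness: For any $\mathcal{U}_1$ such that $|\mathcal{U}_1| \geq U$, as any square sub-matrix of a Cauchy matrix (with distinct $\alpha_i, \beta_j$) has full rank [38], the server can recover $\sum_{k \in \mathcal{U}_1} S_k$ from any $U$ second round messages. Combining with the first round messages, the server can have $\sum_{k \in \mathcal{U}_1} X_k = \sum_{k \in \mathcal{U}_1} (W_k + S_k)$ and then decode the desired sum $\sum_{k \in \mathcal{U}_1} W_k$ with no error.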

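Proof of Security: Consider any $\mathcal{U}_1 \subset [K]$, $|\mathcal{U}_1| \geq U$ and any $\mathcal{T} \subset [K]$, $|\mathcal{T}| \leq T$. We verify that the security constraint (6) is satisfied. Denote the difference of two sets as $\mathcal{A} \setminus \mathcal{B}$, i.e., the set of elements that belong to $\mathcal{A}$ but not $\mathcal{B}$.

\begin{align}
& I\Big((W_k)_{k \in [K]}; (X_k)_{k \in [K]}, \big(Y_k^{\mathcal{U}_1}\big)_{k \in \mathcal{U}_1} \,\Big|\, \sum_{k \in \mathcal{U}_1} W_k, (W_k, Z_k)_{k \in \mathcal{T}}\Big) \tag{36} \\
&= H\Big((X_k)_{k \in [K]}, \big(Y_k^{\mathcal{U}_1}\big)_{k \in \mathcal{U}_1} \,\Big|\, \sum_{k \in \mathcal{U}_1} W_k, (W_k, Z_k)_{k \in \mathcal{T}}\Big) - H\Big((X_k)_{k \in [K]}, \big(Y_k^{\mathcal{U}_1}\big)_{k \in \mathcal{U}_1} \,\Big|\, (W_k)_{k \in [K]}, (Z_k)_{k \in \mathcal{T}}\Big) \notag \\
&= H\Big((W_k + S_k)_{k \in [K]}, \sum_{k \in \mathcal{U}_1} S_k, N^{\mathcal{U}_1} \,\Big|\, \sum_{k \in \mathcal{U}_1} W_k, (W_k, Z_k)_{k \in \mathcal{T}}\Big) - H\Big((S_k)_{k \in [K]}, \sum_{k \in \mathcal{U}_1} S_k, N^{\mathcal{U}_1} \,\Big|\, (W_k)_{k \in [K]}, (Z_k)_{k \in \mathcal{T}}\Big) \tag{37} \\
&\leq H\big((W_k + S_k)_{k \in [K] \setminus \mathcal{T}}\big) + H\Big(N^{\mathcal{U}_1} \,\Big|\, (W_k + S_k)_{k \in [K] \setminus \mathcal{T}}, \sum_{k \in \mathcal{U}_1} W_k, (W_k, Z_k)_{k \in \mathcal{T}}\Big) \notag \\
&\quad - H\big((S_k)_{k \in [K] \setminus \mathcal{T}} \,\big|\, (Z_k)_{k \in \mathcal{T}}\big) - H\big(N^{\mathcal{U}_1} \,\big|\, (S_k)_{k \in [K]}, (Z_k)_{k \in \mathcal{T}}\big) \tag{38} \\
&\leq (K - |\mathcal{T}|) L + (T - |\mathcal{T} \cap \mathcal{U}_1|) - (K - |\mathcal{T}|) L - (T - |\mathcal{T} \cap \mathcal{U}_1|) = 0 \tag{39}
\end{align}

where in (37) we plug in the design of the message variables (35) and use the fact that $(Y_k^{\mathcal{U}_1})_{k \in \mathcal{U}_1}$ is invertible to $\big(\sum_{k \in \mathcal{U}_1} S_k, N^{\mathcal{U}_1}\big)$ (see (30), (35)), and in (38) we use the chain rule, the fact that $S_k$ is contained in $Z_k$ for $k \in \mathcal{T}$, and the independence of the inputs $(W_k)_{k \in [K]}$ and the randomness variables $(Z_k)_{k \in [K]}$. In the last step, the first term follows from the fact that uniform variables maximize entropy, and other terms follow from Lemma 1.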

Rate Calculation: As $L_X = L = U - T$ and $L_Y = 1$, we have $R_1 = L_X / L = 1$ and $R_2 = L_Y / L = 1/(U-T)$, as desired. The achievability proof is thus completed.

###### Remark 1

We can verify that neither the correctness proof nor the security proof uses the independence and uniformity of the input vectors $(W_k)_{k \in [K]}$. As such, the rate tuple $(R_1, R_2) = (1, 1/(U-T))$ is achievable for arbitrarily distributed inputs $(W_k)_{k \in [K]}$.

#### Any field size

We consider an arbitrary field $\mathbb{F}_q$, where $q = p^n$ for a prime $p$ and an integer $n$. The proof for the above case only relies on the property that the field size is sufficiently large so that there exist a required number of distinct elements. Here for arbitrary field size, we 'amplify' the field size by grouping a number of field elements (say $B$ elements) from $\mathbb{F}_q$ and viewing such a group of elements as one element from the extension field $\mathbb{F}_{q^B}$. That is, we set $L = B(U-T)$, where $B$ is chosen such that $q^B \geq K+U$, so that

 Wk=(W