Adaptive Federated Learning and Digital Twin for Industrial Internet of Things



The Industrial Internet of Things (IoT) enables distributed intelligent services that adapt to dynamic, real-time industrial devices so as to realize the benefits of Industry 4.0. In this paper, we consider a new architecture of digital-twin-empowered Industrial IoT, where digital twins capture the characteristics of industrial devices to assist federated learning. Noting that digital twins may introduce estimation deviations from the actual device state, a trust-based aggregation scheme is proposed for federated learning to alleviate the effect of such deviations. We adaptively adjust the aggregation frequency of federated learning based on a Lyapunov dynamic deficit queue and deep reinforcement learning to improve learning performance under resource constraints. To further adapt to the heterogeneity of the Industrial IoT, a clustering-based asynchronous federated learning framework is proposed. Numerical results show that the proposed framework is superior to the benchmark in terms of learning accuracy, convergence, and energy saving.

digital twin, federated learning, asynchronous, learning efficiency, communication efficiency.

I Introduction

With the extensive application of embedded systems and the demand for intelligence in the Industrial Internet of Things (Industrial IoT), the integration of multiple physical domains (electrical, mechanical, hydraulic, etc.) with computational capabilities gives rise to the concept of Cyber-Physical Systems (CPS) [1]. In the era of Industrial IoT, where devices are connected in heterogeneous structures, the success of CPS applications hinges on dynamic perception and intelligent decision-making. However, the difficulty of multi-source context awareness and real-time connection of heterogeneous industrial devices in dynamic environments hinders the implementation of CPS [2].

Digital twins (DTs), which support accurate modeling and synchronous updating, are becoming an important platform for enhancing the intelligence, automation, and real-time nature of the Industrial IoT [3]. A DT creates virtual objects in the digital space through software definition, accurately mapping entities in the physical space in terms of status, features, and evolution, and its excellent state awareness and real-time analysis greatly assist decision-making and execution. However, DTs are data-driven, and decision making in the Industrial IoT usually needs to be supported by a large amount of data distributed across a variety of training devices, while "data islands" exist in most data sources due to competition, privacy, and security issues [4]. In reality, it is almost impossible to centralize the data scattered across various devices. The dual challenges of privacy protection and big data present an opportunity to develop new technologies that aggregate and model the data under strict prerequisites.

Federated learning, a model training technology in which data owners train models locally without sharing their data, has become a powerful tool for meeting the privacy-protection and data-security requirements of the Industrial IoT [5]. Federated learning can build a machine learning system without direct access to the training data: the data remains in its original location, which helps ensure privacy and reduces communication costs. Therefore, when it comes to issues such as privacy protection, regulatory requirements, data silos, and expensive or unreliable connections, federated learning is a viable solution.

Existing work mainly focuses on designing advanced federated learning algorithms to achieve better learning performance, including privacy preservation and learning efficiency. For privacy protection, technologies such as homomorphic encryption [6] and differential privacy [7] are applied. At the same time, the additional data introduced by encryption and noise inevitably increases the communication cost, and the quality of the model is degraded by distributed computing. To improve the learning effect, Lu et al. [8] introduce an asynchronous learning framework that accelerates the convergence of learning, but its point-to-point communication mode imposes a heavy communication burden.

Existing efforts [6-8] do not take the heterogeneity of the Industrial IoT into account when designing federated learning architectures. In fact, the frequency and timing of aggregation should be carefully designed in federated learning, since the gain of global aggregation is non-linear and the network environment, e.g., the channel state, is time-varying during the federated learning process. It is also noted that in heterogeneous Industrial IoT scenarios, the straggler effect causes synchronous federated learning to suffer an unbearable learning delay. To address these issues, we study adaptive calibration of the global aggregation frequency to improve training efficiency under resource constraints. Furthermore, we propose an asynchronous federated learning framework based on node clustering. The contributions of this paper can be summarized as follows.

  • We introduce DTs for Industrial IoT, which map the operating state and behavior of devices to a digital world in real time. By considering the deviation of the DT from the true value in the trust-weighted aggregation strategy, the contribution of devices to the global aggregation of federated learning is quantified, which enhances the reliability and accuracy of learned models.

  • Based on Deep Q Network (DQN), we develop an adaptive calibration of global aggregation frequency to minimize the loss function of federated learning under a given resource budget, enabling the dynamic tradeoff between computing energy and communication energy in real-time changing communication environments.

  • To further adapt to heterogeneous Industrial IoT, we propose an asynchronous federated learning framework to eliminate the straggler effect by clustering nodes, and improve the learning efficiency through appropriate time-weighted inter-cluster aggregation strategy. The aggregation frequencies of different clusters are determined by the adaptive frequency calibration based on DQN. Numerical results show that the proposed scheme is superior to the benchmark scheme in terms of learning accuracy, convergence rate and energy saving.

The organization of the remaining paper is as follows. In Section II, we overview the related works. The DT-based system model is introduced in Section III. The frequency of global aggregation and an asynchronous federated learning framework are carefully designed based on DQN in Section IV. The simulation results of the proposed scheme performance are provided in Section V. Finally, Section VI concludes the paper.

II Related Work

Existing work on federated learning mainly focuses on the update architecture, the aggregation strategy, and the aggregation frequency.

Update architecture. Most current algorithms use a synchronous architecture, such as FedAvg [5]. However, a synchronous architecture leaves the duration of training limited by the slower nodes and is not suitable for scenarios where node resources are heterogeneous, i.e., it suffers from the straggler effect. A few studies have also considered asynchronous learning; e.g., Lu et al. [8] allow asynchronous updates through a random distributed update scheme, but this causes out-of-order communication between nodes, which is a heavy communication burden. In addition, Fadlullah et al. [9] studied asynchronous weight updates, which allow shallow parameters to be updated more frequently in an asynchronous manner but do not eliminate the straggler effect.

Aggregation strategy. Current research has explored the influence of factors such as data size, computing power, and reputation value on aggregation [10-12], which are closely related to application scenarios. To capture the relationship between non-IID and unbalanced data, FedAvg weights nodes by the amount of their training data. In a resource-constrained scenario, Nishio et al. [13] comprehensively consider the data resources, computing power, and wireless channel conditions of heterogeneous mobile devices to accelerate convergence. Pandey et al. [14] introduce communication efficiency to measure the reliability of equipment, and establish a probability model to calculate the corresponding aggregation weights. Clearly, identifying the important factors that affect the training effect in a specific application scenario is the top priority in solving the aggregation challenge of federated learning.

Aggregation frequency. Adaptive calibration of the global aggregation frequency is beneficial for improving the scalability and efficiency of federated learning [15]. To better adapt to dynamic resource constraints, Wang et al. [16] proposed an adaptive global aggregation scheme that changes the global aggregation frequency to improve training efficiency under resource constraints. Similarly, Tran et al. [17] captured the tradeoff between computation and communication delay determined by learning accuracy, and defined an optimization problem over the globally optimal learning time, which they decompose into three convex subproblems.

In general, some issues in federated learning remain unsolved: the straggler effect is not completely eliminated, and the aggregation frequency is not adapted to the Industrial IoT environment. In this paper, we combine DTs and DQN to adaptively reduce energy consumption, and design an asynchronous federated learning framework to eliminate the straggler effect.

III System Model

Fig. 1: DTs for federated learning in a heterogeneous Industrial IoT scenario.

III-A DTs for Industrial IoT

As shown in Fig. 1, we introduce a three-tier heterogeneous network in Industrial IoT, consisting of industrial devices, servers, and DTs of industrial devices. Devices with limited communication and computing resources are connected to servers via wireless communication links. DTs are models that map the physical status of devices and update in real time.

The DT of an industrial device is established by the server to which it belongs, where the history and current behavior of the device are dynamically presented in digital form by collecting and processing the key physical states of the device. Within time slot $t$, the DT of training node $i$ can be expressed as

$DT_i(t) = \{w_i(t), s_i(t), f_i(t), E_i(t)\},$

where $w_i(t)$ is the current trained parameter of node $i$, $s_i(t)$ represents the current training state of node $i$, $f_i(t)$ is the current computational capability (CPU frequency) of node $i$, and $E_i(t)$ indicates its energy consumption.

Note that there is a deviation between the value mapped by the DT and the actual value. We use the CPU-frequency deviation $\hat{f}_i(t)$ to represent the gap between the actual state of the device and its DT mapping. Therefore, the DT model after calibration can be expressed as

$DT_i(t) = \{w_i(t), s_i(t), f_i(t) + \hat{f}_i(t), E_i(t)\},$

where $f_i(t) + \hat{f}_i(t)$ is the calibrated computational capability. This model receives the physical state data of the device and performs self-calibration according to the empirical deviation value, while maintaining consistency with the device and feeding back information in real time, thereby achieving dynamic optimization of the physical world.
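As a minimal sketch of this self-calibration (the field names and the additive correction are illustrative assumptions, not the paper's notation):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class DigitalTwin:
    """Mapped state of one industrial device (illustrative fields, not the paper's notation)."""
    trained_params_version: int   # current trained-parameter version
    training_state: str           # e.g. "idle" or "training"
    cpu_freq_ghz: float           # mapped computational capability
    energy_consumed_j: float      # cumulative energy consumption

def calibrate(twin: DigitalTwin, freq_deviation_ghz: float) -> DigitalTwin:
    """Self-calibrate the twin with the empirical CPU-frequency deviation."""
    return replace(twin, cpu_freq_ghz=twin.cpu_freq_ghz + freq_deviation_ghz)

twin = DigitalTwin(trained_params_version=3, training_state="training",
                   cpu_freq_ghz=2.0, energy_consumed_j=15.0)
calibrated = calibrate(twin, freq_deviation_ghz=-0.2)
```

The frozen dataclass keeps the twin's mapped state immutable, so calibration produces a new snapshot rather than silently mutating history.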

III-B Federated learning in Industrial IoT

In the Industrial IoT, industrial devices (excavators, sensors, monitors, etc.) need to collaboratively complete production tasks based on federated learning. As shown in Fig. 1, an excavator with sensors collects a large amount of production data and operates in a real-time monitored environment. Through the collaboration of curators to perform federated learning and intelligent analysis, better decisions can be made for quality control and predictive maintenance.

The first step of federated learning is task initialization, in which the curator broadcasts the task and the initialized global model $w(0)$. Next, after receiving $w(0)$, training node $i$ uses its data to update the local model parameters over local iterations, seeking the optimal parameter that minimizes the loss function

$F_i(w) = \frac{1}{|D_i|} \sum_{j \in D_i} f(w; x_j, y_j),$

where $l$ denotes the current local iteration index, $f(w; x_j, y_j)$ quantifies the difference between estimated and true values for an instance $(x_j, y_j)$ of the training data, and $D_i$ is the set of training samples of node $i$. After $\tau$ rounds of local training, where $\tau$ is a preset frequency, node $i$ sends the updated local model parameters to the curator. Then, the curator performs the global model aggregation to obtain the parameters $w(k)$ of the $k$-th aggregation according to the preset aggregation strategy (see Subsection III-C). The loss value after the $k$-th global aggregation is $F(w(k))$. After global aggregation, the curator broadcasts the updated global model parameter back to the training nodes. Local model training and global model aggregation are repeated until the global loss function converges or reaches the preset accuracy.
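The local-training/global-aggregation loop described above can be sketched on a toy linear-regression task (all names and hyperparameters are illustrative, and a plain average stands in for the paper's trust-weighted aggregation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear-regression task split across four training nodes.
def make_node(n=50):
    X = rng.normal(size=(n, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=n)
    return X, y

nodes = [make_node() for _ in range(4)]

def local_update(w, X, y, tau=5, lr=0.05):
    """tau local gradient steps on the node's own data (the data never leaves the node)."""
    for _ in range(tau):
        w = w - lr * (2.0 / len(y)) * X.T @ (X @ w - y)
    return w

def global_loss(w):
    return float(np.mean([np.mean((X @ w - y) ** 2) for X, y in nodes]))

w = np.zeros(3)                        # task initialization: curator broadcasts the global model
for k in range(20):                    # 20 global aggregations
    local_models = [local_update(w, X, y) for X, y in nodes]
    w = np.mean(local_models, axis=0)  # curator aggregates the uploaded parameters
```

Each node performs tau local iterations before uploading, matching the preset-frequency behavior described above.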

III-C Trust-based Aggregation in DT-driven Industrial IoT

In federated learning in Industrial IoT, to improve the learning accuracy and resist malicious attacks, the parameters uploaded by nodes with high reputation should have greater weight in the aggregation. Unlike the traditional reputation model that only considers security threats, we comprehensively consider the effects of DT deviation, learning quality, and malicious data on learning.

Note that DTs have inevitable deviations when mapping node states in terms of CPU frequency, and the mapping deviations differ across nodes. The parameters uploaded by nodes with low mapping deviation should carry more weight in the aggregation. In addition, under a Byzantine attack, a malicious client may provide the curator with incorrect or low-quality updates instead of effective ones. Therefore, we introduce learning quality and interaction records that count malicious updates to weaken the threat of malicious data. Based on the subjective logic model, the belief of the curator in node $i$ in time slot $t$ can be expressed as

$b_i(t) = \left(1 - \hat{f}_i(t)\right) q_i(t) \frac{\alpha_i(t)}{\alpha_i(t) + \beta_i(t)},$

where $\hat{f}_i(t)$ indicates the DT deviation of the curator for node $i$, $q_i(t)$ denotes the quality of learning based on the honesty of most training devices, $\alpha_i(t)$ is the number of positive interactions, and $\beta_i(t)$ is the number of malicious actions such as uploading lazy data. Specifically, the curator uses a scheme that identifies unreliable nodes according to the gradient-update diversity of local model updates in non-IID federated learning, since the training data of each node has a unique distribution [12]. The reputation value of the curator for node $i$ is expressed as

$r_i(t) = b_i(t) + \gamma u_i(t),$

where $\gamma$ is a coefficient indicating the degree to which uncertainty affects the reputation and $u_i(t)$ represents the failure probability of packet transmission. In the global aggregation, the curator retrieves the updated reputation values and aggregates the local models of the participating nodes into a weighted local model, i.e.,

$w(k) = \sum_{i=1}^{N} \frac{r_i(t)}{\sum_{j=1}^{N} r_j(t)} w_i(k),$

where $w(k)$ represents the global parameter after the $k$-th global aggregation and $N$ is the number of training devices. Through such trust-based aggregation, the deviation of the DTs is accounted for and the security threats posed by malicious participants are effectively resisted, which enhances the robustness of the framework while increasing the learning convergence rate.
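A minimal sketch of reputation-weighted aggregation, assuming (as an illustration) that aggregation weights are reputations normalized over the participating nodes:

```python
import numpy as np

def trust_weighted_aggregate(params, reputations):
    """Aggregate local models with weights proportional to each node's reputation."""
    r = np.asarray(reputations, dtype=float)
    weights = r / r.sum()                       # normalize reputations into weights
    return np.tensordot(weights, np.stack(params), axes=1)

# Three nodes: the third has low reputation (e.g. suspected malicious updates).
params = [np.array([1.0, 1.0]), np.array([1.2, 0.8]), np.array([10.0, -10.0])]
agg = trust_weighted_aggregate(params, reputations=[0.9, 0.8, 0.05])
```

With equal reputations this reduces to plain FedAvg averaging; a low-reputation node pulls the aggregated model only weakly.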

III-D Energy Consumption Model in Federated Learning

In federated learning, the energy consumption of a training node consists of the computational energy consumed by local training and the communication energy consumed by global aggregation. The computational energy consumed by training node $i$ to perform one round of training is expressed as

$E_i^{cmp}(t) = \mu f_i(t)^2,$

where $f_i(t)$ is the CPU frequency required for one round of training and $\mu$ is the normalization factor of the consumed computing resources. To formulate the communication energy consumed, we use $Z$ to represent the number of bits of the neural network model. To transmit the gradients to the curator, all training nodes share uplink sub-channels based on orthogonal frequency-division multiple access (OFDMA), expressed as a set $\mathcal{N}$. Therefore, there is no co-channel interference between the training nodes. After collecting gradients from all training nodes, the curator broadcasts the global average gradient to the training nodes on all sub-channels. The communication energy consumed by training node $i$ to perform one aggregation is expressed as

$E_i^{com}(t) = \nu \frac{Z p_{i,n}}{\sum_{n \in \mathcal{N}} \rho_{i,n} B \log_2\left(1 + \frac{p_{i,n} h_{i,n}}{N_0}\right)},$

where $\rho_{i,n}$ represents the time fraction allocated to training node $i$ on sub-channel $n$, $B$ denotes the sub-channel bandwidth, $p_{i,n}$ is the uplink transmission power of training node $i$ on sub-channel $n$, $h_{i,n}$ is the uplink channel power gain between training node $i$ and the curator, $N_0$ is the noise power, and $\nu$ is the normalization factor of the consumed communication resources.
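A rough sketch of this energy accounting under the Shannon-rate model (parameter values are arbitrary, and the normalization factor is omitted):

```python
import math

def uplink_rate_bps(bandwidth_hz, tx_power_w, channel_gain, noise_power_w, time_fraction=1.0):
    """Achievable rate of one OFDMA sub-channel via the Shannon formula (no co-channel interference)."""
    return time_fraction * bandwidth_hz * math.log2(1 + tx_power_w * channel_gain / noise_power_w)

def comm_energy_j(model_bits, tx_power_w, rate_bps):
    """Energy to upload the model once: transmit power times transmission time."""
    return tx_power_w * model_bits / rate_bps

rate = uplink_rate_bps(bandwidth_hz=1e6, tx_power_w=0.1, channel_gain=1e-6, noise_power_w=1e-9)
energy = comm_energy_j(model_bits=1e6, tx_power_w=0.1, rate_bps=rate)
```

The sketch makes the key tradeoff visible: when the channel gain drops, the achievable rate falls and the same upload costs more energy, which is exactly what the DQN later learns to avoid.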

III-E Problem Formulation

The objective of this paper is to determine the best tradeoff between local updates and global parameter aggregation in a time-varying communication environment with a given resource budget, so as to minimize the loss function. The aggregation-frequency problem can be formulated as

P1: $\min_{\{\tau_k\}} F(w(K))$   (9)

s.t. $\sum_{k=1}^{K} \sum_{i=1}^{N} \left( \tau_k E_i^{cmp} + E_i^{com} \right) \le \beta R,$   (9a)

where $w(K)$ represents the global parameter after the $K$-th (final) global aggregation, $F(w(K))$ is the loss value after the $K$-th global aggregation, $\{\tau_k\}$ is the set of strategies for the frequency of local updates, and $\tau_k$ indicates the number of local updates required for the $k$-th global update. Constraint (9a) represents the given budget $R$ on available resources, and $\beta$ represents the upper limit of the resource-consumption rate in the entire learning process. In Eqn. (9) and Eqn. (9a), the loss value and the computational energy consumption involve the training state and the computational capability, respectively, which are estimated by the DTs to enable the curator to grasp the key status of the entire federated learning process. The deviation in computational energy consumption caused by the DT's mapping deviation of the node's computational capability is calibrated by the trust-based aggregation.

IV Deep Reinforcement Learning for Aggregation Frequency Based on Digital Twin

IV-A Problem Simplification

Due to the long-term resource budget constraints, it is difficult to solve P1. If the energy consumption at the current aggregation is too high, it will lead to energy shortage in the future. In addition, P1 is a nonlinear programming problem, and the complexity of solving it increases exponentially with the increase of federated learning rounds. Therefore, it is necessary to simplify P1 and the long-term resource budget constraints.

The effect of training after $k$ global aggregations can be written as

$\Delta F(k) = F(w(k-1)) - F(w(k)).$

For optimal training results, i.e.,

$\max_{\{\tau_k\}} \sum_{k=1}^{K} \Delta F(k).$
Based on Lyapunov optimization, a dynamic resource deficit queue is established to simplify P1 by dividing the long-term resource budget into an available resource budget for each time slot. Lyapunov optimization is widely used in control theory to ensure system stability in various forms. Generally, the Lyapunov function, which measures the multi-dimensional state of the system, is defined so that it grows when the system moves toward a bad state, and system stability can be achieved by taking control actions that push it back down. The first step of Lyapunov optimization is to construct virtual queues that capture the constraints of the problem to be solved, and then to define the Lyapunov function as the square of the backlog of all virtual queues in time slot t. Next, the difference of the Lyapunov function between two time slots, i.e., the Lyapunov drift, is defined. Finally, minimizing the Lyapunov drift in each slot always pushes the backlog toward a lower congestion state, intuitively maintaining the stability of the queues. The length of the resource deficit queue is defined as the difference between the resources used and the resources available. The limit of the total resource is $R$, and the resource available for the $k$-th aggregation is $R/K$. The evolution of the resource deficit queue is as follows:

$q(k+1) = \max\left\{ q(k) + d(k),\ 0 \right\},$
where $d(k) = e(k) - R/K$ is the deviation of resources in the $k$-th aggregation, with $e(k)$ the resources consumed by that aggregation. According to Eqn. (8) and Eqn. (11), the original problem P1 can be transformed into

$\min_{\{\tau_k\}} \sum_{k=1}^{K} \left( -V_k \Delta F(k) + \lambda\, q(k)\, e(k) \right),$

where $V_k$ and $\lambda$ are positive control parameters that dynamically balance training performance and resource consumption. It is noted that the accuracy of federated learning can easily be improved at the beginning of training, while improving the accuracy in later stages is costly. Therefore, $V_k$ increases with the number of training rounds and is motivated toward the goal of maximizing the ultimate benefit.
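The deficit-queue evolution can be sketched as follows (the per-slot budget and the consumption values are illustrative):

```python
def deficit_queue_step(q, used, budget_per_slot):
    """q(k+1) = max(q(k) + e(k) - R/K, 0): backlog grows when consumption exceeds the per-slot budget."""
    return max(q + used - budget_per_slot, 0.0)

q = 0.0
for used in [3.0, 1.0, 0.5, 4.0]:  # resources consumed in successive aggregation rounds
    q = deficit_queue_step(q, used, budget_per_slot=2.0)
```

Overspending in one round (3.0 against a budget of 2.0) leaves a backlog that later underspending (1.0, 0.5) pays down, which is how the queue converts the long-term budget into a per-slot penalty signal.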

IV-B Markov Decision Process (MDP) Model

We use deep reinforcement learning (DRL) to solve the frequency problem of local updates; the DT learns models by interacting with the environment, without requiring pre-training data or model assumptions. We formulate the optimization problem as an MDP denoted by $\{s_t, a_t, \pi, r_t, s_{t+1}\}$, which includes the system state $s_t$, the action $a_t$, the policy $\pi$, the reward function $r_t$, and the next state $s_{t+1}$. The detailed description of the parameters is as follows:

  • System State: The system state describes the characteristics and training state of each node, including the current training state of all nodes, the current state of the resource deficit queue $q(t)$, and the average value output from the hidden layer of each node's neural network.

  • Action Space: The action is defined by a vector whose entries indicate the number of local updates $\tau_t$, which needs to be discretized. For simplicity, we use $\tau$ to denote $\tau_t$, because our subsequent statements are based on a specific time $t$.

  • Policy: The policy is the mapping from the state space to the action space, i.e., $\pi: s_t \mapsto a_t$.

  • Reward Function: Since our goal is to determine the best tradeoff between local updates and global parameter aggregation so as to minimize the loss function, the reward function is naturally related to the decline of the overall loss function and the state of the resource deficit queue, and each action is assessed accordingly by Eqn. (15).

  • Next State: The current state is provided by the DT's real-time mapping, and the next state $s_{t+1}$ is the prediction obtained by the DT running the DQN, without actually running in the physical world.
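As an illustrative reading of this reward design (the exact form of Eqn. (15) is not reproduced here; this only mirrors the stated intuition of rewarding loss decline and penalizing consumption when the deficit queue is backlogged):

```python
def reward(loss_prev, loss_curr, v, q, energy):
    """Illustrative reward: pay for loss decline, penalize energy use weighted by the
    deficit-queue backlog q. Mirrors the stated intuition, not the exact Eqn. (15)."""
    return v * (loss_prev - loss_curr) - q * energy

r_good = reward(loss_prev=1.0, loss_curr=0.8, v=10.0, q=0.5, energy=0.1)  # progress, small backlog
r_bad = reward(loss_prev=1.0, loss_curr=1.0, v=10.0, q=5.0, energy=0.1)   # no progress, big backlog
```

An action that reduces the loss while the deficit queue is short earns a much larger reward than one that burns resources with a backlogged queue and no progress.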

IV-C DQN-based Optimization Algorithm for Aggregation Frequency

Input: eval_net, target_net, update frequency of target_net parameters, greed coefficient $\epsilon$, greed coefficient growth rate;
Output: The parameters of the trained DQNs;
1 Randomly initialize the parameters of eval_net and target_net;
2 for each episode do
3       Initialize the parameters in the environment setup;
4       for each time slot t do
5             Select the greedy action with probability $\epsilon$ and a random action with probability $1-\epsilon$;
6             Perform federated learning training;
7             Calculate the immediate reward with Eqn. (15) and update the system state;
8             Store the experience tuple;
9             if the experience replay is full then
10                   if it is time to refresh target_net then
11                         Update the target_net parameters;
12                   end if
13                  Learn all samples from the experience replay;
14                   Calculate the q-eval value by eval_net;
15                   Calculate the q-target value according to Eqn. (17);
16                   Perform a gradient descent step according to Eqn. (18);
17             end if
18       end for
19 end for
return The parameters of the trained DQNs.
Algorithm 1 Adaptive calibration of the global aggregation frequency.

To solve the MDP problem, we use a DQN-based optimization algorithm. As shown in Fig. 1, DTs map all aspects of the physical objects of the Industrial IoT environment to the virtual space in real time, forming a digital mirror image. Meanwhile, the DRL agent interacts with the DTs of the devices to learn the global aggregation frequency decision. The federated learning module makes frequency decisions based on the trained model and the DT status of the training nodes. Through DTs, the agent achieves the same training effect as in the real environment at a lower cost.

Training Step

When using DQN to achieve adaptive calibration of the global aggregation frequency, we first assign initial training samples to the training nodes and set initial parameters for the target_net and the eval_net to keep them consistent. The state array consists of the initial resource value and the corresponding loss value obtained by training each node. In each iteration, the state of the experience replay is checked. If it is full, the action is selected according to the greedy strategy with probability $\epsilon$; otherwise the action is selected at random with probability $1-\epsilon$.

After the action is selected, the reward is calculated according to Eqn. (15) and the system state is updated. Next, the current state, selected action, reward, and next state are recorded in the experience replay. Then we sample from the experience replay to train the networks, randomly drawing several samples as a batch so as to break the correlation between states. By extracting the states, the eval_net parameters $\theta$ are updated according to the loss function as follows:

$L(\theta) = \mathbb{E}\left[ \left( y_t - Q(s_t, a_t; \theta) \right)^2 \right],$

where $Q(s_t, a_t; \theta)$ represents the output of the current network eval_net, and $y_t$ is the q-target value calculated from the parameters $\theta^-$ of the target_net, which is independent of the parameters of the current network structure. The target_net is used to calculate the q-target value according to the following formula:

$y_t = r_t + \gamma \max_{a'} Q(s_{t+1}, a'; \theta^-),$

where $(s_t, a_t, r_t, s_{t+1})$ is the sample from the experience replay and $Q(s_{t+1}, a'; \theta^-)$ represents the output of target_net. In this way, the entire objective function can be optimized by the stochastic gradient descent method:

$\theta \leftarrow \theta - \eta \nabla_\theta L(\theta).$

After a certain number of iterations, the eval_net parameters are copied to target_net. That is, the loss and target_net updates are performed at fixed intervals, while the experience replay is updated in real time. These steps are repeated until the loss value reaches the preset target. The complete frequency algorithm for the global update of a single cluster is presented in Algorithm 1.
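The training step can be sketched with a simple tabular Q-approximator standing in for the paper's neural networks (states, rewards, and hyperparameters here are synthetic):

```python
import numpy as np

# Tabular Q-values standing in for eval_net / target_net.
rng = np.random.default_rng(1)
n_states, n_actions, gamma, lr = 4, 3, 0.9, 0.1

theta_eval = rng.normal(scale=0.1, size=(n_states, n_actions))  # eval_net parameters
theta_target = theta_eval.copy()                                 # target_net parameters

# Synthetic experience replay of (state, action, reward, next_state) tuples.
replay = [(int(rng.integers(n_states)), int(rng.integers(n_actions)),
           float(rng.normal()), int(rng.integers(n_states))) for _ in range(200)]

for step in range(500):
    s, a, r, s2 = replay[int(rng.integers(len(replay)))]
    # q-target from target_net: y = r + gamma * max_a' Q(s', a'; theta_target)
    y = r + gamma * theta_target[s2].max()
    # one gradient step on (y - Q(s, a; theta_eval))^2
    theta_eval[s, a] += lr * (y - theta_eval[s, a])
    if step % 50 == 0:
        theta_target = theta_eval.copy()  # periodically copy eval_net into target_net
```

Holding theta_target fixed between the periodic copies is what keeps the q-target independent of the network currently being updated, as the text describes.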

Running Step

After the training, the proposed aggregation-frequency decision agent is deployed on the curators to make optimal decisions according to the users' DTs. In addition, during operation, the users' state-transition data are collected and put into the training pool for the agent's retraining. In this way, changes in the network topology and digital twin status can be quickly captured, and a new model can be quickly established based on the previously accumulated training results.

IV-D DQN-based Asynchronous Federated Learning

Input: The trained DQNs;
Output: Local aggregation parameters;
1 For each cluster, run the DQN trained by Algorithm 1 on the curator;
2 for each global aggregation do
3       for each local round up to the aggregation frequency given by DQN do
4             Perform local training;
5       end for
6 end for
Algorithm 2 Intra-cluster aggregation frequency decision.

In an Industrial IoT scenario where devices are highly heterogeneous in terms of available data sizes and computing resources, single-round training speed is likely to be limited by the slowest nodes, i.e., the straggler effect, and synchronous training is not applicable. Therefore, we propose an asynchronous federated learning framework. The basic idea is to group nodes with different computing power by clustering and to configure a corresponding curator for each cluster, enabling each cluster to train autonomously with its own local aggregation frequency. Within a cluster, the aggregation frequency is obtained through the DQN-based adaptive frequency calibration algorithm described in Section IV-C. The specific asynchronous federated learning process is as follows.

Step 1: Node clustering. We first use the K-means clustering algorithm to classify nodes according to data size and computing power [12] and assign a corresponding curator to each cluster to form a local training cluster. In this way, the local-model execution times of the nodes in the same cluster are similar, so they do not drag each other down.
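A minimal K-means sketch over (data size, computing power) features (the feature values and the choice of k are illustrative):

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Minimal K-means over node feature vectors (a stand-in for the paper's clustering step)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assign each node to its nearest center, then move centers to cluster means.
        labels = np.argmin(((points[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if (labels == c).any():
                centers[c] = points[labels == c].mean(axis=0)
    return labels

# Each node is described by (normalized data size, normalized computing power):
# two slow/small nodes and two fast/large nodes.
node_features = np.array([[0.10, 0.10], [0.15, 0.12], [0.90, 0.95], [0.85, 0.90]])
labels = kmeans(node_features, k=2)
```

Nodes with similar resources land in the same cluster, so each cluster's curator can run local aggregation at its own pace.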

Step 2: Aggregation frequency decision. Each cluster runs Algorithm 2 separately to obtain its corresponding global aggregation frequency. To match the frequency with the computing power of the nodes, we use the minimum time $t_{\min}$ required for a local update in the current round as a benchmark and specify that the local training time of other clusters cannot exceed $\lambda t_{\min}$, where $\lambda$ is a tolerance factor. The tolerance factor increases as the number of global aggregation rounds increases; the intuition behind this approach is that the effect of global aggregation on learning efficiency decays as the rounds increase.

Step 3: Local aggregation. After completing the local training according to the frequency given by the DQN, the curator of each cluster uses the trust-weighted aggregation strategy to locally aggregate the parameters uploaded by the nodes. Specifically, the curator retrieves the updated reputation values and measures the importance of different nodes' uploaded parameters according to Eqn. (7). Parameters uploaded by nodes with low mapping deviation and high learning quality take up more weight in the local aggregation, which improves the accuracy and convergence efficiency of the model.

Step 4: Global aggregation. Finally, time-weighted aggregation is used to aggregate the global parameters. To distinguish the contribution of each local model to the aggregation based on the time effect and improve the effectiveness of the aggregation operation, once the time for global aggregation arrives, the curators upload their parameters together with time-version information, and the selected curator performs the global aggregation as

$w(t) = \sum_{c=1}^{C} \frac{e^{-(t - t_c)}}{\sum_{c'=1}^{C} e^{-(t - t_{c'})}} w_c,$

where $C$ is the number of curators, $w_c$ is the aggregated parameter of cluster $c$, $e$ is the base of the natural logarithm used to describe the time effect, and $t_c$ is the timestamp corresponding to the latest parameter of cluster $c$, i.e., its number of rounds.
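A sketch of the time-weighted aggregation, assuming a normalized exponential time discount of the form e^{-(t - t_c)} (the exact weighting constant in the paper's formula may differ):

```python
import math

def time_weighted_aggregate(cluster_params, timestamps, t_now):
    """Fresher cluster parameters (more recent timestamp) receive exponentially larger weight."""
    raw = [math.exp(-(t_now - ts)) for ts in timestamps]   # e^{-(t - t_c)} time effect
    total = sum(raw)
    dim = len(cluster_params[0])
    return [sum((r / total) * p[i] for r, p in zip(raw, cluster_params)) for i in range(dim)]

w_global = time_weighted_aggregate(
    cluster_params=[[1.0, 0.0], [3.0, 2.0]],  # aggregated parameters of two clusters
    timestamps=[10, 8],                       # first cluster is two rounds fresher
    t_now=10,
)
```

The two-rounds-stale cluster contributes only a small fraction of the weight, so the global model tracks the freshest clusters while still incorporating slower ones.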

Through the proposed heterogeneous framework equipped with trust mechanism, the straggler effect is eliminated and malicious node attacks are effectively evaded, naturally improving the convergence rate and learning quality.

V Numerical Results

We assume that devices in the Industrial IoT need to recognize each other and cooperatively perform production tasks based on federated learning, such as defective-product detection. Based on the publicly available image dataset MNIST, we apply the proposed scheme to real object-classification tasks. We implement asynchronous federated learning and DQN in PyTorch. The DQN is initialized with two identical neural networks, each consisting of three fully connected layers deployed in sequence. To illustrate the performance of the proposed scheme, we choose a fixed aggregation-frequency scheme as the benchmark.

Fig. 2: The convergence performance of DQN.
Fig. 3: Comparison of federated learning with DT deviation and with calibrated DT deviation.
Fig. 4: The average aggregation probability in a good channel state as the channel state varies.
Fig. 5: Comparison of energy consumed by federated learning in different channel states.
Fig. 6: Comparison of the accuracy achieved by federated learning under different clustering conditions.
Fig. 7: Comparison of the time consumed by federated learning under different clustering conditions.
Fig. 8: Comparison of the accuracy achieved of federated learning between adaptive frequency and fixed frequency.

Reinforcement learning is based on the reward hypothesis: all goals can be described as maximizing the expected cumulative reward. The reward function is set to guide the agent's activities, and poorly designed rewards generally cause the network not to converge, or the agent simply never experiences what it is meant to learn. Fig. 2 depicts the trend of the loss value, from which we can see that the agent's loss stabilizes after about 1200 rounds of training and converges to a good result. Therefore, the trained DQN is suitable for heterogeneous scenarios and has good convergence performance.

Fig. 3 compares the federated learning accuracy achieved in the presence of DT deviation with that achieved after calibrating the deviation. Federated learning with the DT deviation calibrated by the trust-weighted aggregation strategy achieves higher accuracy than federated learning with uncalibrated DT deviation, and the calibrated version is also superior before either algorithm has converged. In addition, we observe that the DQN with DT deviation cannot converge.

Fig. 4 shows the total number of aggregations required to complete federated learning and the number of aggregations performed in a good channel state, as the channel state changes. As the proportion of good channel states increases, the number of aggregations performed in good channel states increases. Note that almost all aggregations are completed within 5 rounds, because the DQN learns that fewer aggregations yield greater benefits. This shows that, through continuous learning, the DQN intelligently avoids performing aggregation under poor channel conditions.

Fig. 5 compares the energy consumed by federated learning during DQN training under different channel states, where the energy consumption includes the computing resources used in local training and the communication resources used in aggregation. Energy consumption decreases as channel quality improves, mainly because aggregation consumes more communication resources when the channel quality is poor. As the DQN is trained, the energy consumption in all three channel states decreases. This is because the DQN adaptively calibrates the aggregation timing so that federated learning selects local training instead of long-delay, high-energy aggregation when the channel quality is relatively poor.

Fig. 6 describes the variation of the accuracy achieved by federated learning in different clustering cases. The more clusters there are, the higher the accuracy the training can achieve in the same time, because clustering effectively exploits the computing power of heterogeneous nodes through different local aggregation schedules. Fig. 7 depicts the time required for federated learning to reach a preset accuracy under different clustering situations. The training time required to reach the same accuracy decreases as the number of clusters increases. Similar to Fig. 6, this benefit comes from clustering letting different clusters follow different local aggregation timings and thus fully utilize the computational power of heterogeneous nodes. As the number of clusters increases, the straggler effect is alleviated more effectively, which naturally shortens the time required for federated learning. In addition, beyond a certain preset accuracy, the same accuracy improvement takes more time.
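The clustering that Figs. 6 and 7 evaluate can be sketched as grouping devices of similar computing power, so that each cluster follows its own local-aggregation schedule and fast devices never wait for stragglers. The sorted split below is a simplified stand-in for the paper's clustering method, and the capability values are illustrative.

```python
def cluster_by_capability(compute_power, num_clusters):
    """Group device indices with similar computing power into clusters.
    A simple sorted split: rank devices by capability, then cut the
    ranking into num_clusters roughly equal chunks."""
    ranked = sorted(range(len(compute_power)), key=lambda i: compute_power[i])
    size = -(-len(ranked) // num_clusters)  # ceiling division
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]

# Slow devices land together, so each cluster can aggregate at its own
# pace and the fast clusters are not held back by stragglers.
clusters = cluster_by_capability([1.0, 9.0, 1.2, 8.5, 5.0, 4.8],
                                 num_clusters=3)
```

With more clusters, each group is more homogeneous, which is why Figs. 6 and 7 show accuracy rising and training time falling as the cluster count grows.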

Fig. 8 compares the accuracy achieved by DQN-based federated learning with that of fixed-frequency federated learning. During training, the proposed scheme first learns and then surpasses the accuracy of the fixed-frequency scheme. This is because the gain of global aggregation on federated learning accuracy is non-linear, and the fixed-frequency scheme may miss the best opportunity for aggregation. The final accuracy achieved by the proposed scheme is greater than that of the fixed-frequency scheme, which is consistent with the DQN's goal of maximizing the final gain.

VI Conclusions

In this paper, we have leveraged DQN to explore the best trade-off between local and global updates with a given resource budget. Thanks to the DT that can sensitively capture the dynamic changes of the network, the proposed scheme can adaptively adjust the aggregation frequency according to the channel state. Furthermore, an asynchronous federated learning architecture based on node clustering has been designed to eliminate the straggler effect, which is more suitable for heterogeneous scenarios. The numerical results show that the proposed scheme is superior to the benchmark scheme in terms of learning accuracy, convergence rate and energy saving.


  1. S. K. Khaitan and J. D. McCalley, “Design Techniques and Applications of Cyberphysical Systems: A Survey,” in IEEE Systems Journal, vol. 9, no. 2, pp. 350-365, June 2015.
  2. F. Tao, Q. Qi and L. Wang, “Digital Twins and Cyber–Physical Systems toward Smart Manufacturing and Industry 4.0: Correlation and Comparison,” in Engineering, vol. 5, no. 4, pp. 595-812, 2019.
  3. A. Rasheed, O. San and T. Kvamsdal, “Digital Twin: Values, Challenges and Enablers From a Modeling Perspective,” in IEEE Access, vol. 8, pp. 21980-22012, 2020.
  4. Q. Yang, Y. Liu, Y. Cheng, Y. Kang, T. Chen and H. Yu, “Federated Learning,” in Federated Learning, Morgan & Claypool, 2019.
  5. Z. Chen, P. Tian, W. Liao and W. Yu, “Zero Knowledge Clustering Based Adversarial Mitigation in Heterogeneous Federated Learning,” in IEEE Transactions on Network Science and Engineering, vol. PP, no. 99, pp. 1-1.
  6. Y. Lu, X. Huang, Y. Dai, S. Maharjan and Y. Zhang, “Blockchain and Federated Learning for Privacy-Preserved Data Sharing in Industrial IoT,” in IEEE Transactions on Industrial Informatics, vol. 16, no. 6, pp. 4177-4186, June 2020.
  7. K. Yang, T. Jiang, Y. Shi and Z. Ding, “Federated Learning via Over-the-Air Computation,” in IEEE Transactions on Wireless Communications, vol. 19, no. 3, pp. 2022-2035, March 2020.
  8. Y. Lu, X. Huang, Y. Dai, S. Maharjan and Y. Zhang, “Differentially Private Asynchronous Federated Learning for Mobile Edge Computing in Urban Informatics,” in IEEE Transactions on Industrial Informatics, vol. 16, no. 3, pp. 2134-2143, March 2020.
  9. Z. M. Fadlullah and N. Kato, “HCP: Heterogeneous Computing Platform for Federated Learning Based Collaborative Content Caching Towards 6G Networks,” in IEEE Transactions on Emerging Topics in Computing, vol. PP, no. 99, pp. 1-1.
  10. F. Ang, L. Chen, N. Zhao, Y. Chen, W. Wang and F. R. Yu, “Robust Federated Learning With Noisy Communication,” in IEEE Transactions on Communications, vol. 68, no. 6, pp. 3452-3464, June 2020.
  11. S. Savazzi, M. Nicoli and V. Rampa, “Federated Learning With Cooperating Devices: A Consensus Approach for Massive IoT Networks,” in IEEE Internet of Things Journal, vol. 7, no. 5, pp. 4641-4654, May 2020.
  12. J. Choi and S. R. Pokhrel, “Federated Learning With Multichannel ALOHA,” in IEEE Wireless Communications Letters, vol. 9, no. 4, pp. 499-502, April 2020.
  13. T. Nishio and R. Yonetani, “Client Selection for Federated Learning with Heterogeneous Resources in Mobile Edge,” ICC 2019 - 2019 IEEE International Conference on Communications (ICC), Shanghai, China, 2019, pp. 1-7.
  14. S. R. Pandey, N. H. Tran, M. Bennis, Y. K. Tun, A. Manzoor and C. S. Hong, “A Crowdsourcing Framework for On-Device Federated Learning,” in IEEE Transactions on Wireless Communications, vol. 19, no. 5, pp. 3241-3256, May 2020.
  15. M. Hao, H. Li, X. Luo, G. Xu, H. Yang and S. Liu, “Efficient and Privacy-Enhanced Federated Learning for Industrial Artificial Intelligence,” in IEEE Transactions on Industrial Informatics, vol. PP, no. 99, pp. 1-1.
  16. S. Wang, T. Tuor, T. Salonidis, K. K. Leung, C. Makaya, T. He, and K. Chan, “Adaptive Federated Learning in Resource Constrained Edge Computing Systems,” in IEEE Journal on Selected Areas in Communications, vol. 37, no. 6, pp. 1205-1221, June 2019.
  17. N. H. Tran, W. Bao, A. Zomaya, M. N. H. Nguyen and C. S. Hong, “Federated Learning over Wireless Networks: Optimization Model Design and Analysis,” IEEE INFOCOM 2019 - IEEE Conference on Computer Communications, Paris, France, 2019, pp. 1387-1395.