Energy-Efficient Processing and Robust Wireless Cooperative Transmission for Edge Inference

Kai Yang, Yuanming Shi, Wei Yu, and Zhi Ding

K. Yang is with the School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China, also with the Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050, China, and also with the University of Chinese Academy of Sciences, Beijing 100049, China (e-mail: yangkai@shanghaitech.edu.cn). Y. Shi is with the School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China (e-mail: shiym@shanghaitech.edu.cn). Wei Yu is with the Electrical and Computer Engineering Department, University of Toronto, Toronto, ON M5S 3G4, Canada (e-mail: weiyu@comm.utoronto.ca). Z. Ding is with the Department of Electrical and Computer Engineering, University of California at Davis, Davis, CA 95616 USA (e-mail: zding@ucdavis.edu).

Abstract

Edge machine learning can deliver low-latency and private artificial intelligence (AI) services for mobile devices by leveraging computation and storage resources at the network edge. This paper presents an energy-efficient edge processing framework to execute deep learning inference tasks at edge computing nodes whose wireless connections to mobile devices are prone to channel uncertainties. Aiming to minimize the sum of computation and transmission power consumption under probabilistic quality-of-service (QoS) constraints, we formulate a joint inference tasking and downlink beamforming problem that is characterized by a group sparse objective function. We provide a statistical learning based robust optimization approach to approximate the highly intractable probabilistic-QoS constraints by nonconvex quadratic constraints, which are further reformulated as matrix inequalities with a rank-one constraint via matrix lifting. We design a reweighted power minimization approach that alternates between solving a reweighted minimization problem with difference-of-convex-functions (DC) regularization and updating the weights, where the reweighting enhances group sparsity and the DC regularization induces rank-one solutions. Numerical results demonstrate that the proposed approach outperforms other state-of-the-art approaches.

Index Terms: edge machine learning, energy efficiency, robust communication, group sparse beamforming, robust optimization, difference-of-convex-functions

I Introduction

Machine learning has transformed many aspects of our daily lives by taking advantage of abundant data and computing power in the cloud center. In particular, the strong capability of deep neural networks [1] to capture representations of data for detection or classification has led to impressive gains in face recognition, natural language processing, and other tasks. With the explosion of mobile data and the increasing edge computing capability, there is an emerging trend of edge machine learning (edge-ML) [2]. Instead of uploading all data collected by mobile devices to the remote cloud data center, edge-ML emphasizes the use of the computation and storage resources at network edges to provide low-latency and reliable artificial intelligence (AI) services for privacy/security sensitive devices and applications, such as wearable devices, augmented reality, smart vehicles, and drones. However, since mobile devices are usually equipped with limited computation power, storage and energy [2], it is usually infeasible to deploy deep learning models, i.e., deep neural networks (DNNs), at resource-constrained mobile devices and execute inference tasks locally. A promising solution is to enable processing at the mobile network access points to facilitate deep learning inference, which is termed edge inference [3].

In this paper, we present an edge processing framework in which the input (e.g., a rough doodle) of each mobile user is uploaded to wireless access points (e.g., base stations) that serve as edge computing nodes, each task is performed with a pre-trained deep learning model (e.g., Nvidia’s AI system GauGAN [4] for turning rough doodles into photorealistic landscapes) at multiple edge computing nodes, and the output results (e.g., landscape images) are transmitted to mobile users via coordinated beamforming among multiple access points. In such a system, the provisioning of wireless transmissions in both the uplink and the downlink is an important design consideration. In addition to low latency, improving the energy efficiency [5] is also critical due to the high computational complexity of processing DNNs, for which a number of works have focused on model compression methods [6, 7].

The joint chance constraints make the formulated probabilistic group sparse beamforming problem highly intractable, since they generally have no closed-form expression. To address chance-constrained programs, a number of works focus on finding computationally tractable approximations based on collected samples of the random variables. The well-recognized scenario generation (SG) approach [15] uses a collection of sampled constraints to approximate the original chance constraints. However, SG is over-conservative since the volume of the feasible region decreases as the sample size increases, which deteriorates its performance. In addition, given the pre-specified tolerance level $\epsilon$ and the confidence level $1-\delta$ for the probabilistic-QoS constraints, the sample size required by SG must satisfy a lower bound that increases roughly linearly with $1/\epsilon$. In [13], a stochastic optimization approach is provided to address the over-conservativeness of SG. However, its computational cost grows linearly with the sample size, which is not scalable for obtaining high-robustness solutions. Moreover, its statistical guarantee under finite sample size is still not available. To address the limitations of existing methods, in this paper, we present a robust optimization approximation for the joint chance constraints by enforcing the QoS constraints for any element within a high probability region. The high probability region is further determined by adopting a statistical learning [16] approach. This approach enjoys the benefits that the minimum required sample size is only $\ln\delta/\ln(1-\epsilon)$, and the computational cost is independent of the sample size.

With the statistical learning based robust optimization approximation approach, the resulting robust group sparse beamforming problem has nonconvex quadratic constraints and a nonconvex group sparse objective function. We find that the nonconvex quadratic constraints can be convexified by matrix lifting and semidefinite relaxation (SDR) [17]. Specifically, the nonconvex quadratic robust QoS constraints can be lifted as convex constraints in terms of a rank-one positive semidefinite matrix variable, which is then convexified by simply dropping the rank-one constraint. However, the SDR approach cannot guarantee that the obtained solution is feasible with respect to the original nonconvex quadratic constraints. The mixed $\ell_1/\ell_2$-norm [18] is a well-known convex group sparsity inducing norm, which has been successfully applied in green cloud radio access networks [9] and cooperative wireless cellular networks [19]. However, the SDR approach requires a quadratic form of the objective function, which makes the mixed $\ell_1/\ell_2$-norm minimization approach inapplicable. To overcome this problem, a quadratic variational form of the weighted mixed $\ell_1/\ell_2$-norm is proposed in [20] to induce group sparsity. Note that [20] also considers a group sparse beamforming problem with nonconvex quadratic constraints. However, the performance of minimizing this quadratic variational form with SDR is still not satisfactory.

To address the limitations of existing approaches, we propose a reweighted power minimization approach to enhance the group sparsity as well as improve the feasibility of the nonconvex quadratic constraints. Specifically, we first adopt the iteratively reweighted minimization approach for enhancing group sparsity [21, 22]. To further guarantee the feasibility of the original nonconvex quadratic constraints, we exploit the matrix lifting technique to recast the nonconvex quadratic constraints as convex constraints with respect to a rank-one positive semidefinite matrix, and propose a novel difference-of-convex-functions (DC) regularization approach to induce rank-one solutions. Numerical results demonstrate that the proposed approach improves the probability of feasibility by avoiding the over-conservativeness of SG. Benefiting from both the reweighted minimization and the DC regularization, the proposed approach achieves a much lower total power consumption than the algorithm proposed in [20] and has a better capability of inducing group sparsity with nonconvex quadratic constraints.

I-A Contributions

In this work, we consider an edge computing system to execute deep learning inference tasks for resource-constrained mobile devices. In order to provide energy-efficient processing and robust wireless cooperative transmission service for edge inference, we propose to jointly design the downlink beamforming vector and the set of inference tasks performed at each edge computing node under probabilistic-QoS constraints. We provide a statistical learning based robust optimization approximation for the highly intractable joint chance constraints, which guarantees that the probabilistic-QoS constraints are feasible with a certain confidence level. The resulting problem turns out to be a group sparse beamforming problem with nonconvex quadratic constraints. We propose a reweighted power minimization approach based on the principles of iteratively reweighted minimization for inducing group sparsity, the matrix lifting technique, and a novel DC representation for rank-one positive semidefinite matrices. The proposed approach can enhance group sparsity and induce rank-one solutions.

We summarize the major contributions of this paper as follows:

1. We propose an energy-efficient processing and robust transmission approach for executing deep learning inference tasks at possibly multiple edge computing enabled wireless access points. The selection of the optimal set of access points for each task is formulated as a group sparse beamforming problem with joint chance constraints.

2. We provide a robust optimization counterpart to approximate the joint chance constraints, followed by a statistical learning approach to learn the parameters from data samples of the random channel coefficients. This yields a nonconvex group sparse beamforming problem with nonconvex quadratic constraints.

3. We show that the nonconvex quadratic constraints can be reformulated as convex constraints with a rank-one constraint, where the rank-one constraint can be reformulated with a novel DC representation. To enhance the group sparsity and induce rank-one solutions, we propose a reweighted power minimization approach that alternates between reweighted minimization with DC regularization and weight updates.

4. We conduct extensive numerical experiments to demonstrate the advantages of the proposed approach in providing energy-efficient and robust transmission service for edge inference.

I-B Organization and Notations

The rest of this work is organized as follows. In Section II, we introduce the system model and the power consumption model of edge inference, and formulate the energy-efficient processing and robust cooperative transmission problem as a group sparse beamforming problem with joint chance constraints. Section III provides a statistical learning based robust optimization approach to approximate the joint chance constraints. In Section IV, we design a reweighted power minimization approach for solving the robust group sparse beamforming problem. The simulation results are illustrated in Section V to demonstrate the superiority of the proposed approach over other state-of-the-art approaches. Finally, we conclude this work in Section VI.

Throughout this paper, we use lower-case bold letters (e.g., $\mathbf{v}$) to denote column vectors and letters with one subscript to denote their subvectors (e.g., $\mathbf{v}_k$). We further use lower-case bold letters with two subscripts to denote the subvectors of subvectors (e.g., $\mathbf{v}_{nk}$ is a subvector of $\mathbf{v}_k$). We denote scalars with lower-case letters, matrices with capital letters (e.g., $\mathbf{V}$) and sets with calligraphic letters (e.g., $\mathcal{T}$). The conjugate transpose of a vector or matrix, the $\ell_2$-norm of a vector and the spectral norm of a matrix are denoted as $(\cdot)^{H}$, $\|\cdot\|_2$ and $\|\cdot\|$, respectively.

II System Model and Problem Formulation

This section provides the system model and power consumption model of edge inference for deep neural networks, followed by the proposal of the energy-efficient edge processing under probabilistic-QoS constraints.

II-A System Model

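Let $s_k$ be the encoded scalar that carries the requested output for MU $k$, and let $\mathbf{v}_{nk}\in\mathbb{C}^{L}$ be the beamforming vector for message $s_k$ at the $n$-th AP. We consider the downlink communication scenario, where all inputs have already been collected at the APs. Then the received signal at MU $l$ is given by

$$ y_l=\sum_{n=1}^{N}\sum_{k=1}^{K}\mathbf{h}_{nl}^{H}\mathbf{v}_{nk}s_k+z_l, \tag{1} $$

where $\mathbf{h}_{nl}$ is the channel coefficient vector between the $n$-th AP and the $l$-th MU, and $z_l$ is the additive isotropic white Gaussian noise. Suppose all data symbols are mutually independent with unit power, i.e., $\mathbb{E}[|s_k|^2]=1$, and also independent of the noise. Denote $[N]$ as the set $\{1,\cdots,N\}$. A feasible allocation of the inference tasks on APs specifies the pairs $(n,k)$ for which computational task $k$ shall be performed at the $n$-th AP. In terms of the group sparsity structure of the aggregative beamforming vector

$$ \mathbf{v}=[\mathbf{v}_{11}^{H},\cdots,\mathbf{v}_{N1}^{H},\cdots,\mathbf{v}_{NK}^{H}]^{H}\in\mathbb{C}^{NKL}, \tag{2} $$

we have that if the inference task $k$ is not performed at AP $n$, the beamforming vector $\mathbf{v}_{nk}$ is set to zero. Let $\mathcal{T}(\mathbf{v})$ be the group sparsity pattern of $\mathbf{v}$ given as

$$ \mathcal{T}(\mathbf{v})=\{(n,k)\,|\,\mathbf{v}_{nk}\neq\mathbf{0}\}. \tag{3} $$

The signal-to-interference-plus-noise-ratio (SINR) for mobile device $k$ is given by

$$ \mathrm{SINR}_k(\mathbf{v};\mathbf{h}_k)=\frac{|\mathbf{h}_k^{H}\mathbf{v}_k|^2}{\sum_{l\neq k}|\mathbf{h}_k^{H}\mathbf{v}_l|^2+\sigma_k^2}, \tag{4} $$

where $\mathbf{h}_k$ and $\mathbf{v}_k$ are given by

\begin{align}
\mathbf{h}_k&=[\mathbf{h}_{1k}^{H},\cdots,\mathbf{h}_{Nk}^{H}]^{H}\in\mathbb{C}^{NL}, \tag{5}\\
\mathbf{v}_k&=[\mathbf{v}_{1k}^{H},\cdots,\mathbf{v}_{Nk}^{H}]^{H}\in\mathbb{C}^{NL}, \tag{6}
\end{align}

and the aggregative channel coefficient vector is denoted as

$$ \mathbf{h}=[\mathbf{h}_1^{H},\cdots,\mathbf{h}_K^{H}]^{H}\in\mathbb{C}^{NKL}. \tag{7} $$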
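
As a numerical companion to (4) through (7), the following NumPy sketch evaluates the SINR of every mobile device. The array layout (`v[k, n]` holding the beamformer for message $k$ at AP $n$, `h[k, n]` holding the channel from AP $n$ to MU $k$) and the function name `sinr` are illustrative assumptions, not notation from the paper.

```python
import numpy as np

def sinr(v, h, sigma2):
    """SINR of each MU as in (4).

    v, h   : complex arrays of shape (K, N, L) (assumed layout)
    sigma2 : noise powers, shape (K,)
    """
    K = v.shape[0]
    V = v.reshape(K, -1)              # row k is the stacked vector v_k in C^{NL}
    H = h.reshape(K, -1)              # row k is the stacked vector h_k in C^{NL}
    G = np.abs(H.conj() @ V.T) ** 2   # G[k, l] = |h_k^H v_l|^2
    signal = np.diag(G)
    interference = G.sum(axis=1) - signal
    return signal / (interference + np.asarray(sigma2, dtype=float))
```

Stacking the per-AP vectors first keeps the computation to one matrix product, which mirrors the aggregated form used in (4).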

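The transmit power constraint at the $n$-th AP is given by

$$ \mathbb{E}\Big[\sum_{l=1}^{K}\|\mathbf{v}_{nl}s_l\|_2^2\Big]=\sum_{l=1}^{K}\|\mathbf{v}_{nl}\|_2^2\le P_n^{\mathrm{Tx}},\quad n\in[N], \tag{8} $$

where $P_n^{\mathrm{Tx}}$ is the maximum transmit power.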

II-B Power Consumption Model

Although widespread applications of deep learning bring numerous opportunities in intelligent systems, energy consumption becomes one of the main concerns due to the intensive computation required [3]. Let the power consumption of computing task $k$ at the $n$-th edge computing node be $P^{c}_{nk}$. The total computation power consumption for all edge computing nodes is thus given by

$$ P^{c}=\sum_{n,k}P^{c}_{nk}\,\mathbb{I}_{(n,k)\in\mathcal{T}(\mathbf{v})}, \tag{9} $$

where the indicator function $\mathbb{I}_{(n,k)\in\mathcal{T}(\mathbf{v})}$ is $1$ if $(n,k)\in\mathcal{T}(\mathbf{v})$ and $0$ otherwise. Therefore, the total power consumption consists of the transmission power consumption for delivering output results and the computation power consumption for executing deep learning tasks, which is given by

$$ P=\sum_{n,k}\frac{1}{\eta_n}\|\mathbf{v}_{nk}\|_2^2+\sum_{n,k}P^{c}_{nk}\,\mathbb{I}_{(n,k)\in\mathcal{T}(\mathbf{v})}, \tag{10} $$

where $\eta_n$ is the power amplifier efficiency.
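
The objective (10) can be evaluated directly once the beamformers are known. The sketch below assumes the same hypothetical array layout `v[k, n]` for the beamformer of task $k$ at AP $n$, with `p_comp[n, k]` holding the computation power of task $k$ at node $n$; the names and the zero-detection tolerance `tol` are illustrative choices.

```python
import numpy as np

def total_power(v, eta, p_comp, tol=0.0):
    """Total power (10) and the group sparsity pattern (3).

    v      : complex array (K, N, L), v[k, n] is the beamformer of task k at AP n
    eta    : power amplifier efficiencies, shape (N,)
    p_comp : computation powers, shape (N, K)
    Returns (power, active) where active[n, k] is True iff (n, k) is in T(v).
    """
    tx = np.sum(np.abs(v) ** 2, axis=2)           # tx[k, n] = ||v_nk||_2^2
    active = tx > tol                              # nonzero groups
    transmit = np.sum(tx / np.asarray(eta)[None, :])
    compute = np.sum(np.asarray(p_comp).T[active])
    return transmit + compute, active.T
```

Switching a pair $(n,k)$ off removes both its transmit term and its computation term, which is the coupling that the group sparse formulation exploits.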

Deep neural networks, especially deep convolutional neural networks (CNNs), have become an indispensable, state-of-the-art paradigm for real-world intelligent services. Their high energy cost has attracted much interest in designing energy-efficient structures of neural networks [6]. Estimating the energy consumption of a neural network is thus critical for inference at the edge, for which an estimation tool is developed in [23]. The energy consumption of performing an inference task consists of the computation part and the data movement part [8]. The computation energy consumption can be calculated by counting the number of multiply-and-accumulate (MAC) operations in each layer and weighting it by the energy consumption of each MAC operation in the computation core. The energy consumption of data movement is calculated by counting the number of memory accesses at each level of the memory hierarchy in the corresponding hardware and weighting it by the energy consumption of accessing the memory at that level.

Here we illustrate how to estimate the computation power consumption of performing image classification tasks using a classic CNN (i.e., AlexNet, consisting of five convolutional layers and three fully-connected layers) on the Eyeriss chip. The energy estimation tool takes the network configuration as input and outputs the estimated energy breakdown of each layer in terms of the computation part and the data movement part of three data types (weight, input feature map, output feature map). Figure 2 demonstrates the estimated energy of each layer running on the Eyeriss chip, and the overall energy consumption is the sum of the four parts. The unit of energy is normalized by the energy for one MAC. Based on the total energy consumption, the computation power consumption can be further determined by dividing the energy consumption by the computation time.
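
The accounting described above can be sketched in a few lines. This is not the estimation tool of [23]; the function names, memory-level labels and all numeric values below are hypothetical placeholders chosen only to show the arithmetic (energy in units of one MAC, then conversion to power).

```python
def layer_energy(num_macs, accesses, access_cost, mac_cost=1.0):
    """Energy of one layer, normalized to the energy of one MAC.

    accesses / access_cost : dicts keyed by memory level (hypothetical labels),
    giving the access count and the per-access cost at that level.
    """
    compute = num_macs * mac_cost
    movement = sum(accesses[level] * access_cost[level] for level in accesses)
    return compute + movement

def computation_power(layer_energies, joule_per_mac, latency_s):
    """Average power: total normalized energy times J/MAC, divided by run time."""
    return sum(layer_energies) * joule_per_mac / latency_s

# Hypothetical numbers, for illustration only.
e1 = layer_energy(100, {"dram": 2, "buffer": 10}, {"dram": 200.0, "buffer": 6.0})
p = computation_power([e1, 440.0], joule_per_mac=1e-12, latency_s=1e-3)
```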

II-C Channel Uncertainty Model

For high-stakes intelligent applications such as autonomous driving and automation, robustness is a critical requirement. In practice, uncertainty in the available channel state information (CSI) is inevitable, and we take it into consideration in this paper to provide robust transmission. It may originate from training based channel estimation [11], limited precision of feedback [12], partial CSI acquisition [13] and delays in CSI acquisition [14]. In this work, we adopt the additive error model [25, 26] of the channel imperfection, i.e.,

$$ \mathbf{h}=\hat{\mathbf{h}}+\mathbf{e}, \tag{11} $$

where $\hat{\mathbf{h}}$ is the estimated aggregative channel vector and $\mathbf{e}$ is the random error of the CSI with unknown distribution and zero mean. We apply the probabilistic quality-of-service (QoS) constraints [13] to characterize the robustness of delivering the inference results to MUs

$$ \Pr\big(\mathrm{SINR}_k(\mathbf{v};\mathbf{h}_k)\ge\gamma_k\big)\ge 1-\zeta,\quad\forall k\in[K]. \tag{12} $$

Here $\zeta$ is the tolerance level and $\mathrm{SINR}_k(\mathbf{v};\mathbf{h}_k)\ge\gamma_k$ is called the safe condition.

II-D Problem Formulation

In the proposed edge processing framework for deep learning inference tasks, there is a fundamental tradeoff between computation and communication. Specifically, having more inference tasks executed at edge nodes will yield higher computation power consumption, while the downlink transmission power consumption shall be reduced due to the cooperative transmission gains. In this paper, we propose an energy-efficient processing and robust transmission approach to minimize the total network power consumption, while satisfying the probabilistic QoS constraints and transmit power constraints. It is formulated as the following probabilistic group sparse beamforming problem:

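$$
\begin{aligned}
\mathscr{P}_{\mathrm{CCP}}:\ \mathop{\mathrm{minimize}}_{\mathbf{v}\in\mathbb{C}^{NKL}}\quad & \sum_{n,k}\frac{1}{\eta_n}\|\mathbf{v}_{nk}\|_2^2+\sum_{n,k}P^{c}_{nk}\,\mathbb{I}_{(n,k)\in\mathcal{T}(\mathbf{v})}\\
\mathrm{subject\ to}\quad & \Pr\big(\mathrm{SINR}_k(\mathbf{v};\mathbf{h}_k)\ge\gamma_k\big)\ge 1-\zeta,\ k\in[K]\\
& \sum_{k=1}^{K}\|\mathbf{v}_{nk}\|_2^2\le P_n^{\mathrm{Tx}},\ n\in[N].
\end{aligned}
\tag{14}
$$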

To achieve robustness of the QoS against CSI errors, we shall collect $D$ i.i.d. (independent and identically distributed) samples of the imperfect channel state information as the data set $\mathcal{D}$ to learn the uncertainty model of the CSI before providing edge inference service. Based on the data set $\mathcal{D}$, we aim to design a beamforming vector $\mathbf{v}$ such that the safe condition is satisfied with probability at least $1-\zeta$. However, since we do not know the prior distribution of the random errors, the statistical guarantee of a given approach is usually expressed as a certain confidence level $1-\delta$ for a certain tolerance level $\epsilon$, e.g., in the scenario generation approach [15]. That is, the confidence level of

$$ \Pr\big(\mathrm{SINR}_k(\mathbf{v};\mathbf{h}_k)\ge\gamma_k\big)\ge 1-\epsilon \tag{15} $$

is no less than $1-\delta$ for given $\epsilon$ and $\delta$. Thus the violation probability of the safe condition is upper bounded by

$$ \Pr\big(\mathrm{SINR}_k(\mathbf{v};\mathbf{h}_k)<\gamma_k\big)<\delta+\epsilon(1-\delta). \tag{16} $$

By setting $\epsilon$ and $\delta$ such that $\delta+\epsilon(1-\delta)\le\zeta$, the safe condition (12) is guaranteed to be met.
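
To make the bound (16) concrete, the short sketch below computes, for a target tolerance and a chosen $\delta$, the largest $\epsilon$ for which the right-hand side of (16) does not exceed the target. The function name `max_tolerance` and the example values are assumptions for illustration.

```python
def max_tolerance(zeta, delta):
    """Largest epsilon such that delta + epsilon * (1 - delta) <= zeta."""
    if not 0.0 <= delta < zeta < 1.0:
        raise ValueError("need 0 <= delta < zeta < 1")
    return (zeta - delta) / (1.0 - delta)

# Example: a 10% overall tolerance split with delta = 5%.
eps = max_tolerance(0.10, 0.05)   # about 0.0526
```

A smaller $\delta$ leaves more room for $\epsilon$ but, as discussed in Section III, requires more calibration samples.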

We consider the block fading channel where the channel distribution is assumed invariant [27] within a number of blocks and the channel coefficient vector remains unchanged within each block. Note that training by collecting channel samples within each block will result in high signaling overhead. We will show that our proposed approach for addressing the probabilistic-QoS constraints can be integrated with a cost-effective channel sampling strategy in Section III-D.

II-E Problem Analysis

Directly handling the joint chance constraints (14) is usually a highly intractable task [15], especially when there is no exact knowledge about the uncertainty. In this work, we shall propose a general framework for edge inference without assuming the prior distribution of the random errors. A natural idea is to find a computationally tractable approximation for the probabilistic QoS constraints (14).

II-E1 Scenario Generation

Scenario generation [15] is a well-known approach that obtains independent samples of the random channel coefficient vector and imposes the target QoS constraints for each sample. However, it becomes more conservative as the number of samples increases, since the volume of the feasible region decreases, which may result in infeasibility of the resulting problem. In addition, the sample size must satisfy a lower bound determined by $\epsilon$ and $\delta$, where $1-\delta$ gives the confidence level for the probabilistic-QoS constraints defined in equation (12). Therefore, the scenario generation approach has a scalability issue since the required minimum sample size increases roughly linearly with $1/\epsilon$ for small $\epsilon$ and also with the number of decision variables.

II-E2 Stochastic Programming

To address the over-conservativeness of the scenario generation approach, a stochastic programming approach is further provided in [13] by finding a difference-of-convex-functions (DC) approximation for the chance constraints. The resulting DC constrained stochastic program can be solved by successive convex approximation with the Monte Carlo approach at each iteration. However, its computation cost grows linearly with the number of samples, which is not scalable for obtaining high-robustness solutions, and the statistical guarantee is not available for the joint chance constraints under finite sample size.

To address the limitations of the existing works, we shall present a robust optimization approach in Section III to approximate the chance constraints via a statistical learning approach [16]. This approach enjoys the main advantages that the minimum required number of observations is only $\ln\delta/\ln(1-\epsilon)$ and the computational cost is independent of the sample size.

III Learning-Based Robust Optimization Approximation for Joint Chance Constraints

In this section, we provide a robust optimization approximation for the joint chance constraints in problem $\mathscr{P}_{\mathrm{CCP}}$, followed by a statistical learning approach to learn the shape and size of the high probability region.

III-A Approximating Joint Chance Constraints via Robust Optimization

Robust optimization [16] uses safe approximation and imposes that the safe conditions are always satisfied when the random variables lie in some geometric set. Specifically, the robust optimization approximation of the joint chance constraints (14) is given by

$$ \mathrm{SINR}_k(\mathbf{v};\mathbf{h}_k)\ge\gamma_k,\quad \mathbf{h}_k\in\mathcal{U}_k,\ k\in[K], \tag{17} $$

where $\mathcal{U}_k$ is the high probability region that $\mathbf{h}_k$ lies in. The robust optimization approximation for the joint chance constraints should yield a solution such that the probabilistic QoS constraint is satisfied with high confidence. The robust optimization approximation approach is realized by constructing a high probability region $\mathcal{U}_k$ from the data set $\mathcal{D}$ such that $\mathcal{U}_k$ covers a $(1-\epsilon)$-content of the distribution of $\mathbf{h}_k$, i.e.,

$$ \Pr(\mathbf{h}_k\in\mathcal{U}_k)\ge 1-\epsilon, \tag{18} $$

with confidence level at least $1-\delta$. By imposing the QoS constraints for every element in the high probability region as presented in equation (17), the confidence level for the probabilistic-QoS constraints (15) will be at least $1-\delta$. We thus obtain the robust optimization approximation for problem $\mathscr{P}_{\mathrm{CCP}}$ as

$$
\begin{aligned}
\mathscr{P}_{\mathrm{RO}}:\ \mathop{\mathrm{minimize}}_{\mathbf{v},\mathbf{h}}\quad & \sum_{n,k}\frac{1}{\eta_n}\|\mathbf{v}_{nk}\|_2^2+\sum_{n,k}P^{c}_{nk}\,\mathbb{I}_{(n,k)\in\mathcal{T}(\mathbf{v})}\\
\mathrm{subject\ to}\quad & \mathrm{SINR}_k(\mathbf{v};\mathbf{h}_k)\ge\gamma_k,\ \mathbf{h}_k\in\mathcal{U}_k,\ k\in[K]\\
& \sum_{k=1}^{K}\|\mathbf{v}_{nk}\|_2^2\le P_n^{\mathrm{Tx}},\ n\in[N].
\end{aligned}
\tag{19}
$$

For computational tractability and motivated by channel estimation [20], we adopt ellipsoidal uncertainty sets to model the uncertainty of each group of channel coefficient vectors $\mathbf{h}_k$. The high probability region $\mathcal{U}_k$ is parameterized as

$$ \mathcal{U}_k=\{\mathbf{h}_k:\ \mathbf{h}_k=\hat{\mathbf{h}}_k+\mathbf{B}_k\mathbf{u}_k,\ \mathbf{u}_k^{H}\mathbf{u}_k\le 1\}. \tag{20} $$

Here the parameters $\hat{\mathbf{h}}_k$ and $\mathbf{B}_k$ shall be learned from the data set $\mathcal{D}$, which will be presented in Section III-B. We will then present the tractable reformulation of the robust optimization counterpart problem $\mathscr{P}_{\mathrm{RO}}$ in Section III-C.

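III-B Learning the High Probability Region from Data Samples

Note that (17) only gives a feasibility guarantee for the joint chance constraints with statistical confidence at least $1-\delta$, but its conservativeness is still a challenging problem. Generally speaking, problem $\mathscr{P}_{\mathrm{RO}}$ is a less conservative approximation for problem $\mathscr{P}_{\mathrm{CCP}}$ if it has a larger feasible region. Therefore, we prefer a smaller volume of the high probability region, which provides a larger feasible region. We can further reduce the volume of the high probability region such that the statistical confidence for the probabilistic-QoS constraints is closer to $1-\delta$ instead of just larger than it.

In this paper, we propose to use a statistical learning approach [16] for the parameters of the high probability region $\mathcal{U}_k$, which consists of a shape learning procedure and a size calibration procedure via quantile estimation. First of all, we split the samples in data set $\mathcal{D}$ into two parts, i.e., $\mathcal{D}^1$ with $D_1$ samples and $\mathcal{D}^2$ with $D-D_1$ samples, each for one procedure.

III-B1 Shape Learning

Each ellipsoid set can be re-parameterized as

$$ \mathcal{U}_k=\{\mathbf{h}_k:\ (\mathbf{h}_k-\hat{\mathbf{h}}_k)^{T}\boldsymbol{\Sigma}_k^{-1}(\mathbf{h}_k-\hat{\mathbf{h}}_k)\le s_k\}, \tag{21} $$

where $\hat{\mathbf{h}}_k$ and $\boldsymbol{\Sigma}_k$ are shape parameters of the ellipsoid $\mathcal{U}_k$ and $s_k$ determines its size. Suppose the observations of $\mathbf{h}_k$ in $\mathcal{D}^1$ are given by $\mathcal{D}^1_k=\{\tilde{\mathbf{h}}_k^{(1)},\cdots,\tilde{\mathbf{h}}_k^{(D_1)}\}$. The shape parameter $\hat{\mathbf{h}}_k$ can be chosen as the sample mean, i.e.,

$$ \hat{\mathbf{h}}_k=\frac{1}{D_1}\sum_{j=1}^{D_1}\tilde{\mathbf{h}}_k^{(j)}. \tag{22} $$

To reduce the complexity of the ellipsoid, we omit the correlation between different $\mathbf{h}_{kn}$ and choose $\boldsymbol{\Sigma}_k$ as the block diagonal matrix where each diagonal block is the sample covariance of the first part of the data set for $\mathbf{h}_{kn}$, i.e.,

$$ \boldsymbol{\Sigma}_k=\begin{bmatrix}\boldsymbol{\Sigma}_{k1}&&\\&\ddots&\\&&\boldsymbol{\Sigma}_{kN}\end{bmatrix},\quad\text{where}\quad \boldsymbol{\Sigma}_{kn}=\frac{1}{D_1-1}\sum_{j=1}^{D_1}\big(\tilde{\mathbf{h}}_{kn}^{(j)}-\hat{\mathbf{h}}_{kn}\big)\big(\tilde{\mathbf{h}}_{kn}^{(j)}-\hat{\mathbf{h}}_{kn}\big)^{H}. \tag{23} $$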
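
The shape learning step in (22) and (23) amounts to a sample mean and a block-diagonal sample covariance. A minimal NumPy sketch follows; the function name `learn_shape` and the sample layout (one row per observation of the stacked channel of one user, AP blocks of length `L` laid out consecutively) are assumptions made for illustration.

```python
import numpy as np

def learn_shape(samples, N, L):
    """Sample mean (22) and block-diagonal covariance (23) for one user.

    samples : complex array (D1, N*L); row j is one observation of h_k,
              stored as N consecutive blocks of length L (assumed layout).
    """
    D1 = samples.shape[0]
    h_hat = samples.mean(axis=0)
    centered = samples - h_hat
    Sigma = np.zeros((N * L, N * L), dtype=complex)
    for n in range(N):
        blk = slice(n * L, (n + 1) * L)
        C = centered[:, blk]                       # (D1, L) centered block
        # sum_j c_j c_j^H / (D1 - 1); cross-AP correlations are left at zero
        Sigma[blk, blk] = C.T @ C.conj() / (D1 - 1)
    return h_hat, Sigma
```

Ignoring cross-AP correlation keeps the number of estimated parameters at $N L^2$ instead of $(NL)^2$, at the cost of a possibly less tightly fitted ellipsoid.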

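III-B2 Size Calibration via Quantile Estimation

We then use the second part of the data set, $\mathcal{D}^2$, for calibrating the ellipsoid size $s_k$. The key idea is to estimate a $(1-\epsilon)$-quantile with $1-\delta$ confidence of a transformation of the data samples in $\mathcal{D}^2$. Let

$$ G(\boldsymbol{\xi})=(\boldsymbol{\xi}-\hat{\mathbf{h}}_k)^{T}\boldsymbol{\Sigma}_k^{-1}(\boldsymbol{\xi}-\hat{\mathbf{h}}_k) \tag{24} $$

be the map from the random space that $\mathbf{h}_k$ lies in to $\mathbb{R}$. The size parameter $s_k$ will be chosen as an estimated $(1-\epsilon)$-quantile of the underlying distribution of $G(\boldsymbol{\xi})$ based on the data samples in $\mathcal{D}^2$, where the $(1-\epsilon)$-quantile $q_{1-\epsilon}$ is defined from

$$ \Pr\big(G(\boldsymbol{\xi})\le q_{1-\epsilon}\big)=1-\epsilon. \tag{25} $$

Specifically, by computing the function values of $G$ on each sample of $\mathcal{D}^2$, we can obtain the observations $G_1,\cdots,G_{D-D_1}$. Then the $j^\star$-th value of the ranked observations in ascending order, denoted as $G_{(j^\star)}$, can be an upper bound of the $(1-\epsilon)$-quantile of the underlying distribution of $G(\boldsymbol{\xi})$ based on the following proposition:

Proposition 1.

$s_k$ is an upper bound of the $(1-\epsilon)$-quantile of the underlying distribution with $1-\delta$ confidence, i.e.,

$$ \Pr(s_k\ge q_{1-\epsilon})\ge 1-\delta, \tag{26} $$

if $s_k$ is set as

$$ s_k=G_{(j^\star)},\quad\text{where } j^\star=\min_{1\le j\le D-D_1}\Big\{j:\ \sum_{k=0}^{j-1}\binom{D-D_1}{k}(1-\epsilon)^{k}\epsilon^{D-D_1-k}\ge 1-\delta\Big\}. \tag{27} $$

Proof.

According to the definition of the quantile $q_{1-\epsilon}$, we have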

 Pr(G(j)≥q1−ϵ) = Pr(G(k)

Therefore $G_{(j^\star)}$ is the smallest one among all upper bounds of the $(1-\epsilon)$-quantile of the underlying distribution with $1-\delta$ confidence. ∎
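
The calibration rule (27) can be sketched as follows. The function names are hypothetical, and one assumption is made explicit: the quadratic form is evaluated with the conjugate transpose so that $G$ is real-valued for complex channel samples.

```python
import numpy as np
from math import comb

def quantile_index(n, eps, delta):
    """Smallest j in 1..n whose binomial partial sum in (27) reaches 1 - delta.

    Returns None when no such j exists (too few calibration samples)."""
    cum = 0.0
    for j in range(1, n + 1):
        k = j - 1
        cum += comb(n, k) * (1 - eps) ** k * eps ** (n - k)
        if cum >= 1 - delta:
            return j
    return None

def calibrate_size(samples2, h_hat, Sigma, eps, delta):
    """Size s_k as the j*-th smallest value of G over the calibration samples."""
    diff = samples2 - h_hat                          # (n, NL)
    sol = np.linalg.solve(Sigma, diff.T)             # Sigma^{-1} (xi - h_hat)
    G = np.real(np.einsum("ij,ji->i", diff.conj(), sol))
    j = quantile_index(samples2.shape[0], eps, delta)
    if j is None:
        raise ValueError("not enough calibration samples for this (eps, delta)")
    return np.sort(G)[j - 1]
```

For example, with $\epsilon=\delta=0.05$ the index exists only from 59 calibration samples onward, in which case the largest observed value is selected.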

Using the presented two procedures, we learn a high probability region $\mathcal{U}_k$ for each random channel coefficient vector $\mathbf{h}_k$. The statistical guarantee of this statistical learning based robust optimization approximation approach is given by the following proposition:

Proposition 2.

Suppose the data samples in the data set $\mathcal{D}$ are i.i.d. and chosen from a continuous distribution for any $k\in[K]$. The data set is split into two independent parts $\mathcal{D}^1$ and $\mathcal{D}^2$. Each uncertainty set $\mathcal{U}_k$ is chosen as in (21). Its parameters $\hat{\mathbf{h}}_k$, $\boldsymbol{\Sigma}_k$ and $s_k$ are determined following equation (22), equation (23), and equation (27), respectively. Then any feasible solution to problem $\mathscr{P}_{\mathrm{RO}}$ guarantees that the probabilistic-QoS constraints (15) are satisfied with confidence at least $1-\delta$.

Proof.

Since depends only on , we have

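$$ \Pr\nolimits_{\mathcal{D}^2_k}(\mathbf{v}\in\mathcal{V})=\Pr\nolimits_{\mathcal{D}^2_k}\big(G_{(j^\star)}\ge q_{1-\epsilon}\big)\ge 1-\delta. \tag{29} $$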

Therefore, it is readily obtained that satisfies with confidence at least . ∎

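Note that $j^\star$ exists only if

$$ \sum_{k=0}^{D-D_1-1}\binom{D-D_1}{k}(1-\epsilon)^{k}\epsilon^{D-D_1-k}\ge 1-\delta, \tag{30} $$

which implies that $(1-\epsilon)^{D-D_1}\le\delta$, since the left-hand side equals $1-(1-\epsilon)^{D-D_1}$. In other words, the calibration set must contain at least $\ln\delta/\ln(1-\epsilon)$ samples to achieve the $1-\delta$ confidence of the probabilistic QoS constraint (14). Matrix $\mathbf{B}_k$ can be computed as

$$ \mathbf{B}_k=\sqrt{s_k}\,\boldsymbol{\Delta}_k, \tag{31} $$

where $\boldsymbol{\Delta}_k$ is the Cholesky factor of $\boldsymbol{\Sigma}_k$, i.e., $\boldsymbol{\Sigma}_k=\boldsymbol{\Delta}_k\boldsymbol{\Delta}_k^{H}$. We summarize the whole procedure for learning the high probability region from the data set $\mathcal{D}$ in Algorithm 1.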
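
The last two steps, the sample-size condition implied by (30) and the construction (31), can be sketched with NumPy. The function names are hypothetical; `np.linalg.cholesky` returns the lower-triangular factor, which is one valid choice of the Cholesky factor.

```python
import numpy as np
from math import ceil, log

def min_calibration_samples(eps, delta):
    """Smallest n with (1 - eps)^n <= delta, i.e. n >= ln(delta) / ln(1 - eps)."""
    return ceil(log(delta) / log(1.0 - eps))

def uncertainty_matrix(Sigma, s):
    """B = sqrt(s) * Delta as in (31), with Sigma = Delta @ Delta^H."""
    Delta = np.linalg.cholesky(Sigma)   # lower-triangular factor
    return np.sqrt(s) * Delta
```

With this choice, any point $\hat{\mathbf{h}}_k+\mathbf{B}_k\mathbf{u}_k$ with $\mathbf{u}_k^{H}\mathbf{u}_k=1$ lies exactly on the boundary of the ellipsoid of size $s_k$, which links the two parameterizations (20) and (21).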

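III-C Tractable Reformulations for Robust Optimization Problem

According to the ellipsoidal uncertainty model (20), the robust optimization approximation (17) can be rewritten as

\begin{align}
&\mathbf{h}_k^{H}\Big(\frac{1}{\gamma_k}\mathbf{v}_k\mathbf{v}_k^{H}-\sum_{l\neq k}\mathbf{v}_l\mathbf{v}_l^{H}\Big)\mathbf{h}_k\ge\sigma_k^2 \tag{32}\\
&\mathbf{h}_k=\hat{\mathbf{h}}_k+\mathbf{B}_k\mathbf{u}_k,\quad \mathbf{u}_k^{H}\mathbf{u}_k\le 1, \tag{33}
\end{align}

where $\mathbf{u}_k\in\mathbb{C}^{NL}$. By defining matrices

$$ \mathbf{H}_k=[\hat{\mathbf{h}}_k\ \ \mathbf{B}_k]\in\mathbb{C}^{NL\times(NL+1)} \tag{34} $$

and using the S-procedure [28], we obtain the following equivalent tractable reformulation for (32) and (33):

\begin{align}
&\mathbf{H}_k^{H}\Big(\frac{1}{\gamma_k}\mathbf{v}_k\mathbf{v}_k^{H}-\sum_{l\neq k}\mathbf{v}_l\mathbf{v}_l^{H}\Big)\mathbf{H}_k\succeq\mathbf{Q}_k \tag{35}\\
&\lambda_k\ge 0, \tag{36}
\end{align}

where $\boldsymbol{\lambda}=[\lambda_1,\cdots,\lambda_K]\in\mathbb{R}^{K}$ and $\mathbf{Q}_k$ is given by

$$ \mathbf{Q}_k=\begin{bmatrix}\lambda_k+\sigma_k^2&\mathbf{0}\\\mathbf{0}&-\lambda_k\mathbf{I}_{NL}\end{bmatrix}\in\mathbb{C}^{(NL+1)\times(NL+1)}. \tag{37} $$

The derivation details of (35) and (36) from (32) and (33) are relegated to Appendix A.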
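
For a given beamformer and multiplier, the matrix inequality (35) with (36) can be checked numerically by an eigenvalue test. The sketch below is illustrative only (hypothetical function name, beamformers stored as rows `v[k]` of stacked vectors in $\mathbb{C}^{NL}$); it verifies a candidate pair rather than solving the optimization problem.

```python
import numpy as np

def robust_lmi_holds(v, k, h_hat, B, gamma, sigma2, lam, tol=1e-9):
    """Check H_k^H A_k H_k - Q_k >= 0 (PSD) and lam >= 0, cf. (34)-(37)."""
    v = np.asarray(v, dtype=complex)               # shape (K, NL)
    A = np.outer(v[k], v[k].conj()) / gamma
    for l in range(v.shape[0]):
        if l != k:
            A = A - np.outer(v[l], v[l].conj())
    H = np.column_stack([h_hat, B])                # NL x (NL + 1), as in (34)
    NL = H.shape[0]
    Q = np.zeros((NL + 1, NL + 1), dtype=complex)
    Q[0, 0] = lam + sigma2
    Q[1:, 1:] = -lam * np.eye(NL)
    M = H.conj().T @ A @ H - Q
    M = (M + M.conj().T) / 2                       # symmetrize against round-off
    return bool(lam >= 0 and np.linalg.eigvalsh(M).min() >= -tol)

# Single-user toy instance (hypothetical numbers).
v = np.array([[2.0, 0.0]])
h_hat = np.array([1.0, 0.0], dtype=complex)
B = 0.1 * np.eye(2, dtype=complex)
ok = robust_lmi_holds(v, 0, h_hat, B, gamma=1.0, sigma2=1.0, lam=1.0)
```

In this toy instance the test passes for `lam=1.0` but fails for `lam=0.0`, which illustrates why the multiplier has to be optimized jointly with the beamformer.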

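Thus the proposed robust optimization approximation for problem $\mathscr{P}_{\mathrm{CCP}}$ is given by the following group sparse beamforming problem with nonconvex quadratic constraints:

$$
\begin{aligned}
\mathscr{P}_{\mathrm{RGS}}:\ \mathop{\mathrm{minimize}}_{\mathbf{v}\in\mathbb{C}^{NKL},\,\boldsymbol{\lambda}\in\mathbb{R}^{K}}\quad & \sum_{n,l}\frac{1}{\eta_n}\|\mathbf{v}_{nl}\|_2^2+\sum_{n,l}P^{c}_{nl}\,\mathbb{I}_{(n,l)\in\mathcal{T}(\mathbf{v})}\\
\mathrm{subject\ to}\quad & (35),\ \lambda_k\ge 0,\ \forall k\in[K]\\
& \sum_{l=1}^{K}\|\mathbf{v}_{nl}\|_2^2\le P_n^{\mathrm{Tx}},\ \forall n\in[N].
\end{aligned}
\tag{39}
$$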

III-D Integrating the Robust Optimization Approximation with a Cost-Effective Sampling Strategy

Consider the block fading channel where the channel distribution is assumed invariant [27] within the coherence interval for channel statistics. The coherence interval for channel statistics consists of blocks, where each block is called a coherence interval for CSI and the channel coefficient vector remains unchanged within each block. However, collecting channel samples within each block leads to high signaling overhead. To address this issue, we provide a cost-effective sampling strategy for enabling robust transmission, whose timeline is illustrated in Fig. 3.

At the beginning of the coherence interval for channel statistics, we collect $D$ i.i.d. channel samples as $\mathcal{D}$. Based on the data set $\mathcal{D}$, we can learn the estimated channel coefficient vector $\hat{\mathbf{h}}_k$ from equation (22) and the estimated high probability region of the error, described by $\mathbf{B}_k$, from equation (31). For the transmission in the first block, we can obtain $\mathbf{H}_k$ by combining these two parts following equation (34) and solve the resulting problem $\mathscr{P}_{\mathrm{RGS}}$. For any other block $t$, we can obtain the estimated channel coefficient $\hat{\mathbf{h}}_k[t]$ as the sample mean by collecting as few as one sample of the channel coefficient vector. By replacing the estimated channel coefficient and keeping the error information $\mathbf{B}_k$, we can construct the parameter $\mathbf{H}_k[t]$ at the $t$-th block as

$$ \mathbf{H}_k[t]=[\hat{\mathbf{h}}_k[t]\ \ \mathbf{B}_k],\quad\forall k\in[K], \tag{40} $$

and design the transmit beamformer by solving problem $\mathscr{P}_{\mathrm{RGS}}$, which significantly reduces the signaling overhead for channel sampling. The effectiveness of this cost-effective scheme will be demonstrated numerically in Section V-A.

Iv Reweighted Power Minimization for Group Sparse Beamforming with Nonconvex Quadratic Constraints

This section presents a reweighted power minimization approach to induce the group sparsity structure for problem . We further demonstrate that the nonconvex quadratic constraints can be reformulated as convex constraints with respect to a rank-one positive semidefinite matrix using a matrix lifting technique, followed by proposing a DC approach to induce rank-one solutions.

Iv-a Matrix Lifting for Nonconvex Quadratic Constraints

We observe that constraints (35) are convex with respect to despite of its nonconvexity with respect to . This motivates us to adopt the matrix lifting technique [17] to address the nonconvex quadratic constraints in problem by denoting

 Vij[s,t]=vsivHtj∈CL×L (41) Vij=⎡⎢ ⎢ ⎢⎣Vij[1,1]⋯Vij[1,N]⋮⋱⋮Vij[N,1]⋯Vij[N,N]⎤⎥ ⎥ ⎥⎦=vivHj∈CNL×NL (42) V=vvH=⎡⎢ ⎢⎣V11⋯V1K⋮⋱⋮VK1⋯VKK⎤⎥ ⎥⎦∈SNKL+, (43)

where denotes the set of Hermitian positive semidefinite (PSD) matrices. The aggregative beamforming vector is thus lifted as a rank-one PSD matrix . The constraint of problem , which given by (35), can be equivalently rewritten as the following PSD constraint

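$$
\mathbf{H}_k^{\mathsf{H}} \left( \frac{1}{\gamma_k}\mathbf{V}_{kk} - \sum_{l \neq k} \mathbf{V}_{ll} \right) \mathbf{H}_k \succeq \mathbf{Q}_k, \tag{44}
$$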

and the transmit power constraint (39) can be equivalently rewritten as

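$$
\sum_{l=1}^{K} \|\mathbf{v}_{nl}\|_2^2 = \sum_{l=1}^{K} \mathrm{Tr}(\mathbf{V}_{ll}[n,n]) \leq P_n^{\mathrm{Tx}}, \quad \forall n = 1, \cdots, N. \tag{45}
$$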

Therefore, using the matrix lifting technique, we obtain an equivalent reformulation for problem as

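$$
\begin{aligned}
\mathscr{P}: \ \underset{\mathbf{V}, \boldsymbol{\lambda}}{\text{minimize}} \quad & \sum_{n,l} \left( \frac{1}{\eta_n} \mathrm{Tr}(\mathbf{V}_{ll}[n,n]) + P_{nl}^{\mathrm{c}} \, I_{\mathrm{Tr}(\mathbf{V}_{ll}[n,n]) \neq 0} \right) \\
\text{subject to} \quad & \text{(???)}, \ \lambda_k \geq 0, \ \forall k \in [K], \\
& \sum_{l=1}^{K} \mathrm{Tr}(\mathbf{V}_{ll}[n,n]) \leq P_n^{\mathrm{Tx}}, \ \forall n \in [N], \\
& \mathbf{V} \succeq \mathbf{0}, \ \mathrm{rank}(\mathbf{V}) = 1.
\end{aligned} \tag{48}
$$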

Note that the constraints are still nonconvex due to the nonconvexity of the rank-one constraint.
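The block structure in (41)-(43) and the power identity (45) can be checked numerically. The sketch below is illustrative only: the sizes `N`, `K`, `L`, the array `v_blocks`, and the helper names `block` and `ap_power` are assumptions introduced here, and indices are 0-based.

```python
import numpy as np

N, K, L = 2, 3, 2  # hypothetical numbers of APs, users, and antennas per AP
rng = np.random.default_rng(1)

# v_blocks[l, n] plays the role of v_{nl}: the beamformer at AP n for user l.
v_blocks = rng.standard_normal((K, N, L)) + 1j * rng.standard_normal((K, N, L))

# Aggregate as v = [v_1; ...; v_K] with v_l = [v_{1l}; ...; v_{Nl}].
v = v_blocks.reshape(-1)
V = np.outer(v, v.conj())  # lifted rank-one PSD matrix V = v v^H

def block(V, l, n):
    """Return the L x L block V_ll[n, n] for user l and AP n (0-indexed)."""
    start = l * N * L + n * L
    return V[start:start + L, start:start + L]

def ap_power(V, n):
    """Left-hand side of the per-AP power constraint, written in terms of V."""
    return sum(np.trace(block(V, l, n)).real for l in range(K))
```

For every AP index `n`, `ap_power(V, n)` coincides with the sum of squared norms of `v_blocks[l, n]`, which is the equality stated in (45).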

IV-B DC Representations for Rank-One Constraint

For a nonzero positive semidefinite matrix $\mathbf{V}$, its rank is one if and only if its largest singular value is the only nonzero one, i.e.,

$$
\sigma_i(\mathbf{V}) = 0, \quad i = 2, \cdots, NKL, \tag{49}
$$

where $\sigma_i(\mathbf{V})$ is the $i$-th largest singular value of $\mathbf{V}$. The trace norm and spectral norm of the positive semidefinite matrix $\mathbf{V}$ are respectively given as

$$
\mathrm{Tr}(\mathbf{V}) = \sum_{i=1}^{NKL} \sigma_i(\mathbf{V}), \quad \|\mathbf{V}\| = \sigma_1(\mathbf{V}). \tag{50}
$$

Thus we obtain an equivalent DC representation for the rank-one constraint of $\mathbf{V}$:

$$
R(\mathbf{V}) = \mathrm{Tr}(\mathbf{V}) - \|\mathbf{V}\| = 0. \tag{51}
$$

$R(\mathbf{V})$ is a DC function of $\mathbf{V}$ since both the trace norm and the spectral norm are convex.
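As a quick numerical illustration of (49)-(51), the following sketch evaluates the gap between the trace and the spectral norm for a rank-one and a rank-two PSD matrix. The function name `dc_rank_gap` and the test matrices are assumptions made for this example; for a Hermitian PSD matrix the singular values coincide with the eigenvalues, which is what the code relies on.

```python
import numpy as np

def dc_rank_gap(V):
    """R(V) = Tr(V) - ||V|| for a Hermitian PSD matrix V, as in (51)."""
    eigvals = np.linalg.eigvalsh(V)  # real eigenvalues in ascending order
    return float(eigvals.sum() - eigvals[-1])

rng = np.random.default_rng(2)
v = rng.standard_normal(5) + 1j * rng.standard_normal(5)
u = rng.standard_normal(5) + 1j * rng.standard_normal(5)

V_rank1 = np.outer(v, v.conj())            # rank one: the gap vanishes
V_rank2 = V_rank1 + np.outer(u, u.conj())  # rank two: the gap is strictly positive

gap_rank1 = dc_rank_gap(V_rank1)
gap_rank2 = dc_rank_gap(V_rank2)
```

The gap equals the sum of all eigenvalues except the largest, so it is zero (up to rounding) exactly when the PSD matrix has rank at most one.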

IV-C Reweighted ℓ1 Minimization for Inducing Group Sparsity

The reweighted $\ell_1$ minimization approach has proven effective in enhancing group sparsity and thereby improving the energy efficiency of cloud radio access networks [21, 22]. The $\ell_1$-norm is a well-recognized convex surrogate for the $\ell_0$-norm. To further enhance sparsity, reweighted $\ell_1$ minimization iteratively minimizes a weighted $\ell_1$-norm and updates the weights. For the objective function of problem $\mathscr{P}$, we observe that the indicator function $I_{\mathrm{Tr}(\mathbf{V}_{ll}[n,n]) \neq 0}$ can be interpreted as the $\ell_0$-norm of $\mathrm{Tr}(\mathbf{V}_{ll}[n,n])$. We can thus apply the reweighted $\ell_1$ minimization technique by approximating $I_{\mathrm{Tr}(\mathbf{V}_{ll}[n,n]) \neq 0}$ by $w_{nl}\mathrm{Tr}(\mathbf{V}_{ll}[n,n])$, which consists of alternately minimizing the approximated objective function and updating the weights as

$$
w_{nl} = \frac{c}{\mathrm{Tr}(\mathbf{V}_{ll}[n,n]) + \tau}, \tag{52}
$$

where $\tau$ is a constant regularization factor and $c$ is a constant. If $\mathrm{Tr}(\mathbf{V}_{ll}[n,n])$ is small, the reweighted $\ell_1$ minimization approach puts a larger weight on the transceiver pair $(n,l)$, which indicates that the corresponding inference task should preferably not be executed at the $n$-th edge node.
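The weight update (52) is simple enough to sketch directly. The values of `c` and `tau` and the example traces below are hypothetical and chosen only to show the qualitative behavior: a pair with (nearly) zero transmit power receives a much larger weight than an active pair.

```python
import numpy as np

def update_weights(traces, c=1.0, tau=1e-3):
    """Apply w_nl = c / (Tr(V_ll[n, n]) + tau) elementwise, as in (52)."""
    return c / (np.asarray(traces, dtype=float) + tau)

# traces[n, l] stands for Tr(V_ll[n, n]); the numbers are made up for illustration.
traces = np.array([[2.0, 0.0],
                   [0.5, 1e-6]])
weights = update_weights(traces)
```

In the next reweighting step, a large weight makes it costly to allocate power to that pair, which pushes the corresponding block toward zero and thereby promotes group sparsity.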

IV-D Proposed Reweighted Power Minimization Approach

In this subsection, we provide a reweighted power minimization approach by combining the matrix lifting, DC representation, and reweighted $\ell_1$ minimization techniques. In the $j$-th step, we update $\mathbf{V}$ by solving

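$$
\begin{aligned}
\underset{\mathbf{V}, \boldsymbol{\lambda}}{\text{minimize}} \quad & \sum_{n,l} \left( \frac{1}{\eta_n} + w_{nl}^{[j]} P_{nl}^{\mathrm{c}} \right) \mathrm{Tr}(\mathbf{V}_{ll}[n,n]) \\
\text{subject to} \quad & \text{(???)}, \ \lambda_k \geq 0, \ \forall k \in [K], \\
& \sum_{l=1}^{K} \mathrm{Tr}(\mathbf{V}_{ll}[n,n]) \leq P_n^{\mathrm{Tx}}, \ \forall n \in [N], \\
& \mathbf{V} \succeq \mathbf{0}, \ \mathrm{rank}(\mathbf{V}) = 1,
\end{aligned} \tag{53}
$$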

and the weights are updated following (52) which are initialized as at the beginning.

To solve problem (53) with the nonconvex rank-one constraint, we propose to use the DC representation (51) and solve the following DC program

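$$
\begin{aligned}
\mathscr{P}_{\mathrm{DC}}: \ \underset{\mathbf{V}, \boldsymbol{\lambda}}{\text{minimize}} \quad & \sum_{n,l} \left( \frac{1}{\eta_n} + w_{nl}^{[j]} P_{nl}^{\mathrm{c}} \right) \mathrm{Tr}(\mathbf{V}_{ll}[n,n]) + \mu R(\mathbf{V}) \\
\text{subject to} \quad & \text{(???)}, \ \lambda_k \geq 0, \ \forall k \in [K], \\
& \sum_{l=1}^{K} \mathrm{Tr}(\mathbf{V}_{ll}[n,n]) \leq P_n^{\mathrm{Tx}}, \ \forall n \in [N], \\
& \mathbf{V} \succeq \mathbf{0},
\end{aligned} \tag{54}
$$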

where $\mu$ is the regularization parameter. Despite its nonconvexity, problem $\mathscr{P}_{\mathrm{DC}}$ can be efficiently solved by the simplified DC algorithm, i.e., by iteratively linearizing the concave part [29]. At the $t$-th iteration, we solve

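$$
\begin{aligned}
\underset{\mathbf{V}, \boldsymbol{\lambda}}{\text{minimize}} \quad & \sum_{n,l} \left( \frac{1}{\eta_n} + w_{nl}^{[j]} P_{nl}^{\mathrm{c}} \right) \mathrm{Tr}(\mathbf{V}_{ll}[n,n]) + \mu \left( \mathrm{Tr}(\mathbf{V}) - \mathrm{Tr}(\mathbf{G}^{(t)} \mathbf{V}) \right) \\
\text{subject to} \quad & \text{(???)}, \ \lambda_k \geq 0, \ \forall k \in [K], \\
& \sum_{l=1}^{K} \mathrm{Tr}(\mathbf{V}_{ll}[n,n]) \leq P_n^{\mathrm{Tx}}, \ \forall n \in [N], \\
& \mathbf{V} \succeq \mathbf{0},
\end{aligned} \tag{55}
$$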

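where $\mathbf{G}^{(t)}$ is a subgradient of the spectral norm at the current iterate. It can be computed as the outer product $\mathbf{u}\mathbf{u}^{\mathsf{H}}$, where $\mathbf{u}$ is the unit-norm eigenvector corresponding to the largest eigenvalue of that iterate. This DC algorithm is guaranteed to converge to a stationary point of problem $\mathscr{P}_{\mathrm{DC}}$ from an arbitrary initial point [29].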
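The subgradient computation and the linearized regularizer in (55) can be illustrated without a convex solver. The sketch below is not the full DC algorithm: it only builds the subgradient from the leading eigenvector and checks that the linearized term agrees with $R(\mathbf{V})$ at the linearization point and upper-bounds it elsewhere. All function names and the random test matrices are assumptions made for this example.

```python
import numpy as np

def dc_rank_gap(V):
    """R(V) = Tr(V) - ||V|| for a Hermitian PSD matrix V."""
    eigvals = np.linalg.eigvalsh(V)
    return float(eigvals.sum() - eigvals[-1])

def spectral_norm_subgradient(V):
    """Return u u^H, with u the unit-norm leading eigenvector of Hermitian PSD V."""
    _, eigvecs = np.linalg.eigh(V)  # eigenvalues are sorted in ascending order
    u = eigvecs[:, -1]
    return np.outer(u, u.conj())

def linearized_penalty(V, G):
    """Tr(V) - Tr(G V), the convex surrogate of R(V) used in each DC iteration."""
    return float(np.trace(V).real - np.trace(G @ V).real)

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
V_t = A @ A.conj().T      # stands in for the current iterate
V_other = B @ B.conj().T  # any other PSD matrix

G_t = spectral_norm_subgradient(V_t)
```

Because the surrogate is linear in the matrix variable, each DC iteration is a semidefinite program; a solver of the reader's choice would be needed to carry out the minimization itself.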

When the reweighted minimization algorithm converges to a rank-one solution $\mathbf{V}$, we can extract the aggregative beamforming vector $\mathbf{v}$ from the Cholesky decomposition $\mathbf{V} = \mathbf{v}\mathbf{v}^{\mathsf{H}}$. The whole procedure of the proposed reweighted power minimization approach is summarized in Algorithm 2.
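The extraction step can be sketched as follows. Note the substitution: instead of a Cholesky routine, this example uses an eigendecomposition, since numerical Cholesky implementations such as `numpy.linalg.cholesky` expect a positive definite input and a rank-one matrix is singular. The function name `extract_beamformer` and the tolerance `tol` are assumptions; the recovered vector is unique only up to a common phase factor.

```python
import numpy as np

def extract_beamformer(V, tol=1e-8):
    """Recover v with V = v v^H from a numerically rank-one Hermitian PSD matrix."""
    eigvals, eigvecs = np.linalg.eigh(V)  # ascending eigenvalues
    if eigvals[-1] <= 0 or np.abs(eigvals[:-1]).sum() > tol * eigvals[-1]:
        raise ValueError("V is not numerically rank one")
    return np.sqrt(eigvals[-1]) * eigvecs[:, -1]

rng = np.random.default_rng(4)
v_true = rng.standard_normal(6) + 1j * rng.standard_normal(6)
V = np.outer(v_true, v_true.conj())

v_hat = extract_beamformer(V)
```

The phase ambiguity is harmless here because the received signal powers depend on the beamformers only through the lifted matrix.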

V Numerical Results

In this section, we provide numerical experiments to compare the proposed framework with other state-of-the-art approaches. We generate the edge inference system with APs located at meters and mobile users randomly located in the meters square region. Each AP is equipped with antennas. The imperfection model of the channel coefficient vector between the -th AP and the -th mobile user is chosen as . The path loss model is given by , the Rayleigh small-scale fading coefficient is given by , and the additive error is given by . Suppose that each AP collects independent samples of ’s and splits the data set evenly for learning the shape and size of the uncertainty ellipsoids, respectively. For each AP, the power amplifier efficiency is chosen as , the average maximum transmit power is chosen as , and the computation power consumption for each task at the -th AP is chosen as . We set the target SINR as , the tolerance level as , and the confidence level as . The regularization parameter is set as and is set as .

V-A Benefits of Taking CSI Uncertainty into Consideration

In this paper, we consider the CSI uncertainty in channel sampling and propose to address it with a learning-based robust optimization approximation approach. To further reduce the channel sampling overhead, we provide a cost-effective sampling strategy in Section III-D. We now evaluate its advantages over the beamformer design that does not take the CSI error into consideration, supposing that each task is performed at all APs. Specifically, we collect i.i.d. channel samples in the training phase within one coherence interval for CSI. In the test phase, we only collect one channel sample , construct the $\mathbf{H}_k$’s following equation (40), and solve problem

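$$
\begin{aligned}
\underset{\mathbf{V}, \boldsymbol{\lambda}}{\text{minimize}} \quad & \sum_{n,l} \left( \frac{1}{\eta_n} \mathrm{Tr}(\mathbf{V}_{ll}[n,n]) + P_{nl}^{\mathrm{c}} \right) \\
\text{subject to} \quad & \text{(???)}, \ \lambda_k \geq 0, \ \forall k \in [K], \\
& \sum_{l=1}^{K} \mathrm{Tr}(\mathbf{V}_{ll}[n,n]) \leq P_n^{\mathrm{Tx}}, \ \forall n \in [N], \\
& \mathbf{V} \succeq \mathbf{0}.
\end{aligned} \tag{56}
$$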

For comparison, the beamforming design that does not take uncertainty into consideration is obtained by solving problem

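$$
\begin{aligned}
\underset{\mathbf{V}, \boldsymbol{\lambda}}{\text{minimize}} \quad & \sum_{n,l} \left( \frac{1}{\eta_n} \mathrm{Tr}(\mathbf{V}_{ll}[n,n]) + P_{nl}^{\mathrm{c}} \right) \\
\text{subject to} \quad & \big(\mathbf{h}_k^{(1)}\big)^{\mathsf{H}} \left( \frac{1}{\gamma_k} \mathbf{V}_{kk} - \sum_{l \neq k} \mathbf{V}_{ll} \right) \mathbf{h}_k^{(1)} \geq \sigma_k^2, \ \forall k, \\
& \sum_{l=1}^{K} \mathrm{Tr}(\mathbf{V}_{ll}[n,n]) \leq P_n^{\mathrm{Tx}}, \ \forall n, \\
& \mathbf{V} \succeq \mathbf{0}.
\end{aligned} \tag{57}
$$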

Note that we use SDR for both approaches for fairness. We compare the two approaches by generating realizations of i.i.d. channel samples for testing, and regenerate the training data set for the proposed approach every realizations. We compute the achieved SINR for each mobile device with the solution to each approach, i.e., where is the true channel coefficient vector, and calculate the number of realizations in which the target QoS for each device is met, i.e., . The results shown in Table I demonstrate that the proposed robust approximation approach considerably improves the robustness of QoS against CSI errors while using a cost-effective sampling approach.
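The evaluation procedure above can be sketched as follows. Two assumptions are made and should be read as such: the achieved SINR is taken to be the standard downlink expression, i.e., the desired signal power over interference plus noise, which matches the structure of the constraint in (57); and a realization is counted as successful only if every device meets the target. The function names and the toy channels are hypothetical.

```python
import numpy as np

def achieved_sinr(h, v, noise_power):
    """SINR per device; h[k] is the true channel of device k, v[l] the beamformer for device l."""
    gains = np.abs(h.conj() @ v.T) ** 2       # gains[k, l] = |h_k^H v_l|^2
    signal = np.diag(gains)                   # desired signal power of each device
    interference = gains.sum(axis=1) - signal
    return signal / (interference + noise_power)

def count_qos_met(channel_realizations, v, noise_power, target_sinr):
    """Count realizations in which all devices reach the target SINR (assumed criterion)."""
    return sum(
        bool(np.all(achieved_sinr(h, v, noise_power) >= target_sinr))
        for h in channel_realizations
    )

# Toy example with two devices and two aggregated antenna dimensions.
v = np.array([[2, 0], [0, 2]], dtype=complex)
h_good = np.eye(2, dtype=complex)                   # no cross-interference
h_bad = np.array([[1, 1], [0, 1]], dtype=complex)   # device 0 sees strong interference

num_met = count_qos_met([h_good, h_bad], v, noise_power=1.0, target_sinr=2.0)
```

Dividing the returned count by the number of test realizations gives an empirical QoS satisfaction rate that can be compared across the two designs.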

V-B Overcoming the Over-Conservativeness of Scenario Generation

As we point out in Section II-E, the scenario generation approach is over-conservative since it imposes that the target QoS constraints are satisfied for all samples, which would lead to a smaller feasible region. Here we use numerical experiments to demonstrate the advantage of the presented robust optimization approximation approach in overcoming the over-conservativeness. Consider the feasibility problem of the robust optimization approximation approach given by

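$$
\begin{aligned}
\text{find} \quad & \mathbf{V}, \boldsymbol{\lambda} \\
\text{subject to} \quad & \text{(???)}, \ \lambda_k \geq 0, \ \forall k \in [K], \\
& \sum_{l=1}^{K} \mathrm{Tr}(\mathbf{V}_{ll}[n,n]) \leq P_n^{\mathrm{Tx}}, \ \forall n \in [N], \\
& \mathbf{V} \succeq \mathbf{0},
\end{aligned} \tag{58}
$$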

and the feasibility problem of the scenario approach given by