# Deep Learning-Aided Tabu Search Detection for Large MIMO Systems

Nhan Thanh Nguyen and Kyungchun Lee

N. T. Nguyen is with the Department of Electrical and Information Engineering, Seoul National University of Science and Technology, Seoul 01811, Republic of Korea (e-mail: nhan.nguyen@seoultech.ac.kr). K. Lee is with the Department of Electrical and Information Engineering and the Research Center for Electrical and Information Technology, Seoul National University of Science and Technology, Seoul 01811, Republic of Korea (e-mail: kclee@seoultech.ac.kr).

###### Abstract

In this study, we consider the application of deep learning (DL) to tabu search (TS) detection in large multiple-input multiple-output (MIMO) systems. First, we propose a deep neural network architecture for symbol detection, termed the fast-convergence sparsely connected detection network (FS-Net), which is obtained by optimizing the prior detection networks called DetNet and ScNet. Then, we propose the DL-aided TS algorithm, in which the initial solution is approximated by the proposed FS-Net. Furthermore, in this algorithm, an adaptive early termination algorithm and a modified searching process are performed based on the predicted approximation error, which is determined from the FS-Net-based initial solution, so that the optimal solution can be reached earlier. The simulation results show that the proposed algorithm achieves approximately complexity reduction for a MIMO system with QPSK with respect to the existing TS algorithms, while maintaining almost the same performance.

###### Index Terms

MIMO, deep learning, deep neural network, tabu search.

## I Introduction

In mobile communications, a large multiple-input multiple-output (MIMO) system is a promising technique for dramatically improving spectral and power efficiency [ngo2013energy], [marzetta2010noncooperative]. However, reaping the promised benefits of large MIMO systems imposes significantly higher computational complexity at the receiver than in conventional MIMO systems [wu2014large, nguyen2019qr]. Therefore, low-complexity near-optimal detection is an important challenge in realizing large MIMO systems [rusek2013scaling, chockalingam2014large, mandloi2017low]. Two major lines of study have recently addressed this challenge: low-complexity near-optimal detection algorithms [vardhan2008low, mohammed2009high, qin2016near, mandloi2017error, som2010improved, mohammed2009low, datta2013novel, hansen2009near, mandloi2017layered, narasimhan2014channel, vsvavc2013soft, nguyen2019qr] and deep-learning (DL) techniques for symbol detection in massive MIMO systems [farsad2017detection, ye2017power, mohammadkarimi2018deep, samuel2019learning, samuel2017deep, gao2018sparsely].

### I-A Recent works

Various algorithms for large-MIMO detection have been introduced [vardhan2008low, mohammed2009high, qin2016near, mandloi2017error, som2010improved, mohammed2009low, datta2013novel, hansen2009near, mandloi2017layered, narasimhan2014channel, vsvavc2013soft, nguyen2019qr]. Among them, the tabu search (TS) detector is considered a complexity-efficient scheme for symbol detection in large MIMO systems. It has been shown that the TS detection algorithm can perform very close to the maximum-likelihood (ML) bound with far lower complexity compared to sphere decoding (SD) and fixed-complexity SD (FSD) schemes in large MIMO systems [rusek2013scaling], [srinidhi2011layered]. In [srinidhi2009low], an approach based on reactive TS (RTS) is proposed for near-ML decoding of non-orthogonal space-time block codes (STBCs) with 4-QAM. However, its performance is far from optimal for higher-order QAMs, such as 16- and 64-QAM [srinidhi2009near]. The work in [srinidhi2011layered] proposes an algorithm called layered TS (LTS). This algorithm improves the performance of the TS detection in terms of the bit-error rate (BER) for higher-order QAM in large MIMO systems. However, to achieve a BER of in and MIMO systems with 16-QAM, higher complexities are required than in conventional TS. The random-restart reactive TS (R3TS) algorithm, which runs multiple RTS and chooses the best among the resulting solution vectors, is presented in [datta2010random]. It achieves improved BER performance at the expense of increased complexity. The complexity of R3TS is generally higher than that of RTS to achieve a BER of , especially for large antenna configurations and high-order QAMs, such as MIMO with 64-QAM. The work in [zhao2007tabu] further reduces the complexity of TS by examining a reduced number of neighbors and applying an early-termination (ET) criterion. However, it comes at the cost of performance loss; for example, to achieve BER for MIMO with 16-QAM modulation, the TS algorithm with ET has a 3-dB signal-to-noise ratio (SNR) loss compared to the original TS [zhao2007tabu]. In [nguyen2019qr], the QR-decomposition-aided TS (QR-TS) algorithm is proposed for achieving considerable complexity reduction without any performance loss.

On the other hand, the application of deep learning (DL) to symbol detection in MIMO systems has recently gained much attention [farsad2017detection, ye2017power, samuel2019learning, samuel2017deep, gao2018sparsely, mohammadkarimi2018deep]. In [farsad2017detection], three detection algorithms based on deep neural networks (DNNs) are proposed for molecular communication systems, which are shown to perform much better than the prior simple detectors. In contrast, the application of DL to symbol detection in orthogonal frequency-division multiplexing (OFDM) systems is considered in [ye2017power]. Specifically, Ye et al. in [ye2017power] show that the detection scheme based on DL can address channel distortion and detect the transmitted symbols with performance comparable to that of the minimum mean-square error (MMSE) receiver. In [mohammadkarimi2018deep], the DL-based SD scheme is proposed, in which the radius of the decoding hypersphere is learned by a DNN; it achieves significant complexity reduction with respect to the conventional SD with a marginal performance loss. Meanwhile, the works of [samuel2019learning] and [samuel2017deep] focus on the design of DNNs for symbol detection in large MIMO systems. Specifically, Samuel et al. in [samuel2019learning] and [samuel2017deep] first investigate the fully connected DNN (FC-DNN) architecture for symbol detection and show that although it performs well for fixed channels, its BER performance is very poor for varying channels. To overcome this problem, a DNN that works for both fixed and varying channels, called the detection network (DetNet), is introduced [samuel2019learning, samuel2017deep]. However, the DetNet requires high computational complexity because of its complicated network architecture, motivating the proposal of the sparsely connected network (ScNet) in [gao2018sparsely] to improve performance and reduce complexity.

### I-B Contributions

Although TS detection is considered an efficient symbol-detection algorithm for large MIMO systems [nguyen2019qr, srinidhi2011layered], it requires many searching iterations to find the optimal solution, causing high computational complexity. The TS algorithm introduced in [zhao2007tabu] uses an ET criterion to terminate the iterative searching process early after a certain number of iterations when no better solution is found. Although this scheme provides complexity reduction, it can result in significant performance loss because the early terminated searching process does not guarantee the optimal solution. However, the number of searching iterations in the TS algorithm can be reduced with only marginal performance loss if a good initial solution and efficient searching/ET strategies are employed, which can be facilitated by DL. More specifically, we found that the initial solution obtained by a DNN is remarkably more reliable than the conventional linear zero-forcing (ZF)/MMSE and ordered successive interference-cancellation (OSIC) solutions. Furthermore, unlike in the cases of the ZF, MMSE, and OSIC receivers, the initial solution generated by an appropriate activation function in the DNN often has signals very close to or exactly the same as the constellation symbols, even before a quantization is applied. This property can be exploited to efficiently determine the reliable/unreliable detected symbols in the initial solution. Based on these aspects, the DL-aided TS algorithm is proposed for complexity reduction of the TS algorithm with ET. Our main contributions are summarized as follows:

• First, we further optimize the DetNet [samuel2019learning, samuel2017deep] and ScNet [gao2018sparsely] architectures to develop the fast-convergence sparsely connected detection network (FS-Net). Our simulation results show that the proposed FS-Net architecture achieves improved performance and reduced complexity with respect to DetNet and ScNet. As a result, the FS-Net-based solution is taken as the initial solution of the TS algorithm.

• In each iteration of the conventional TS algorithm, the move from the current candidate to its best neighbor is made, even when it does not result in a better solution. Therefore, it is possible that no better solution is found after a large number of iterations, but high complexity is required. This motivates us to improve the iterative searching phase of the TS algorithm. Specifically, by predicting the incorrect symbols in the FS-Net-based initial solution, more efficient moves can be made so that the optimal solution is more likely to be reached earlier.

• For further optimization, we consider the ET criterion incorporated with the FS-Net-based initial solution. In particular, unlike the conventional ET criterion, we propose using an adaptive cutoff factor, which is adjusted based on the accuracy of the FS-Net-based initial solution. As a result, when the initial solution is likely to be accurate, a small number of searching iterations is taken, which leads to a reduction in the overall complexity of the TS algorithm.

The rest of the paper is organized as follows: Section II presents the system model. Section III reviews and analyzes the complexity of the prior DNN architectures for symbol detection, namely, the FC-DNN, DetNet, and ScNet, followed by the proposal of the FS-Net architecture. Section IV presents the DL-aided TS detection algorithm. In Section V, the simulation results are shown. Finally, the conclusions are presented in Section VI.

Notations: Throughout this paper, scalars, vectors, and matrices are denoted by lower-case, bold-face lower-case, and bold-face upper-case letters, respectively. The $(i,j)$th element of a matrix $\mathbf{A}$ is denoted by $a_{i,j}$, whereas $(\cdot)^T$ and $(\cdot)^H$ denote the transpose and conjugate transpose of a vector, respectively. Furthermore, $|\cdot|$ and $\|\cdot\|$ represent the absolute value of a scalar and the norm of a vector or matrix, respectively. The expectation operator is denoted by $\mathbb{E}\{\cdot\}$, whereas $\sim$ means distributed as.

## II System Model

We consider the uplink of a multi-user MIMO system with receive antennas, where the total number of transmit antennas among all users is . The received signal vector is given by

$$\tilde{\mathbf{y}} = \tilde{\mathbf{H}}\tilde{\mathbf{s}} + \tilde{\mathbf{n}}, \tag{1}$$

where is the vector of transmitted symbols. We assume that , where is the average symbol power, and is a vector of independent and identically distributed (i.i.d.) additive white Gaussian noise (AWGN) samples, . Furthermore, denotes an channel matrix consisting of entries , where represents the complex channel gain between the th transmit antenna and the th receive antenna. The transmitted symbols are independently drawn from a complex constellation of points. The set of all possible transmitted vectors forms an -dimensional complex constellation consisting of vectors, i.e., .

The complex signal model (1) can be converted to an equivalent real signal model

$$\mathbf{y} = \mathbf{H}\mathbf{s} + \mathbf{n}, \tag{2}$$

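where $\mathbf{s}$, $\mathbf{y}$, $\mathbf{n}$, and $\mathbf{H}$, given by

$$\begin{bmatrix}\Re(\tilde{\mathbf{s}})\\ \Im(\tilde{\mathbf{s}})\end{bmatrix},\quad \begin{bmatrix}\Re(\tilde{\mathbf{y}})\\ \Im(\tilde{\mathbf{y}})\end{bmatrix},\quad \begin{bmatrix}\Re(\tilde{\mathbf{n}})\\ \Im(\tilde{\mathbf{n}})\end{bmatrix},\quad \text{and}\quad \begin{bmatrix}\Re(\tilde{\mathbf{H}}) & -\Im(\tilde{\mathbf{H}})\\ \Im(\tilde{\mathbf{H}}) & \Re(\tilde{\mathbf{H}})\end{bmatrix},$$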

respectively denote the -equivalent real transmitted signal vector, -equivalent real received signal, AWGN noise signal vectors, and -equivalent real channel matrix, with . Here, and denote the real and imaginary parts of a complex vector or matrix, respectively. Then, the set of all possible real-valued transmitted vectors forms an -dimensional constellation consisting of vectors, i.e., . In this work, we use the equivalent real-valued signal model in (2) because it can be employed for both the TS algorithm and DNNs.
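The conversion from (1) to (2) can be checked numerically. The following is a minimal sketch in plain Python; the 2x2 channel, the symbols, and the noise samples are made-up illustration values, and the helper names (`to_real_vector`, `to_real_matrix`, `matvec`) are hypothetical rather than taken from the paper.

```python
# Sketch: convert the complex model (1) into the real-valued model (2).
# All numeric values below are made up for illustration only.

def to_real_vector(v):
    """Stack real parts on top of imaginary parts: [Re(v); Im(v)]."""
    return [z.real for z in v] + [z.imag for z in v]

def to_real_matrix(H):
    """Build [[Re(H), -Im(H)], [Im(H), Re(H)]] from a complex matrix."""
    top = [[z.real for z in row] + [-z.imag for z in row] for row in H]
    bottom = [[z.imag for z in row] + [z.real for z in row] for row in H]
    return top + bottom

def matvec(A, x):
    """Plain matrix-vector product (works for real or complex entries)."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

H_c = [[0.5 + 0.2j, -0.3 + 0.1j],
       [0.1 - 0.4j, 0.7 + 0.6j]]
s_c = [1 + 1j, -1 + 1j]                 # two QPSK symbols
n_c = [0.01 - 0.02j, 0.03 + 0.01j]

# Complex model (1)
y_c = [hs + n for hs, n in zip(matvec(H_c, s_c), n_c)]

# Real model (2), built from the stacked quantities
H_r = to_real_matrix(H_c)
s_r = to_real_vector(s_c)
n_r = to_real_vector(n_c)
y_r = [hs + n for hs, n in zip(matvec(H_r, s_r), n_r)]

# The real model should reproduce [Re(y); Im(y)] of the complex model.
max_gap = max(abs(a - b) for a, b in zip(y_r, to_real_vector(y_c)))
```

The real-valued vectors have twice the length of their complex counterparts, which is why the real channel matrix here is 4x4.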

#### II-1 Conventional optimal solution

The ML solution can be written as

$$\hat{\mathbf{s}}_{ML} = \arg\min_{\mathbf{s}\in\mathcal{A}^N} \phi(\mathbf{s}), \tag{3}$$

where $\phi(\mathbf{s}) = \left\|\mathbf{y}-\mathbf{H}\mathbf{s}\right\|^2$ is the ML metric of $\mathbf{s}$. The computational complexity of ML detection in (3) grows exponentially with the number of transmit antennas [srinidhi2011layered], which results in extremely high complexity for large MIMO systems.
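To make the exponential cost of (3) concrete, the sketch below performs the exhaustive ML search for a tiny real-valued system. It assumes the usual squared-Euclidean ML metric for AWGN, $\|\mathbf{y}-\mathbf{H}\mathbf{s}\|^2$; the channel, noise, and function names are illustrative assumptions.

```python
from itertools import product

def ml_metric(y, H, s):
    """Assumed ML metric: phi(s) = ||y - H s||^2 for real-valued y, H, s."""
    return sum((yi - sum(h * sj for h, sj in zip(row, s))) ** 2
               for yi, row in zip(y, H))

def ml_detect(y, H, alphabet):
    """Exhaustive search over all len(alphabet)**N candidate vectors."""
    n = len(H[0])
    return min(product(alphabet, repeat=n),
               key=lambda cand: ml_metric(y, H, cand))

# Made-up 4x3 real channel, real QPSK alphabet {-1, +1}, small noise.
H = [[1.0, 0.2, -0.1],
     [0.3, 0.9, 0.2],
     [-0.2, 0.1, 1.1],
     [0.1, -0.3, 0.4]]
s_true = (1, -1, 1)
noise = [0.05, -0.02, 0.03, 0.01]
y = [sum(h * sj for h, sj in zip(row, s_true)) + w
     for row, w in zip(H, noise)]

s_ml = ml_detect(y, H, alphabet=(-1, 1))
num_candidates = 2 ** len(H[0])   # grows exponentially with the dimension
```

With only three real dimensions the search visits 8 candidates; each additional dimension doubles that count for QPSK, which is what makes exhaustive search impractical for large systems.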

#### II-2 DNN-based solution

A DNN can be modeled and trained to approximate the transmitted signal vector s. The solution obtained by a DNN with layers can be formulated as

$$\hat{\mathbf{s}} = \mathcal{Q}\left(\hat{\mathbf{s}}^{[L]}\right),$$

where is the element-wise quantization operator that quantizes to . Here, is the output vector at the th layer, which can be expressed as

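$$\hat{\mathbf{s}}^{[L]} = f^{[L]}\left(f^{[L-1]}\left(\ldots\left(f^{[1]}\left(\mathbf{x}^{[1]};\mathbf{P}^{[1]}\right);\ldots\right);\mathbf{P}^{[L-1]}\right);\mathbf{P}^{[L]}\right), \tag{4}$$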

where

$$f^{[l]}\left(\mathbf{x}^{[l]};\mathbf{P}^{[l]}\right) = \sigma^{[l]}\left(\mathbf{W}^{[l]}\mathbf{x}^{[l]} + \mathbf{b}^{[l]}\right) \tag{5}$$

represents the nonlinear transformation in the th layer with the input vector , the activation function , and consisting of the weighting matrix and bias vector . We see that (4) indicates the serial nonlinear transformations in the DNN that maps the input , including the information contained in y and H, to the output .
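The serial composition in (4) with the per-layer transformation (5) amounts to a simple loop over layers. The sketch below illustrates this with two tiny layers; the weights, biases, and activation choices are arbitrary illustration values, not trained parameters.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def identity(v):
    return list(v)

def layer(x, W, b, act):
    """One layer as in (5): act(W x + b)."""
    pre = [sum(w * xi for w, xi in zip(row, x)) + bi
           for row, bi in zip(W, b)]
    return act(pre)

def forward(x, params):
    """Compose the layers as in (4); params is a list of (W, b, act)."""
    for W, b, act in params:
        x = layer(x, W, b, act)
    return x

# Two layers with arbitrary (untrained) parameters.
params = [
    ([[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]], [0.0, 0.1, -0.2], relu),
    ([[1.0, 0.0, -1.0], [0.0, 2.0, 1.0]], [0.05, -0.05], identity),
]
out = forward([1.0, 2.0], params)
```

The cost of each layer is dominated by the product of the weight matrix with the layer input, so both grow with the size of the input vector.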

In large MIMO systems, many hidden layers and neurons are required for the DNN to extract meaningful features and patterns from the large amount of input data to provide high accuracy. Furthermore, the high-dimension signals and large channel matrix lead to the large input vector x, which requires large and for the transformation in (5). As a result, the computational complexity of the detection network typically becomes very high in large MIMO systems.

## III DNNs for MIMO Detection

In this section, we first analyze the architecture designs and complexities of three existing DNNs for MIMO detection in the literature, namely, FC-DNN [samuel2019learning], DetNet [samuel2019learning], and ScNet [gao2018sparsely]. This motivates us to further optimize them and propose a novel DNN for performance improvement and complexity reduction in MIMO detection.

### III-A FC-DNN, DetNet, and ScNet architectures

#### III-A1 FC-DNN architecture

The application of the well-known FC-DNN architecture for MIMO detection is investigated in [samuel2019learning, samuel2017deep]. In this FC-DNN architecture, the input vector contains all the received signal and channel entries, i.e., . The performance of the FC-DNN is examined in two scenarios: fixed and varying channels. It is shown in [samuel2019learning] that the FC-DNN architecture performs well for fixed channels; however, this is an impractical assumption. In contrast, for varying channels, its performance is very poor. Therefore, it cannot be employed for symbol detection in practical MIMO systems, and a more sophisticated DNN architecture is required for this purpose.

#### III-A2 DetNet

In the DetNet, the transmitted signal vector is updated over iterations corresponding to layers of the neural network based on mimicking a projected gradient descent-like ML optimization, which leads to iterations of the form [samuel2019learning, samuel2017deep]

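$$\begin{aligned}
\hat{\mathbf{s}}^{[l+1]} &= \Pi\left[\mathbf{s} - \delta^{[l]}\frac{\partial\left\|\mathbf{y}-\mathbf{H}\mathbf{s}\right\|^2}{\partial\mathbf{s}}\right]_{\mathbf{s}=\hat{\mathbf{s}}^{[l]}} \\
&= \Pi\left[\hat{\mathbf{s}}^{[l]} - \delta^{[l]}\mathbf{H}^T\mathbf{y} + \delta^{[l]}\mathbf{H}^T\mathbf{H}\hat{\mathbf{s}}^{[l]}\right],
\end{aligned} \tag{6}$$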

where denotes a nonlinear projection operator and is a step size.
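The iteration in (6) can be sketched without any learning by fixing the step size and using clipping as a stand-in for the projection. Both choices are assumptions made for illustration; in the DetNet the step sizes are trainable. Because the gradient of $\|\mathbf{y}-\mathbf{H}\mathbf{s}\|^2$ is proportional to $\mathbf{H}^T\mathbf{H}\mathbf{s}-\mathbf{H}^T\mathbf{y}$, a descent step under the sign convention written in (6) corresponds to a negative step size, which is what the sketch uses.

```python
def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def project(v, lo=-1.0, hi=1.0):
    """Stand-in for the projection operator: clip each entry to [lo, hi]."""
    return [min(hi, max(lo, x)) for x in v]

def pgd_detect(y, H, delta, num_iters):
    """Iterate s <- Pi[s - delta*H^T y + delta*H^T H s], starting from 0."""
    Ht = transpose(H)
    Hty = matvec(Ht, y)
    s_hat = [0.0] * len(H[0])
    for _ in range(num_iters):
        HtHs = matvec(Ht, matvec(H, s_hat))
        s_hat = project([s - delta * a + delta * b
                         for s, a, b in zip(s_hat, Hty, HtHs)])
    return s_hat

# Made-up 3x2 real channel; noise-free observation for clarity.
H = [[1.0, 0.2], [0.1, 1.0], [0.3, -0.2]]
s_true = [1.0, -1.0]
y = matvec(H, s_true)

# Fixed, hand-picked step size (an assumption); negative so that the
# update written in (6) moves against the gradient.
s_hat = pgd_detect(y, H, delta=-0.3, num_iters=50)
```

Each loop iteration plays the role of one network layer, which is why the number of layers and the convergence speed of the iteration are directly linked.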

The operation and architecture of the $l$th layer of the DetNet are illustrated in Fig. 1. The vectors $\hat{\mathbf{s}}^{[l]}$ and $\mathbf{v}^{[l]}$, which are both the outputs of the $l$th layer and the inputs of the $(l+1)$th layer, are updated as follows:

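\begin{align}
\mathbf{q}^{[l]} &= \hat{\mathbf{s}}^{[l-1]} - \delta_1^{[l]}\mathbf{H}^T\mathbf{y} + \delta_2^{[l]}\mathbf{H}^T\mathbf{H}\hat{\mathbf{s}}^{[l-1]}, \tag{7} \\
\mathbf{x}^{[l]} &= \left[\mathbf{v}^{[l-1]}, \mathbf{q}^{[l]}\right]^T, \tag{8} \\
\mathbf{z}^{[l]} &= \sigma\left(\mathbf{W}_1^{[l]}\mathbf{x}^{[l]} + \mathbf{b}_1^{[l]}\right), \tag{9} \\
\hat{\mathbf{s}}^{[l]} &= \psi_t\left(\mathbf{W}_2^{[l]}\mathbf{z}^{[l]} + \mathbf{b}_2^{[l]}\right), \tag{10} \\
\mathbf{v}^{[l]} &= \mathbf{W}_3^{[l]}\mathbf{z}^{[l]} + \mathbf{b}_3^{[l]}, \tag{11}
\end{align}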

where and are the input of the first layer of the network, which are initialized as , with 0 being an all-zero vector of an appropriate size. In (8), and are concatenated into a single input vector . In (9), is the rectified linear unit (ReLU) activation function. Furthermore, in (10), defined as

$$\psi_t(x) = -q + \frac{1}{|t|}\sum_{i\in\Omega}\left[\sigma(x+i+t) - \sigma(x+i-t)\right]$$

with for QPSK and for 16-QAM, guarantees that the amplitudes of the elements of are in the range for QPSK and for 16-QAM, as illustrated in Fig. 2. The final detected symbol vector is given as , where quantizes each element of to its closest real-constellation symbol in .
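The shape of $\psi_t$ is easiest to see by evaluating it. The sketch below implements the formula above with the ReLU as $\sigma$. The values of $q$ and $\Omega$ used here ($q=1$, $\Omega=\{0\}$ for QPSK and $q=3$, $\Omega=\{-2,0,2\}$ for 16-QAM) and the choice $t=0.5$ are assumptions, selected because they bound the output to the real QPSK and 16-QAM symbol ranges.

```python
def relu(x):
    return max(0.0, x)

def psi_t(x, t, q, omega):
    """psi_t(x) = -q + (1/|t|) * sum_{i in omega} [relu(x+i+t) - relu(x+i-t)]."""
    total = sum(relu(x + i + t) - relu(x + i - t) for i in omega)
    return -q + total / abs(t)

# Assumed settings (not stated in the surrounding text).
QPSK = dict(q=1, omega=(0,))
QAM16 = dict(q=3, omega=(-2, 0, 2))

t = 0.5  # arbitrary illustration value; in the networks t is a parameter
qpsk_outputs = [psi_t(x, t, **QPSK) for x in (-5.0, -0.25, 0.0, 0.25, 5.0)]
qam16_outputs = [psi_t(x, t, **QAM16) for x in (-5.0, -2.0, 0.0, 2.0, 5.0)]
```

Under these assumptions the function is a piecewise-linear staircase that saturates at the outermost symbols, so layer outputs are soft values that can lie between constellation points.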

In the training phase, the weights and biases of the DetNet are optimized by minimizing the loss function

$$\mathcal{L}(\mathbf{s},\hat{\mathbf{s}}) = \sum_{l=1}^{L}\log(l)\left\|\mathbf{s}-\hat{\mathbf{s}}^{[l]}\right\|^2, \tag{12}$$

which measures the total weighted distance between the transmitted vector s and the outputs of all the layers, i.e., . The DetNet is trained to optimize the parameter set , , , such that is minimized. As a result, can converge to s.

We now consider the computational complexity of the DetNet, which is defined as the total number of additions and multiplications required in (7)–(11). The computations of $\mathbf{H}^T\mathbf{y}$ and $\mathbf{H}^T\mathbf{H}$ require $N(2M-1)$ and $N^2(2M-1)$ operations, respectively, and are performed once in the first layer. The complexities required in (7), (9)–(11) depend on the modulation scheme, as follows:

• For QPSK, the sizes of , , is set to [samuel2019learning], leading to . As a result, the sizes of , and can be inferred from the size of , which is set to [samuel2019learning], as follows: , and . Furthermore, because , we have , and . Then, given and , the total complexity required in each layer is operations, including , , , and operations for (7), (9)–(11), respectively. Therefore, the total complexity of all layers of the DetNet is given as

$$\begin{aligned}
C_{\text{DetNet}}^{\text{QPSK}} &= N(2M-1) + N^2(2M-1) + L(18N^2+N) \\
&= (18L+2M-1)N^2 + (2M-1+L)N.
\end{aligned} \tag{13}$$

• For 16-QAM, and are set to and [samuel2019learning]. Therefore, we have , resulting in , , , and . Then, given and , the complexities required in (7) and (9)–(11) are , , , and operations, respectively. As a result, the total complexity required in each layer of the DetNet with 16-QAM is operations. Therefore, the total complexity of all layers of the DetNet is given as

$$\begin{aligned}
C_{\text{DetNet}}^{\text{16-QAM}} &= N(2M-1) + N^2(2M-1) + L(50N^2+N) \\
&= (50L+2M-1)N^2 + (2M-1+L)N.
\end{aligned} \tag{14}$$

It is observed from (13) and (14) that for both QPSK and 16-QAM, the complexity of the DetNet can be substantially high in large MIMO systems, where is large. This high complexity is due to the use of the additional input vector v and the full connections between the input and output vectors in every layer. Furthermore, the complicated architecture of the DetNet makes it difficult to optimize, which results in its relatively low performance, as will be shown in Section V. Therefore, the ScNet was introduced in [gao2018sparsely] for complexity reduction and performance improvement.

#### III-A3 ScNet

The ScNet also follows the update process in (6), but it simplifies the DetNet architecture based on the following observations:

• While q contains information of H, y, and , the additional input vector v does not contain any other meaningful information. Therefore, v is removed in the ScNet architecture. As a result, is also removed. It is shown in [gao2018sparsely] that this simplification leads not only to reduced complexity and training time, but also improved performance.

• Furthermore, it is observed in (6) that the first element of only depends on the first element of , which implies that the full connection between all elements of and is unnecessary. As a result, the input and output of each layer of the ScNet are directly connected in the element-wise manner. Consequently, is removed, and the weight matrix is reduced to a weight vector of size .

The operation and architecture of the ScNet are illustrated in Fig. 3. Similar to DetNet, ScNet is initialized with $\hat{\mathbf{s}}^{[0]} = \mathbf{0}$. Then, the output of the $l$th layer, i.e., $\hat{\mathbf{s}}^{[l]}$, is updated as follows:

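\begin{align}
\mathbf{x}^{[l]} &= \left[\mathbf{H}^T\mathbf{y},\ \mathbf{H}^T\mathbf{H}\hat{\mathbf{s}}^{[l-1]},\ \hat{\mathbf{s}}^{[l-1]}\right]^T, \tag{15} \\
\hat{\mathbf{s}}^{[l]} &= \psi_t\left(\mathbf{w}^{[l]}\odot\mathbf{x}^{[l]} + \mathbf{b}^{[l]}\right), \tag{16}
\end{align}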

where denotes the element-wise multiplication of and . Given , the computation of in (15) requires operations. Furthermore, because , we have , and the computation in (16) requires only operations. Consequently, the complexity of each layer of the ScNet architecture is . Taking the complexities of computing and into consideration, the ScNet architecture requires

$$\begin{aligned}
C_{\text{ScNet}} &= N(2M-1) + N^2(2M-1) + L(2N^2+5N) \\
&= (2M-1+2L)N^2 + (2M-1+5L)N
\end{aligned} \tag{17}$$

operations in total. Compared to the DetNet expressed in (13) and (14), it is observed that the ScNet requires much lower complexity. Furthermore, the simulation results in [gao2018sparsely] show that for BER , the ScNet achieves an approximate SNR gain of 1 dB over the DetNet.

However, one drawback of the ScNet is that the input vector x has the size of , which is three times larger than that of q in the DetNet for containing the information for y, H, and . This may result in unnecessary computational complexity of the ScNet. Furthermore, both the DetNet and ScNet employ the loss function (12) for the optimization of the weights and biases. This loss function is able to minimize the distance between s and , which allows to converge to s after a certain number of updates, which is equal to the number of layers in the DetNet and ScNet. However, it does not guarantee fast convergence. Meanwhile, if converges to s faster, a smaller number of layers can be required to achieve the same accuracy. In other words, for the same number of layers, if a better loss function is employed, then the performance can be improved. These observations on the input vector and the loss function of the DetNet and ScNet motivate us to propose the FS-Net for complexity reduction and performance improvement.

### III-B Proposed FS-Net architecture

#### III-B1 Network architecture

Inheriting the DetNet and ScNet, the proposed FS-Net is also motivated by the updating process in (6). We note that (6) can be rewritten as

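$$\hat{\mathbf{s}}^{[l+1]} = \Pi\left[\hat{\mathbf{s}}^{[l]} + \delta^{[l]}\left(\mathbf{H}^T\mathbf{H}\hat{\mathbf{s}}^{[l]} - \mathbf{H}^T\mathbf{y}\right)\right], \tag{18}$$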

which shows that the contributions of and to are different. Therefore, their elements should be processed by different weights and biases. Furthermore, (18) also implies that the elements at the same position of and can be multiplied by the same weight. Therefore, in the proposed FS-Net, we set the input vector of the th layer to

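$$\mathbf{x}^{[l]} = \left[\hat{\mathbf{s}}^{[l]},\ \mathbf{H}^T\mathbf{H}\hat{\mathbf{s}}^{[l]} - \mathbf{H}^T\mathbf{y}\right]^T \in \mathbb{R}^{2N\times 1},$$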

whose size is only two-thirds that of $\mathbf{x}^{[l]}$ in (15) for the ScNet. Furthermore, the FS-Net follows the sparse connection of ScNet. Consequently, in each layer of the FS-Net, there are only $2N$ element-wise connections between the input and output, whereas the ScNet has $3N$.

The operation and architecture of the proposed FS-Net are illustrated in Fig. 4. Furthermore, Algorithm 1 summarizes the FS-Net scheme for MIMO detection. The output of each layer is updated in step 4, where is obtained in step 3 with the requirement of operations to compute when and are given. In the FS-Net architecture, we have . Therefore, the computation in each layer requires only operations. We recall that the complexities of computing $\mathbf{H}^T\mathbf{y}$ and $\mathbf{H}^T\mathbf{H}$ are $N(2M-1)$ and $N^2(2M-1)$ operations, respectively. As a result, the complexity of the entire FS-Net is given as

$$\begin{aligned}
C_{\text{FS-Net}} &= N(2M-1) + N^2(2M-1) + L(2N^2+4N) \\
&= (2M-1+2L)N^2 + (2M-1+4L)N,
\end{aligned} \tag{19}$$

which is less than that of the ScNet architecture by $LN$ operations and considerably lower than the complexity of the DetNet given in (13) and (14).
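The closed-form counts in (13), (14), (17), and (19) are easy to compare numerically. The sketch below evaluates them for one hypothetical configuration ($N = M = 64$, $L = 10$); using the same number of layers for all networks is a simplifying assumption made only for this comparison.

```python
def shared_cost(N, M):
    """Cost of computing H^T y and H^T H once: N(2M-1) + N^2(2M-1)."""
    return N * (2 * M - 1) + N**2 * (2 * M - 1)

def c_detnet_qpsk(N, M, L):      # Eq. (13)
    return shared_cost(N, M) + L * (18 * N**2 + N)

def c_detnet_16qam(N, M, L):     # Eq. (14)
    return shared_cost(N, M) + L * (50 * N**2 + N)

def c_scnet(N, M, L):            # Eq. (17)
    return shared_cost(N, M) + L * (2 * N**2 + 5 * N)

def c_fsnet(N, M, L):            # Eq. (19)
    return shared_cost(N, M) + L * (2 * N**2 + 4 * N)

# Hypothetical configuration, chosen only for illustration.
N, M, L = 64, 64, 10
costs = {
    "DetNet (QPSK)": c_detnet_qpsk(N, M, L),
    "DetNet (16-QAM)": c_detnet_16qam(N, M, L),
    "ScNet": c_scnet(N, M, L),
    "FS-Net": c_fsnet(N, M, L),
}
gap_scnet_fsnet = costs["ScNet"] - costs["FS-Net"]   # equals L * N

for name, ops in costs.items():
    print(f"{name:16s} {ops:>10d} operations")
```

For equal $L$, the gap between the ScNet and FS-Net counts is only linear in $N$, so any larger saving has to come from needing fewer layers for the same accuracy.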

#### III-B2 Loss function

We consider the correlation between and s in the loss function for better training the FS-Net. Specifically, the loss function of the FS-Net is redefined as

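$$\mathcal{L}(\mathbf{s},\hat{\mathbf{s}}) = \sum_{l=1}^{L}\log(l)\left[\alpha\left\|\mathbf{s}-\hat{\mathbf{s}}^{[l]}\right\|^2 + \beta\, r\left(\hat{\mathbf{s}}^{[l]},\mathbf{s}\right)\right], \tag{20}$$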

where . Based on the Cauchy–Schwarz inequality, we have , where equality occurs if with a constant . Therefore, we have , and if .

The DetNet, ScNet, and FS-Net schemes initialize as 0 and update over layers to approximate s. This can be considered as sequential moves starting from the coordinate 0 over to reach . By minimizing the loss function in (20), we have , or equivalently, , which enables the moves to be in a specific direction, which is the position of s in a hypersphere. This can shorten the path of the moves to reach s, which results in the reduced number of required layers in the FS-Net. In (20), and are used to adjust the contributions of and to the loss function, which are optimized by simulations. Our simulation results in Section V show that the proposed FS-Net achieves not only complexity reduction, but also performance improvement with respect to the conventional DetNet and ScNet schemes.
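The structure of the loss in (20) can be sketched as follows. The exact definition of $r$ is not reproduced above, so the sketch assumes a normalized-correlation penalty, $r(\hat{\mathbf{s}},\mathbf{s}) = 1 - |\mathbf{s}^T\hat{\mathbf{s}}| / (\|\mathbf{s}\|\,\|\hat{\mathbf{s}}\|)$, which is bounded through the Cauchy–Schwarz inequality and vanishes when $\hat{\mathbf{s}}$ is a scaled copy of $\mathbf{s}$. This form, the function names, and the values of `alpha` and `beta` are assumptions for illustration.

```python
import math

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def corr_penalty(s_hat, s):
    """Assumed form of r: 1 - |s^T s_hat| / (||s|| * ||s_hat||)."""
    dot = abs(sum(a * b for a, b in zip(s_hat, s)))
    norm = math.sqrt(sum(a * a for a in s_hat)) * math.sqrt(sum(b * b for b in s))
    return 1.0 if norm == 0.0 else 1.0 - dot / norm

def fs_net_loss(s, layer_outputs, alpha, beta):
    """Sum over layers l = 1..L of log(l) * [alpha*||s - s_l||^2 + beta*r(s_l, s)]."""
    return sum(
        math.log(l) * (alpha * sq_dist(s, s_l) + beta * corr_penalty(s_l, s))
        for l, s_l in enumerate(layer_outputs, start=1)
    )

s = [1.0, -1.0, 1.0, 1.0]
outputs = [
    [0.2, -0.1, 0.3, 0.1],     # layer 1 (weight log(1) = 0)
    [0.5, -0.5, 0.5, 0.5],     # layer 2: right direction, half the length
    [1.0, -1.0, 1.0, 1.0],     # layer 3: exact
]
loss = fs_net_loss(s, outputs, alpha=1.0, beta=0.5)
```

In this example the second layer output points exactly toward $\mathbf{s}$, so only the distance term contributes for that layer. Setting `beta=0` recovers the loss in (12).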

## IV Deep Learning-Aided Tabu Search Detection

### IV-A Problem formulation

The TS algorithm starts with an initial candidate and sequentially moves over candidates for iterations. In each iteration, all the non-tabu neighbors of the current candidate c are examined to find the best neighbor with the smallest ML metric, i.e.,

$$\mathbf{x}^{\star} = \arg\min_{\mathbf{x}\in\mathcal{N}(\mathbf{c})}\phi(\mathbf{x}), \tag{21}$$

where consists of non-tabu neighboring vectors inside alphabet with the smallest distance to c, i.e.,

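$$\mathcal{N}(\mathbf{c}) = \left\{\mathbf{x}\in\mathcal{A}^N\setminus\mathcal{L},\ \left|\mathbf{x}-\mathbf{c}\right| = \theta_{\min}\right\}, \tag{22}$$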

where denotes the alphabet excluding the tabu vectors kept in the tabu list , and is the minimum distance between two constellation points in a plane. Furthermore, the ML metric can be expressed as [nguyen2019qr]

$$\phi(\mathbf{x}) = \left\|\mathbf{u} + \mathbf{h}_d\delta_d\right\|^2, \tag{23}$$

where , is the single nonzero element of and is the th column of H. In this study, we refer to as the difference position of a neighbor, in which the candidate and its neighbor are different. For example, if the current candidate is , then is the difference position for because c and x differ only in the third element.

After the best neighbor is determined with (21), it becomes the candidate in the next iteration, and the best neighbor of the new candidate is then determined. In this iterative manner, the final solution is determined as the best candidate visited so far, i.e.,

$$\hat{\mathbf{s}}_{TS} = \arg\min_{\mathbf{c}\in\mathcal{V}}\left\{\phi(\mathbf{c})\right\}$$

where is the set of all visited candidates over searching iterations.

The computational complexity of TS algorithms is proportional to the number of searching iterations, i.e., . In large MIMO systems, the number of neighbors in each iteration and the dimension of the neighboring vectors are large. Therefore, the complexity to find the best neighbor in each iteration becomes high in large MIMO systems. Furthermore, an extremely large is required to guarantee that the near-optimal solution is found. Consequently, the complexity of the TS algorithm can be excessively high in large MIMO systems. To reduce the complexity of TS in large MIMO systems, an ET scheme can be employed. Specifically, a cutoff factor , , is used to terminate the iterative searching process early after iterations in which no better solution is found. As a result, the number of searching iterations in the TS algorithm with ET is

$$I = \min\left\{I_0 + I_e,\ I_{UB}\right\}, \tag{24}$$

where is the number of iterations after which no better solution is found. The TS algorithm without ET requires searching iterations, which is also the upper bound on the number of iterations required when ET is applied. From (24), the following remarks are noted.

###### Remark 1

If ET occurs, the best solution found after iterations and that found after iterations are the same, which is the final solution of the TS algorithm, i.e., . Therefore, the earlier is found, the smaller is required. This objective can be achieved by starting the moves in the TS algorithm with a good initial solution. Furthermore, if the initial solution is well taken such that it is likely to be a near-optimal solution, then no further searching iteration is required. In this case, the complexity of the TS algorithm becomes only that involved in finding the initial solution.

###### Remark 2

Another approach to find the optimal solution earlier and reduce is to make efficient moves during the searching iterations in the TS algorithm. In other words, the moves can be guided so that is reached earlier.

###### Remark 3

With the conventional ET criterion, is fixed to . In large MIMO systems, a large is required to guarantee that the optimal solution is found, which results in a large . However, we note that once ET occurs, the further search over iterations does not result in any performance improvement while imposing a significant computational burden on the TS algorithm. For example, in a MIMO system with QPSK, should be used to approximately achieve the performance of the SD scheme [nguyen2019qr]. For , iterations are required before the termination, whereas was already found in the th iteration. Therefore, a more efficient ET criterion is required to reduce the complexity of the TS algorithm with ET.

Remarks 1–3 motivate us to propose a TS-based detection algorithm for complexity reduction with three design objectives: taking a good initial solution, using efficient moves in searching iterations so that is reached as soon as possible, and terminating the TS algorithm early based on an efficient ET criterion. In the next subsection, we propose the DL-aided TS algorithm with the application of DL to the TS detection for those objectives.
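The search described by (21), (22), and (24) can be sketched compactly for a real-valued alphabet. This is a minimal illustration under several assumptions: the ML metric is computed directly as $\|\mathbf{y}-\mathbf{H}\mathbf{x}\|^2$ instead of the incremental form in (23), the tabu list stores whole vectors with a fixed maximum length, and the parameter names (`max_iters`, `cutoff_iters`, `tabu_len`) are hypothetical stand-ins for the iteration bound, the ET window, and the tabu-list size.

```python
def ml_metric(y, H, x):
    """phi(x) = ||y - H x||^2 (direct evaluation, for clarity)."""
    return sum((yi - sum(h * xj for h, xj in zip(row, x))) ** 2
               for yi, row in zip(y, H))

def neighbors(c, alphabet):
    """Vectors differing from c in one position by the minimum symbol spacing."""
    step = min(b - a for a, b in zip(alphabet, alphabet[1:]))
    out = []
    for d, cd in enumerate(c):
        for sym in alphabet:
            if abs(sym - cd) == step:
                out.append(c[:d] + (sym,) + c[d + 1:])
    return out

def tabu_search(y, H, init, alphabet, max_iters, cutoff_iters, tabu_len):
    current = tuple(init)
    best, best_metric = current, ml_metric(y, H, current)
    tabu = [current]
    since_improved = 0          # iterations without a better solution
    iters = 0
    while iters < max_iters and since_improved < cutoff_iters:
        cands = [x for x in neighbors(current, alphabet) if x not in tabu]
        if not cands:
            break
        # Move to the best non-tabu neighbor, even if it is worse (Eq. (21)).
        current = min(cands, key=lambda x: ml_metric(y, H, x))
        iters += 1
        tabu.append(current)
        if len(tabu) > tabu_len:
            tabu.pop(0)
        metric = ml_metric(y, H, current)
        if metric < best_metric:
            best, best_metric = current, metric
            since_improved = 0
        else:
            since_improved += 1
    return best, iters

# Made-up 4x3 real system with a deliberately poor initial solution.
H = [[1.0, 0.2, -0.1],
     [0.3, 0.9, 0.2],
     [-0.2, 0.1, 1.1],
     [0.1, -0.3, 0.4]]
s_true = (1, -1, 1)
noise = [0.05, -0.02, 0.03, 0.01]
y = [sum(h * sj for h, sj in zip(row, s_true)) + w for row, w in zip(H, noise)]

best, iters = tabu_search(y, H, init=(-1, -1, -1), alphabet=(-1, 1),
                          max_iters=20, cutoff_iters=3, tabu_len=10)
```

In this toy run the best vector is reached after two moves, and the remaining iterations are spent only on confirming that no improvement follows. That wasted tail is what a better initial solution and a shorter ET window aim to reduce.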

### IV-B Proposed DL-aided TS algorithm

The main ideas of the proposed DL-aided TS algorithm can be explained as follows.

#### IV-B1 DL-aided initial solution

Unlike the conventional TS algorithms, in which the ZF, MMSE, or OSIC solution is taken as the initial solution [zhao2007tabu], the DL-aided TS algorithm employs a DNN to generate the initial solution.

In this scheme, the most important task is to choose a DNN architecture that is not only able to approximate the transmitted signal vector with high accuracy, but also has low computational complexity. As discussed in Section III-A, a basic FC-DNN cannot achieve high BER performance for varying channels. By contrast, among the DetNet, ScNet, and FS-Net, the FS-Net requires the lowest complexity while achieving the best performance, which is demonstrated in Section V. Therefore, we propose employing the FS-Net to find the initial solution in the DL-aided TS algorithm. Specifically, $\hat{\mathbf{s}}$ obtained in step 6 of Algorithm 1 is taken as the initial solution of the DL-aided TS algorithm. As a result, the required complexity for this initialization phase is given in (19).

#### IV-B2 Efficient moves in searching iterations

In the TS algorithm with ET, the complexity can be reduced if efficient moves are made during the iterative searching process, so that is found earlier, as discussed in Remark 2. For this purpose, we propose exploiting the difference between and , which are the output of the last layer and the final solution of the FS-Net, respectively. We recall that can contain elements both inside and outside the alphabet , as observed from step 4 in Algorithm 1 and Fig. 2. By contrast, we have .

Let e denote the distance between the elements of and , i.e.,

$$\mathbf{e} = \left|\hat{\mathbf{s}}^{[L]} - \hat{\mathbf{s}}\right| = \left[e_1, e_2, \ldots, e_N\right]^T.$$

For QAM signals, the distance between two neighboring real symbols is two. Furthermore, from Fig. 2, we have for QPSK and for 16-QAM. Therefore, we have . It is observed that if , there is a high probability that the th symbol in is correctly approximated by the FS-Net, i.e., . By contrast, if , there is a high probability that is an erroneous estimate, i.e., . Therefore, by examining the elements of e, we can determine the elements of with high probabilities of errors.

Example 1: Consider a MIMO system with , QPSK, and

$$\begin{aligned}
\hat{\mathbf{s}}^{[L]} &= [0.1, -0.9, -0.2, 1, 0.25, 0.9, -1, 1]^T, \\
\hat{\mathbf{s}} &= [1, -1, -1, 1, 1, 1, -1, 1]^T.
\end{aligned}$$

Then, we have

$$\mathbf{e} = \left| \hat{\mathbf{s}}^{[L]} - \hat{\mathbf{s}} \right| = [0.9, 0.1, 0.8, 0, 0.75, 0.1, 0, 0]^T,$$

which implies that the first, third, and fifth elements of $\hat{\mathbf{s}}$ can be incorrect with high probability.
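The computation in Example 1 can be reproduced in a few lines of Python. This is a minimal sketch: the function name `element_distance` is hypothetical, and the numerical values are those given in Example 1.

```python
def element_distance(s_last_layer, s_final):
    """Return e = |s_last_layer - s_final|, element by element."""
    if len(s_last_layer) != len(s_final):
        raise ValueError("both vectors must have the same length")
    return [abs(a - b) for a, b in zip(s_last_layer, s_final)]


# Values from Example 1 (QPSK, real-valued symbols in {-1, +1}).
s_last_layer = [0.1, -0.9, -0.2, 1, 0.25, 0.9, -1, 1]
s_final = [1, -1, -1, 1, 1, 1, -1, 1]

e = element_distance(s_last_layer, s_final)
# Round only for display; floating-point subtraction is not exact.
print([round(x, 2) for x in e])
```

The largest entries of the result sit at the first, third, and fifth positions, which are the elements whose last-layer outputs lie far from the symbols they were mapped to.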

In the following analysis, we refer to the symbols of with high probabilities of being incorrect as the predicted incorrect symbols. Furthermore, the th element in is determined as a predicted incorrect symbol if is beyond a predefined error-threshold , i.e., if . The error-threshold should be inversely proportional to the SNR, which leads to , where is chosen such that . Furthermore, because , we set to

$$\gamma = \min\left\{\frac{\lambda}{\text{SNR}}, 0.5\right\}. \tag{25}$$
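The threshold rule in (25) and the resulting selection of predicted incorrect symbols can be sketched as follows. Several points are assumptions made for illustration only: the SNR is taken in linear scale, the comparison against the threshold is strict, and the values of `lam` and `snr_linear` are arbitrary. The function names are hypothetical.

```python
def error_threshold(lam, snr_linear):
    """Error threshold of (25): min(lam / SNR, 0.5), with SNR in linear scale (assumed)."""
    if snr_linear <= 0:
        raise ValueError("SNR must be positive")
    return min(lam / snr_linear, 0.5)


def predicted_incorrect_positions(e, gamma):
    """Indices (0-based) whose distance e_n exceeds the threshold gamma."""
    return [n for n, e_n in enumerate(e) if e_n > gamma]


# e from Example 1; lam = 5.0 and SNR = 10 are illustrative values only.
e = [0.9, 0.1, 0.8, 0, 0.75, 0.1, 0, 0]
gamma = error_threshold(lam=5.0, snr_linear=10.0)
positions = predicted_incorrect_positions(e, gamma)
print(gamma, positions)
```

With these illustrative values the threshold saturates at 0.5, and the selected positions are the three elements with the largest distances.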

Let $n_e$ denote the number of predicted incorrect symbols in $\hat{\mathbf{s}}$ ($n_e = 3$ in Example 1), and let $\mathcal{S}$ be the set of all candidates obtained by correcting $\hat{\mathbf{s}}$. Then we have

$$\mathcal{S} = \mathcal{S}^{(1)} \cup \mathcal{S}^{(2)} \cup \ldots \cup \mathcal{S}^{(n_e)},$$

where $\mathcal{S}^{(k)}$ is the subset of $\mathcal{S}$ obtained by correcting $k$ elements of $\hat{\mathbf{s}}$. The number of candidates in $\mathcal{S}^{(k)}$ is given as

$$\left| \mathcal{S}^{(k)} \right| = C_{n_e}^{k} \times (Q-1)^k = \frac{n_e!\,(Q-1)^k}{k!\,(n_e-k)!},$$

where $C_{n_e}^{k}$ is the number of combinations for choosing $k$ out of $n_e$ predicted incorrect symbols, and $Q$ is the number of real symbols in the alphabet, with $Q = 2, 4, 8$ for QPSK, 16-QAM, and 64-QAM, respectively. Consequently, the total number of possible corrected vectors in $\mathcal{S}$ is

$$|\mathcal{S}| = \sum_{k=1}^{n_e} \frac{n_e!\,(Q-1)^k}{k!\,(n_e-k)!}.$$
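By the binomial theorem, the sum over $k = 0, \ldots, n_e$ of $C_{n_e}^{k}(Q-1)^k$ equals $Q^{n_e}$, so the total above reduces to $|\mathcal{S}| = Q^{n_e} - 1$, which grows exponentially in $n_e$. The short sketch below checks this numerically; the function names are hypothetical, and the values of $n_e$ are chosen only for illustration.

```python
from math import comb


def subset_size(n_e, k, q):
    """Number of candidates obtained by correcting exactly k of n_e symbols."""
    return comb(n_e, k) * (q - 1) ** k


def total_candidates(n_e, q):
    """|S|: sum of the subset sizes for k = 1, ..., n_e."""
    return sum(subset_size(n_e, k, q) for k in range(1, n_e + 1))


for q, name in [(2, "QPSK"), (4, "16-QAM"), (8, "64-QAM")]:
    for n_e in (3, 8):
        total = total_candidates(n_e, q)
        assert total == q ** n_e - 1  # closed form from the binomial theorem
        print(f"{name}, n_e={n_e}: |S| = {total}")
```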

Now, denoting as the best vector obtained by correcting , we have

 (26)

In the case of large $n_e$ and high-order QAM schemes, $|\mathcal{S}|$ becomes large; hence, high complexity is required to solve (26).

However, we note that correcting a symbol in is equivalent to a move from to one of its neighboring vectors in an iteration of the TS algorithm. Specifically, if is likely to be wrong, then a move from to its neighbor x for the difference position of should be made. In this case, and are neighboring symbols, and for . In this manner, many of the predicted incorrect symbols can be corrected after searching iterations of the TS algorithm with high probabilities. Let be the solution found by correcting the predicted incorrect symbols of after iterations. We have

$$\bar{\mathbf{s}}^{\star}_{\mathrm{TS}} = \arg\min_{\bar{\mathbf{s}} \in \bar{\mathcal{S}}} \phi(\bar{\mathbf{s}}), \tag{27}$$

where is given in (23), and , with being the current candidate in the th iteration, .

When is sufficiently small, the complexity involved in finding with (27) becomes much smaller than that for finding with (26), as well as that for the conventional TS algorithm with (21). This is because is a subset of , and hence, unlike the conventional TS algorithm, in the th iteration, , a reduced number of neighboring vectors are examined, which implies that the complexity required during iterations to find can be low. If the incorrect symbols are predicted with high accuracy, there is a high probability that is close to . Therefore, only a small number of further iterations are required to reach , and the total complexity of the TS algorithm can be reduced.
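The reduction in the number of examined neighbors can be illustrated with a small sketch. It assumes that a neighbor differs from the current candidate in exactly one position, where the symbol is replaced by an adjacent symbol of the real alphabet; the function name `neighbors` and its arguments are hypothetical. The tabu list and the metric evaluation are deliberately left out.

```python
def neighbors(candidate, alphabet, positions=None):
    """Vectors that differ from `candidate` in one position by one alphabet step.

    `alphabet` must be sorted in ascending order. If `positions` is given, only
    those indices are changed (restricted search); otherwise all indices are.
    """
    if positions is None:
        positions = range(len(candidate))
    result = []
    for d in positions:
        idx = alphabet.index(candidate[d])
        for step in (-1, 1):
            j = idx + step
            if 0 <= j < len(alphabet):
                moved = list(candidate)
                moved[d] = alphabet[j]
                result.append(moved)
    return result


qpsk = [-1, 1]
s_hat = [1, -1, -1, 1, 1, 1, -1, 1]  # final FS-Net solution from Example 1
restricted = neighbors(s_hat, qpsk, positions=[0, 2, 4])
full = neighbors(s_hat, qpsk)
print(len(restricted), len(full))
```

For the vector of Example 1, restricting the moves to the three predicted incorrect positions leaves three neighbors to evaluate instead of eight, and the gap widens for larger systems and higher-order alphabets.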

#### IV-B3 Proposed ET criteria

In the conventional TS algorithm with ET, the algorithm is terminated early after iterations as given in (24), where is set to with a fixed . Remark 3 shows that this stopping criterion is inefficient. Specifically, in many cases, further searching over iterations does not improve the performance while adding unnecessary complexity to the TS algorithm.

In this work, to minimize the number of redundant searching iterations, we propose an adaptive ET criterion based on $n_e$. We make the following observations:

• If $n_e = 0$, there is no predicted incorrect symbol in $\hat{\mathbf{s}}$, which implies a high probability that $\hat{\mathbf{s}}$ is already the optimal solution. In this case, no further searching iterations are needed.

• If $n_e$ is small, there is a high probability that only a few elements of $\hat{\mathbf{s}}$ are incorrect, and these can be corrected after a small number of moves in the TS algorithm. Therefore, in this case, only a small number of searching iterations is required.

• If $n_e$ is large, a sufficiently large number of searching iterations should be performed to guarantee the optimal performance.

Therefore, we propose using an adaptive cutoff factor that depends on $n_e$ as follows:

$$\hat{\varepsilon} = \min\left\{\varepsilon, \mu \frac{n_e}{N}\right\}, \tag{28}$$

where is optimized through simulations to guarantee that a sufficient number of iterations is used in the DL-aided TS algorithm. Consequently, in the DL-aided TS algorithm, only iterations are required before the searching process is terminated, with because . From (28), we have as , and as , implying the use of a smaller number of iterations for a smaller , and vice versa.
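The behavior of the adaptive cutoff factor in (28) can be seen from a short numerical sketch. The function name is hypothetical, and the values used for the fixed cutoff factor, the scaling parameter, and the vector length are illustrative rather than taken from the simulations.

```python
def adaptive_cutoff(eps, mu, n_e, n):
    """Adaptive cutoff factor of (28): min(eps, mu * n_e / n)."""
    if n <= 0 or not 0 <= n_e <= n:
        raise ValueError("require n > 0 and 0 <= n_e <= n")
    return min(eps, mu * n_e / n)


# eps = 0.4, mu = 2.0, and n = 16 are illustrative values only.
for n_e in (0, 1, 4, 16):
    print(n_e, adaptive_cutoff(eps=0.4, mu=2.0, n_e=n_e, n=16))
```

With these values the factor is zero when no symbol is predicted incorrect, grows linearly for small counts, and saturates at the fixed cutoff factor once the scaled ratio exceeds it.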

The DL-aided TS algorithm is presented in Algorithm 2. In step 1, $\hat{\mathbf{s}}^{[L]}$ and $\hat{\mathbf{s}}$ are obtained by the FS-Net scheme in Algorithm 1, allowing $\mathbf{e}$ to be computed in step 2 as the difference between $\hat{\mathbf{s}}^{[L]}$ and $\hat{\mathbf{s}}$. In step 4, the list of the positions of the predicted incorrect elements of $\hat{\mathbf{s}}$ is found based on $\gamma$, which is set in step 3. Then, $n_e$ is set to the size of this list in step 5. The DL-aided TS algorithm is initialized in steps 6–11. Specifically, step 6 assigns $\hat{\mathbf{s}}$ to the current candidate $\mathbf{c}$ and pushes it to the tabu list. Then, the best solution and its metric are initialized as $\mathbf{c}$ and $\phi(\mathbf{c})$, respectively, in step 7. The adaptive cutoff factor $\hat{\varepsilon}$ and the resulting iteration limit for early termination are computed in steps 8 and 9, respectively.

In steps 12–30, is searched over iterations. The first iterations are used to correct the predicted incorrect symbols in $\hat{\mathbf{s}}$. In particular, in the th iteration, , the best candidate is found in step 15. In contrast, in the remaining iterations, the conventional searching manner is used, where all the neighbors in are examined to find , as in steps 17 and 18. Comparing step 15 with step 18, it is observed that the proposed searching approach based on the predicted incorrect symbols requires lower complexity than the conventional searching approach because . In steps 21–26, if a better solution is found, is updated, and at the same time, is set to zero to allow further moves to find a better solution. Otherwise, increases by one until it reaches . The following steps update the best solution and the tabu list, and step 31 returns the final solution after the searching phase is finished.
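The counter logic described for steps 21–26 can be sketched in isolation. This is a simplified illustration and not Algorithm 2 itself: it replaces the neighbor search and the tabu list with a precomputed list of per-iteration metrics, and the names `search_with_early_termination` and `max_no_improve` are hypothetical.

```python
def search_with_early_termination(candidate_metrics, max_no_improve):
    """Scan per-iteration metrics and stop after too many non-improving moves.

    Returns (best_metric, iterations_used).
    """
    best = float("inf")
    count = 0
    used = 0
    for metric in candidate_metrics:
        used += 1
        if metric < best:
            best = metric  # better solution found: update and reset the counter
            count = 0
        else:
            count += 1  # no improvement in this iteration
            if count >= max_no_improve:
                break
    return best, used


metrics = [5.0, 3.0, 3.5, 4.0, 2.0, 2.5]
best, used = search_with_early_termination(metrics, max_no_improve=2)
print(best, used)
```

In this toy run the search stops before reaching the metric 2.0, which illustrates the trade-off that the adaptive cutoff factor is meant to balance: a small limit saves iterations but may end the search before a better solution is visited.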

## V Simulation Results

In this section, we numerically evaluate the BER performance and computational complexities of the proposed FS-Net detection architecture and the DL-aided TS algorithm. In our simulations, each channel coefficient is assumed to be an i.i.d. zero-mean complex Gaussian random variable with a variance of per dimension. The SNR is defined as the ratio of the average symbol power to the noise power .

### V-A Training DNNs

We follow the training model in [samuel2019learning] and [gao2018sparsely]. Specifically, we use the Adam optimizer [kingma2014adam], which is a variant of the stochastic gradient descent method [rumelhart1988learning, bottou2010large], to optimize the DNNs. The DNNs are implemented in Python with the TensorFlow library [abadi2016tensorflow] and run on a standard Intel i7-6700 processor. For the training phase, a decaying learn