Deep Learning-Aided Tabu Search Detection for Large MIMO Systems
Abstract
In this study, we consider the application of deep learning (DL) to tabu search (TS) detection in large multiple-input multiple-output (MIMO) systems. First, we propose a deep neural network architecture for symbol detection, termed the fast-convergence sparsely connected detection network (FSNet), which is obtained by optimizing the prior detection networks called DetNet and ScNet. Then, we propose the DL-aided TS algorithm, in which the initial solution is approximated by the proposed FSNet. Furthermore, in this algorithm, an adaptive early-termination algorithm and a modified searching process are performed based on the predicted approximation error, which is determined from the FSNet-based initial solution, so that the optimal solution can be reached earlier. The simulation results show that the proposed algorithm achieves a significant complexity reduction for a large MIMO system with QPSK with respect to the existing TS algorithms, while maintaining almost the same performance.
I Introduction
In mobile communications, a large multiple-input multiple-output (MIMO) system is a potential technique to dramatically improve the system's spectral and power efficiency [ngo2013energy], [marzetta2010noncooperative]. However, in order for the promised benefits of large MIMO systems to be reaped, significantly increased computational complexity is required at the receiver when compared to that of the conventional MIMO system [wu2014large, nguyen2019qr]. Therefore, low-complexity near-optimal detection is an important challenge in realizing large MIMO systems [rusek2013scaling, chockalingam2014large, mandloi2017low]. Two major lines of studies have been conducted recently to fulfill that challenge: proposals of low-complexity near-optimal detection algorithms [vardhan2008low, mohammed2009high, qin2016near, mandloi2017error, som2010improved, mohammed2009low, datta2013novel, hansen2009near, mandloi2017layered, narasimhan2014channel, vsvavc2013soft, nguyen2019qr] and the use of deep-learning (DL) techniques for symbol detection in massive MIMO systems [farsad2017detection, ye2017power, mohammadkarimi2018deep, samuel2019learning, samuel2017deep, gao2018sparsely].
I-A Recent works
Various algorithms for large-MIMO detection have been introduced [vardhan2008low, mohammed2009high, qin2016near, mandloi2017error, som2010improved, mohammed2009low, datta2013novel, hansen2009near, mandloi2017layered, narasimhan2014channel, vsvavc2013soft, nguyen2019qr]. Among them, the tabu search (TS) detector is considered a complexity-efficient scheme for symbol detection in large MIMO systems. It has been shown that the TS detection algorithm can perform very close to the maximum-likelihood (ML) bound with far lower complexity compared to sphere decoding (SD) and fixed-complexity SD (FSD) schemes in large MIMO systems [rusek2013scaling], [srinidhi2011layered]. In [srinidhi2009low], an approach based on reactive TS (RTS) is proposed for near-ML decoding of non-orthogonal space-time block codes (STBCs) with 4-QAM. However, its performance is far from optimal for higher-order QAMs, such as 16- and 64-QAM [srinidhi2009near]. The work in [srinidhi2011layered] proposes an algorithm called layered TS (LTS). This algorithm improves the performance of TS detection in terms of the bit-error rate (BER) for higher-order QAM in large MIMO systems. However, to achieve a target BER in large MIMO systems with 16-QAM, higher complexities are required than in conventional TS. The random-restart reactive TS (R3TS) algorithm, which runs multiple RTS instances and chooses the best among the resulting solution vectors, is presented in [datta2010random]. It achieves improved BER performance at the expense of increased complexity. The complexity of R3TS is generally higher than that of RTS at a given target BER, especially for large antenna configurations and high-order QAMs, such as 64-QAM. The work of [zhao2007tabu] further improves TS in terms of complexity, based on a reduced number of examined neighbors and an early-termination (ET) criterion.
However, this comes at the cost of a performance loss; for example, to achieve a given BER with 16-QAM modulation, the TS algorithm with ET has a 3-dB signal-to-noise ratio (SNR) loss compared to the original TS [zhao2007tabu]. In [nguyen2019qr], the QR-decomposition-aided TS (QR-TS) algorithm is proposed, achieving considerable complexity reduction without any performance loss.
On the other hand, the application of deep learning (DL) to symbol detection in MIMO systems has recently gained much attention [farsad2017detection, ye2017power, samuel2019learning, samuel2017deep, gao2018sparsely, mohammadkarimi2018deep]. In [farsad2017detection], three detection algorithms based on deep neural networks (DNNs) are proposed for molecular communication systems, which are shown to perform much better than prior simple detectors. In contrast, the application of DL to symbol detection in orthogonal frequency-division multiplexing (OFDM) systems is considered in [ye2017power]. Specifically, Ye et al. in [ye2017power] show that a DL-based detection scheme can address channel distortion and detect the transmitted symbols with performance comparable to that of the minimum mean-square error (MMSE) receiver. In [mohammadkarimi2018deep], a DL-based SD scheme is proposed. In particular, the DL-based SD, with the radius of the decoding hypersphere learned by a DNN, achieves significant complexity reduction with respect to the conventional SD with only a marginal performance loss. The works of [samuel2019learning] and [samuel2017deep] focus on the design of DNNs for symbol detection in large MIMO systems. Specifically, Samuel et al. in [samuel2019learning] and [samuel2017deep] first investigate the fully connected DNN (FC-DNN) architecture for symbol detection and show that although it performs well for fixed channels, its BER performance is very poor for varying channels. To overcome this problem, a DNN that works for both fixed and varying channels, called the detection network (DetNet), is introduced [samuel2019learning, samuel2017deep]. However, the DetNet requires high computational complexity because of its complicated network architecture, motivating the proposal of the sparsely connected network (ScNet) in [gao2018sparsely] to improve performance and reduce complexity.
I-B Contributions
Although TS detection is considered an efficient symbol-detection algorithm for large MIMO systems [nguyen2019qr, srinidhi2011layered], it requires many searching iterations to find the optimal solution, causing high computational complexity. The TS algorithm introduced in [zhao2007tabu] uses an ET criterion to terminate the iterative searching process early after a certain number of iterations when no better solution is found. Although this scheme provides complexity reduction, it can result in significant performance loss because the early-terminated searching process does not guarantee the optimal solution. However, the number of searching iterations in the TS algorithm can be reduced with only marginal performance loss if a good initial solution and efficient searching/ET strategies are employed, which can be facilitated by DL. More specifically, we found that the initial solution obtained by a DNN is remarkably more reliable than the conventional linear zero-forcing (ZF)/MMSE and ordered successive interference-cancellation (OSIC) solutions. Furthermore, unlike in the cases of the ZF, MMSE, and OSIC receivers, the initial solution generated by an appropriate activation function in the DNN often has signals very close to or exactly equal to the constellation symbols, even before quantization is applied. This property can be exploited to efficiently determine the reliable/unreliable detected symbols in the initial solution. Based on these aspects, the DL-aided TS algorithm is proposed for complexity reduction of the TS algorithm with ET. Our main contributions are summarized as follows:

First, we further optimize the DetNet [samuel2019learning, samuel2017deep] and ScNet [gao2018sparsely] architectures to develop the fast-convergence sparsely connected detection network (FSNet). Our simulation results show that the proposed FSNet architecture achieves improved performance and reduced complexity with respect to DetNet and ScNet. As a result, the FSNet-based solution is taken as the initial solution of the TS algorithm.

In each iteration of the conventional TS algorithm, the move from the current candidate to its best neighbor is made, even when it does not result in a better solution. Therefore, it is possible that no better solution is found after a large number of iterations, yet high complexity is still incurred. This motivates us to improve the iterative searching phase of the TS algorithm. Specifically, by predicting the incorrect symbols in the FSNet-based initial solution, more efficient moves can be made so that the optimal solution is more likely to be reached earlier.

For further optimization, we consider the ET criterion incorporated with the FSNet-based initial solution. In particular, unlike the conventional ET criterion, we propose using an adaptive cutoff factor, which is adjusted based on the accuracy of the FSNet-based initial solution. As a result, when the initial solution is likely to be accurate, a small number of searching iterations is taken, which leads to a reduction in the overall complexity of the TS algorithm.
The rest of the paper is organized as follows: Section II presents the system model. Section III reviews and analyzes the complexity of the prior DNN architectures for symbol detection, namely, the FC-DNN, DetNet, and ScNet, followed by the proposal of the FSNet architecture. Section IV presents the DL-aided TS detection algorithm. In Section V, the simulation results are shown. Finally, the conclusions are presented in Section VI.
Notations: Throughout this paper, scalars, vectors, and matrices are denoted by lowercase, boldface lowercase, and boldface uppercase letters, respectively. The th element of a matrix A is denoted by , whereas and denote the transpose and conjugate transpose of a vector, respectively. Furthermore, and represent the absolute value of a scalar and the norm of a vector or matrix, respectively. The expectation operator is denoted by , whereas means distributed as.
II System Model
We consider the uplink of a multiuser MIMO system with receive antennas, where the total number of transmit antennas among all users is . The received signal vector is given by
(1) 
where is the vector of transmitted symbols. We assume that , where is the average symbol power, and is a vector of independent and identically distributed (i.i.d.) additive white Gaussian noise (AWGN) samples, . Furthermore, denotes an channel matrix consisting of entries , where represents the complex channel gain between the th transmit antenna and the th receive antenna. The transmitted symbols are independently drawn from a complex constellation of points. The set of all possible transmitted vectors forms an dimensional complex constellation consisting of vectors, i.e., .
The complex signal model (1) can be converted to an equivalent real signal model
(2) 
where and H given by
respectively denote the equivalent real transmitted signal vector, the equivalent real received signal vector, the AWGN vector, and the equivalent real channel matrix, with . Here, and denote the real and imaginary parts of a complex vector or matrix, respectively. Then, the set of all possible real-valued transmitted vectors forms an -dimensional constellation consisting of vectors, i.e., . In this work, we use the equivalent real-valued signal model in (2) because it can be employed for both the TS algorithm and DNNs.
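To make the real-valued model (2) concrete, the following sketch (assuming Python/NumPy; the function name `complex_to_real` is ours, not the paper's) builds the stacked real vectors and the block real channel matrix:

```python
import numpy as np

def complex_to_real(y, H, s=None):
    """Convert the complex model y = H s + n into its real-valued
    equivalent, as in (2): stack real and imaginary parts, and form
    the block channel matrix [[Re H, -Im H], [Im H, Re H]]."""
    y_r = np.concatenate([y.real, y.imag])
    H_r = np.block([[H.real, -H.imag],
                    [H.imag,  H.real]])
    if s is None:
        return y_r, H_r
    s_r = np.concatenate([s.real, s.imag])
    return y_r, H_r, s_r

# Sanity check: the real model reproduces the complex product.
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
s = rng.choice(np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]), size=4)
y = H @ s
y_r, H_r, s_r = complex_to_real(y, H, s)
assert np.allclose(H_r @ s_r, y_r)
```

The real dimension is twice the complex one, which is why the detection networks below operate on vectors of doubled length.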
II-1 Conventional optimal solution
The ML solution can be written as
(3) 
where is the ML metric of s. The computational complexity of ML detection in (3) is exponential in [srinidhi2011layered], which results in extremely high complexity for large MIMO systems, where is very large.
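As a baseline, a brute-force ML detector over the real alphabet can be sketched as follows (a hedged illustration, not the paper's implementation); the candidate loop makes the exponential complexity explicit:

```python
import itertools
import numpy as np

def ml_detect(y, H, alphabet):
    """Exhaustive ML detection over all |alphabet|^K real symbol
    vectors; the search space grows exponentially with the
    dimension K, which is infeasible for large MIMO."""
    K = H.shape[1]
    best, best_metric = None, np.inf
    for cand in itertools.product(alphabet, repeat=K):
        s = np.array(cand, dtype=float)
        metric = np.linalg.norm(y - H @ s) ** 2  # ML metric ||y - Hs||^2
        if metric < best_metric:
            best, best_metric = s, metric
    return best

# Noiseless toy example: the exhaustive search recovers the sent vector.
rng = np.random.default_rng(1)
H = rng.normal(size=(4, 4))
s_true = rng.choice([-1.0, 1.0], size=4)
y = H @ s_true
assert np.array_equal(ml_detect(y, H, [-1.0, 1.0]), s_true)
```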
II-2 DNN-based solution
A DNN can be modeled and trained to approximate the transmitted signal vector s. The solution obtained by a DNN with layers can be formulated as
where is the element-wise quantization operator that quantizes to . Here, is the output vector at the th layer, which can be expressed as
(4) 
where
(5) 
represents the nonlinear transformation in the th layer with the input vector , the activation function , and consisting of the weighting matrix and bias vector . We see that (4) indicates the serial nonlinear transformations in the DNN that map the input , including the information contained in y and H, to the output .
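The serial transformations in (4) and (5) amount to repeated affine maps followed by element-wise activations; a minimal sketch (function names are illustrative) is:

```python
import numpy as np

def relu(x):
    """Rectified linear unit, a common activation choice."""
    return np.maximum(x, 0.0)

def layer(x, W, b, activation=relu):
    """One nonlinear layer transformation as in (5): an affine map
    followed by an element-wise activation."""
    return activation(W @ x + b)

def dnn_forward(x, params):
    """Serial composition of layers as in (4); params is a list of
    (weight matrix, bias vector) pairs, one per layer."""
    for W, b in params:
        x = layer(x, W, b)
    return x
```

Each layer costs on the order of the weight-matrix size, which is why the input dimension and the per-layer weights dominate the complexity discussion that follows.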
In large MIMO systems, many hidden layers and neurons are required for the DNN to extract meaningful features and patterns from the large amount of input data to provide high accuracy. Furthermore, the high-dimensional signals and large channel matrix lead to a large input vector x, which requires large weight matrices and bias vectors for the transformation in (5). As a result, the computational complexity of the detection network typically becomes very high in large MIMO systems.
III DNNs for MIMO Detection
In this section, we first analyze the architecture designs and complexities of three existing DNNs for MIMO detection in the literature, namely, the FC-DNN [samuel2019learning], DetNet [samuel2019learning], and ScNet [gao2018sparsely]. This motivates us to further optimize them and propose a novel DNN for performance improvement and complexity reduction in MIMO detection.
III-A FC-DNN, DetNet, and ScNet architectures
III-A1 FC-DNN architecture
The application of the well-known FC-DNN architecture to MIMO detection is investigated in [samuel2019learning, samuel2017deep]. In this FC-DNN architecture, the input vector contains all the received signal and channel entries. The performance of the FC-DNN is examined in two scenarios: fixed and varying channels. It is shown in [samuel2019learning] that the FC-DNN architecture performs well for fixed channels; however, this is an impractical assumption. In contrast, for varying channels, its performance is very poor. Therefore, it cannot be employed for symbol detection in practical MIMO systems, and a more sophisticated DNN architecture is required for this purpose.
III-A2 DetNet
In the DetNet, the transmitted signal vector is updated over iterations corresponding to the layers of the neural network, mimicking a projected gradient descent-like ML optimization, which leads to iterations of the form [samuel2019learning, samuel2017deep]
(6) 
where denotes a nonlinear projection operator and is a step size.
The operation and architecture of the th layer of the DetNet are illustrated in Fig. 1. It is shown that and , which are not only the output of the th layer but also the input of the th layer, are updated as follows:
(7)  
(8)  
(9)  
(10)  
(11) 
where and are the inputs of the first layer of the network, which are initialized as , with 0 being an all-zero vector of an appropriate size. In (8), and are concatenated into a single input vector . In (9), is the rectified linear unit (ReLU) activation function. Furthermore, in (10), defined as
with for QPSK and for 16-QAM, guarantees that the amplitudes of the elements of are in the range for QPSK and for 16-QAM, as illustrated in Fig. 2. The final detected symbol vector is given as , where quantizes each element of to its closest real-constellation symbol in .
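The exact breakpoints of this piecewise-linear projection are given in the paper's definition above; the sketch below shows one common ReLU-based construction of such a projection onto [-1, 1] for QPSK, with the breakpoint `t` as an assumed parameter (not the paper's exact value):

```python
import numpy as np

def psi_qpsk(x, t=0.5):
    """Piecewise-linear projection onto [-1, 1] built from two ReLUs
    (a hedged sketch of the kind of activation DetNet uses for QPSK):
    linear with slope 1/t on (-t, t), saturating at -1 and +1 outside."""
    relu = lambda v: np.maximum(v, 0.0)
    return -1.0 + relu(x + t) / t - relu(x - t) / t
```

Because the projection saturates at the constellation amplitudes, a confidently estimated symbol lands exactly on a constellation point, while an uncertain one stays in the linear region; this is the property exploited later to flag unreliable symbols.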
In the training phase, the weights and biases of the DetNet are optimized by minimizing the loss function
(12) 
which measures the total weighted distance between the transmitted vector s and the outputs of all the layers, i.e., . The DetNet is trained to optimize the parameter set , , , such that is minimized. As a result, can converge to s.
We now consider the computational complexity of the DetNet, which is defined as the total number of additions and multiplications required in (7)–(11). The computations of and require and operations, respectively, and are performed once in the first layer. The complexities required in (7), (9)–(11) depend on the modulation scheme, as follows:

For QPSK, the sizes of , , are set to [samuel2019learning], leading to . As a result, the sizes of , and can be inferred from the size of , which is set to [samuel2019learning], as follows: , and . Furthermore, because , we have , and . Then, given and , the total complexity required in each layer is operations, including , , , and operations for (7) and (9)–(11), respectively. Therefore, the total complexity of all layers of the DetNet is given as
(13) 
For 16-QAM, and are set to and [samuel2019learning]. Therefore, we have , resulting in , , , and . Then, given and , the complexities required in (7) and (9)–(11) are , , , and operations, respectively. As a result, the total complexity required in each layer of the DetNet with 16-QAM is operations. Therefore, the total complexity of all layers of the DetNet is given as
(14)
It is observed from (13) and (14) that, for both QPSK and 16-QAM, the complexity of the DetNet can be substantially high in large MIMO systems, where is large. This high complexity is due to the use of the additional input vector v and the full connections between the input and output vectors in every layer. Furthermore, the complicated architecture of the DetNet makes it difficult to optimize, which results in its relatively low performance, as will be shown in Section V. Therefore, the ScNet was introduced in [gao2018sparsely] for complexity reduction and performance improvement.
III-A3 ScNet
The ScNet also follows the update process in (6), but it simplifies the DetNet architecture based on the following observations:

While q contains the information of H, y, and , the additional input vector v does not contain any other meaningful information. Therefore, v is removed in the ScNet architecture. As a result, is also removed. It is shown in [gao2018sparsely] that this simplification leads not only to reduced complexity and training time but also to improved performance.

Furthermore, it is observed in (6) that the first element of depends only on the first element of , which implies that the full connection between all elements of and is unnecessary. As a result, the input and output of each layer of the ScNet are directly connected in an element-wise manner. Consequently, is removed, and the weight matrix is reduced to a weight vector of size .
The operation and architecture of the ScNet are illustrated in Fig. 3. Similar to the DetNet, the ScNet is initialized with . Then, the output of the th layer, i.e., , is updated as follows:
(15)  
(16) 
where denotes the element-wise multiplication of and . Given , the computation of in (15) requires operations. Furthermore, because , we have , and the computation in (16) requires only operations. Consequently, the complexity of each layer of the ScNet architecture is . Taking the complexities of computing and into consideration, the ScNet architecture requires
(17) 
operations in total. Compared to the complexity of the DetNet given in (13) and (14), the ScNet requires much lower complexity. Furthermore, the simulation results in [gao2018sparsely] show that for BER , the ScNet achieves an approximate SNR gain of 1 dB over the DetNet.
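The element-wise connection that drives the complexity figure in (17) can be sketched as follows (hedged: the sizes and the ReLU choice are illustrative, not the paper's exact configuration):

```python
import numpy as np

def scnet_layer(x, w, b):
    """One ScNet-style layer (sketch): the dense weight matrix of
    DetNet is replaced by an element-wise weight vector w, so the
    affine part costs a number of multiplications linear in the
    input length instead of quadratic."""
    return np.maximum(w * x + b, 0.0)  # ReLU(w ⊙ x + b)

# Per-layer multiply count: element-wise vs. fully connected.
n = 8                      # illustrative input length
elementwise_mults = n      # w ⊙ x
dense_mults = n * n        # W @ x
assert elementwise_mults < dense_mults
```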
However, one drawback of the ScNet is that the input vector x has a size of , which is three times larger than that of q in the DetNet, to contain the information of y, H, and . This may result in unnecessary computational complexity for the ScNet. Furthermore, both the DetNet and ScNet employ the loss function (12) to optimize the weights and biases. This loss function minimizes the distance between s and , which allows to converge to s after a certain number of updates, equal to the number of layers in the DetNet and ScNet. However, it does not guarantee fast convergence. Meanwhile, if converges to s faster, a smaller number of layers is required to achieve the same accuracy. In other words, for the same number of layers, if a better loss function is employed, the performance can be improved. These observations on the input vector and the loss function of the DetNet and ScNet motivate us to propose the FSNet for complexity reduction and performance improvement.
III-B Proposed FSNet architecture
III-B1 Network architecture
Inheriting from the DetNet and ScNet, the proposed FSNet is also motivated by the updating process in (6). We note that (6) can be rewritten as
(18) 
which shows that the contributions of and to are different. Therefore, their elements should be processed by different weights and biases. Furthermore, (18) also implies that the elements at the same position of and can be multiplied by the same weight. Therefore, in the proposed FSNet, we set the input vector of the th layer to
whose size is only that of in (15) for the ScNet. Furthermore, the FSNet follows the sparse connection of the ScNet. Consequently, in each layer of the FSNet, there are only element-wise connections between the input and output, whereas the ScNet has .
The operation and architecture of the proposed FSNet are illustrated in Fig. 4. Furthermore, Algorithm 1 summarizes the FSNet scheme for MIMO detection. The output of each layer is updated in step 4, where is obtained in step 3, requiring operations to compute when and are given. In the FSNet architecture, we have . Therefore, the computation in each layer requires only operations. We recall that the complexities of computing and are and operations, respectively. As a result, the complexity of the entire FSNet is given as
(19) 
which is less than that of the ScNet architecture by operations and considerably lower than the complexity of the DetNet given in (13) and (14).
III-B2 Loss function
We consider the correlation between and s in the loss function to better train the FSNet. Specifically, the loss function of the FSNet is redefined as
(20) 
where . Based on the Cauchy–Schwarz inequality, we have , where equality occurs if with a constant . Therefore, we have , and if .
The DetNet, ScNet, and FSNet schemes initialize as 0 and update it over the layers to approximate s. This can be considered as sequential moves starting from the coordinate 0 to reach . By minimizing the loss function in (20), we have , or equivalently, , which forces the moves to proceed in a specific direction, namely, toward the position of s in a hypersphere. This can shorten the path of the moves to reach s, which results in a reduced number of required layers in the FSNet. In (20), and are used to adjust the contributions of and to the loss function, and they are optimized by simulations. Our simulation results in Section V show that the proposed FSNet achieves not only complexity reduction but also performance improvement with respect to the conventional DetNet and ScNet schemes.
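A plausible form of a loss in the spirit of (20), combining the per-layer distance with a correlation (cosine-similarity) term, is sketched below; the weights `a`, `b` and the uniform per-layer weighting are illustrative assumptions, not the paper's exact values:

```python
import numpy as np

def fsnet_loss(s, s_hats, a=1.0, b=1.0):
    """Hedged sketch of a distance-plus-correlation loss: for each
    layer output s_hat, add a squared-distance term and a term that
    is small when s_hat points in the same direction as s (cosine
    similarity close to 1), encouraging moves toward s's direction."""
    loss = 0.0
    for s_hat in s_hats:
        dist = np.sum((s - s_hat) ** 2)
        cos = s @ s_hat / (np.linalg.norm(s) * np.linalg.norm(s_hat) + 1e-12)
        loss += a * dist + b * (1.0 - cos)  # small only when s_hat aligns with s
    return loss
```

A layer output aligned with s but scaled wrongly is penalized only by the distance term, while a misdirected output is penalized by both terms, which is the intuition behind steering the per-layer moves toward s.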
IV Deep Learning-Aided Tabu Search Detection
IV-A Problem formulation
The TS algorithm starts with an initial candidate and sequentially moves over candidates for iterations. In each iteration, all the non-tabu neighbors of the current candidate c are examined to find the best neighbor with the smallest ML metric, i.e.,
(21) 
where consists of non-tabu neighboring vectors inside the alphabet with the smallest distance to c, i.e.,
(22) 
where denotes the alphabet excluding the tabu vectors kept in the tabu list , and is the minimum distance between two constellation points in a plane. Furthermore, the ML metric can be expressed as [nguyen2019qr]
(23) 
where , is the single nonzero element of , and is the th column of H. In this study, we refer to as the difference position of a neighbor, which is the position at which the candidate and its neighbor differ. For example, if the current candidate is , then is the difference position for because c and x differ only in the third element.
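The neighborhood in (22) restricts moves to vectors differing from the candidate in a single position by the minimum constellation distance; a sketch (tabu-list handling omitted, function name ours) is:

```python
import numpy as np

def neighbors(c, alphabet):
    """All vectors differing from candidate c in exactly one position
    (the "difference position"), with the new symbol an adjacent real
    constellation symbol (distance 2 between neighboring symbols)."""
    nbrs = []
    for i, ci in enumerate(c):
        for a in alphabet:
            if a != ci and abs(a - ci) == 2:  # adjacent real symbols only
                x = c.copy()
                x[i] = a
                nbrs.append((i, x))           # i is the difference position
    return nbrs

c = np.array([-1.0, 1.0, 1.0, -1.0])
nbs = neighbors(c, [-1.0, 1.0])
assert len(nbs) == 4  # QPSK real alphabet: one adjacent symbol per position
```

For higher-order QAM, interior symbols have two adjacent neighbors, so the neighborhood grows with the constellation size as well as the dimension.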
After the best neighbor is determined with (21), it becomes the candidate in the next iteration, and the determination of the best neighbor of the new candidate is performed. In this iterative manner, the final solution is determined as the best candidate visited so far, i.e.,
where is the set of all visited candidates over searching iterations.
The computational complexity of TS algorithms is proportional to the number of searching iterations, i.e., . In large MIMO systems, the number of neighbors in each iteration and the dimension of the neighboring vectors are large. Therefore, the complexity of finding the best neighbor in each iteration becomes high in large MIMO systems. Furthermore, an extremely large is required to guarantee that a near-optimal solution is found. Consequently, the complexity of the TS algorithm can be excessively high in large MIMO systems. To reduce the complexity of TS in large MIMO systems, an ET scheme can be employed. Specifically, a cutoff factor , , is used to terminate the iterative searching process early after iterations in which no better solution is found. As a result, the number of searching iterations in the TS algorithm with ET is
(24) 
where is the number of iterations after which no better solution is found. The TS algorithm without ET requires searching iterations, which is also the upper bound on the number of iterations required when ET is applied. From (24), the following remarks are noted.
Remark 1
If ET occurs, the best solution found after iterations and that found after iterations are the same, which is the final solution of the TS algorithm, i.e., . Therefore, the earlier is found, the smaller is required. This objective can be achieved by starting the moves in the TS algorithm with a good initial solution. Furthermore, if the initial solution is well taken such that it is likely to be a nearoptimal solution, then no further searching iteration is required. In this case, the complexity of the TS algorithm becomes only that involved in finding the initial solution.
Remark 2
Another approach to find the optimal solution earlier and reduce is to make efficient moves during the searching iterations in the TS algorithm. In other words, the moves can be guided so that is reached earlier.
Remark 3
With the conventional ET criterion, is fixed to . In large MIMO systems, a large is required to guarantee that the optimal solution is found, which results in a large . However, we note that once ET occurs, the further search over iterations does not result in any performance improvement while imposing a significant computational burden on the TS algorithm. For example, in a MIMO system with QPSK, should be used to approximately achieve the performance of the SD scheme [nguyen2019qr]. For , iterations are required before the termination, whereas was already found in the th iteration. Therefore, a more efficient ET criterion is required to reduce the complexity of the TS algorithm with ET.
Remarks 1–3 motivate us to propose a TS-based detection algorithm for complexity reduction with three design objectives: taking a good initial solution, making efficient moves in the searching iterations so that is reached as soon as possible, and terminating the TS algorithm early based on an efficient ET criterion. In the next subsection, we propose the DL-aided TS algorithm, which applies DL to TS detection to meet these objectives.
IV-B Proposed DL-aided TS algorithm
The main ideas of the proposed DL-aided TS algorithm can be explained as follows.
IV-B1 DL-aided initial solution
Unlike the conventional TS algorithms, in which the ZF, MMSE, or OSIC solution is taken as the initial solution [zhao2007tabu], the DL-aided TS algorithm employs a DNN to generate the initial solution.
In this scheme, the most important task is to choose a DNN architecture that is not only able to approximate the transmitted signal vector with high accuracy but also has low computational complexity. As discussed in Section II-B, a basic FC-DNN cannot achieve high BER performance for varying channels. By contrast, among the DetNet, ScNet, and FSNet, the FSNet requires the lowest complexity while achieving the best performance, as demonstrated in Section V. Therefore, we propose employing the FSNet to find the initial solution in the DL-aided TS algorithm. Specifically, obtained in step 6 of Algorithm 1 is taken as the initial solution of the DL-aided TS algorithm. As a result, the complexity required for this initialization phase is given in (19).
IV-B2 Efficient moves in searching iterations
In the TS algorithm with ET, the complexity can be reduced if efficient moves are made during the iterative searching process, so that is found earlier, as discussed in Remark 2. For this purpose, we propose exploiting the difference between and , which are the output of the last layer and the final solution of the FSNet, respectively. We recall that can contain elements both inside and outside the alphabet , as observed from step 4 in Algorithm 1 and Fig. 2. By contrast, we have .
Let e denote the distance between the elements of and , i.e.,
For QAM signals, the distance between two neighboring real symbols is two. Furthermore, from Fig. 2, we have for QPSK and for 16-QAM. Therefore, we have . It is observed that if , there is a high probability that the th symbol in is correctly approximated by the FSNet, i.e., . By contrast, if , there is a high probability that is an erroneous estimate, i.e., . Therefore, by examining the elements of e, we can identify the elements of with high probabilities of error.
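The test on e described above can be sketched as follows (the function name and the threshold value used in the example are illustrative, not the paper's):

```python
import numpy as np

def predict_incorrect(s_soft, alphabet, eps):
    """Compare the FSNet soft output s_soft with its element-wise
    quantization to the nearest real constellation symbol; positions
    where the gap exceeds the threshold eps are flagged as predicted
    incorrect symbols."""
    alphabet = np.asarray(alphabet)
    # element-wise quantization to the nearest real constellation symbol
    s_hat = alphabet[np.argmin(np.abs(s_soft[:, None] - alphabet[None, :]), axis=1)]
    e = np.abs(s_soft - s_hat)
    flagged = np.flatnonzero(e > eps)
    return s_hat, e, flagged

# A soft output sitting far from any constellation point gets flagged.
s_soft = np.array([0.98, -1.0, 0.35, -0.6])
s_hat, e, flagged = predict_incorrect(s_soft, [-1.0, 1.0], eps=0.5)
assert list(flagged) == [2]
```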
Example 1: Consider a MIMO system with , QPSK, and
Then, we have
which implies that and can be incorrect with high probability.
In the following analysis, we refer to the symbols of with high probabilities of being incorrect as the predicted incorrect symbols. Furthermore, the th element in is determined to be a predicted incorrect symbol if is beyond a predefined error threshold , i.e., if . The error threshold should be inversely proportional to the SNR, which leads to , where is chosen such that . Furthermore, because , we set to
(25) 
Let denote the number of predicted incorrect symbols in ( in Example 1), and let be the set of all candidates obtained by correcting . Then we have
where is a subset of obtained by correcting elements of . The number of candidates in is given as
where is the number of combinations for choosing out of predicted incorrect symbols, and is the number of real symbols in with for QPSK, 16QAM, and 64QAM, respectively. Consequently, the total number of possible corrected vectors in is
Now, denoting as the best vector obtained by correcting , we have
(26) 
In the case of large and high-order QAM schemes, becomes large; hence, high complexity is required to find in (26).
However, we note that correcting a symbol in is equivalent to a move from to one of its neighboring vectors in an iteration of the TS algorithm. Specifically, if is likely to be wrong, then a move from to its neighbor x at the difference position should be made. In this case, and are neighboring symbols, and for . In this manner, many of the predicted incorrect symbols can be corrected after searching iterations of the TS algorithm with high probability. Let be the solution found by correcting the predicted incorrect symbols of after iterations. We have
(27) 
where is given in (23), and , with being the current candidate in the th iteration, .
When is sufficiently small, the complexity involved in finding with (27) becomes much smaller than that of finding with (26), as well as that of the conventional TS algorithm with (21). This is because is a subset of ; hence, unlike in the conventional TS algorithm, in the th iteration, , a reduced number of neighboring vectors are examined, which implies that the complexity required during iterations to find can be low. If the incorrect symbols are predicted with high accuracy, there is a high probability that is close to . Therefore, only a small number of further iterations is required to reach , and the total complexity of the TS algorithm can be reduced.
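A single restricted move of the kind described in (27) can be sketched as follows (hedged: the tabu-list bookkeeping and the full iteration loop are omitted, and the function name is ours):

```python
import numpy as np

def restricted_best_neighbor(y, H, c, flagged, alphabet):
    """One restricted searching move (sketch): only neighbors whose
    difference position lies in the flagged set of predicted incorrect
    symbols are examined, shrinking the per-iteration neighborhood
    relative to the full search in (21)."""
    best, best_metric = None, np.inf
    for i in flagged:                # difference positions to try
        for a in alphabet:
            if a == c[i]:
                continue
            x = c.copy()
            x[i] = a
            m = np.linalg.norm(y - H @ x) ** 2   # ML metric of the neighbor
            if m < best_metric:
                best, best_metric = x, m
    return best, best_metric

# Correcting the single flagged position recovers the transmitted vector.
H = np.eye(4)
s_true = np.array([1.0, -1.0, 1.0, -1.0])
c = np.array([1.0, -1.0, -1.0, -1.0])            # error at position 2
best, metric = restricted_best_neighbor(H @ s_true, H, c, [2], [-1.0, 1.0])
assert best.tolist() == s_true.tolist() and metric == 0.0
```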
IV-B3 Proposed ET criteria
In the conventional TS algorithm with ET, the algorithm is terminated early after iterations as given in (24), where is set to with a fixed . Remark 3 shows that this stopping criterion is inefficient: in many cases, the further searches over iterations do not result in any performance improvement but add unnecessary complexity to the TS algorithm.
In this work, to minimize the number of redundant searching iterations, we propose an adaptive ET criterion based on . We make the following observations:

If , there is no predicted incorrect symbol in , which implies a high probability that is already the optimal solution. In this case, no further searching iterations are needed, i.e., .

If is small, there is a high probability that only a few elements of are incorrect, which can be corrected after a small number of moves in the TS algorithm. Therefore, in this case, only a small is required.

In the case that , a sufficiently large number of searching iterations should be performed to guarantee the optimal performance.
Therefore, we propose using an adaptive cutoff factor depending on as follows:
(28) 
where is optimized through simulations to guarantee that a sufficient number of iterations is used in the DLaided TS algorithm. Consequently, in the DLaided TS algorithm, only iterations are required before the searching process is terminated, with because . From (28), we have as , and as , implying the use of a smaller number of iterations for a smaller , and vice versa.
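Since equation (28) is not rendered here, the following is one plausible realization of the adaptive cutoff factor that is consistent with the surrounding description: it is zero when no symbol is predicted incorrect, grows with the number of predicted incorrect symbols, and saturates at a simulation-tuned maximum. All names and constants are hypothetical; the exact functional form in (28) may differ:

```python
def adaptive_cutoff(n_e: int, n: int, beta_max: float, c: float = 2.0) -> float:
    """Illustrative adaptive cutoff factor: 0 when no symbol is predicted
    incorrect (search terminates immediately), increasing with n_e, and
    saturating at beta_max for large n_e. beta_max and c stand in for the
    simulation-optimized constants described in the text."""
    if n_e == 0:
        return 0.0
    return beta_max * min(1.0, c * n_e / n)
```

This captures the two limiting behaviors stated above: a small cutoff (few extra iterations) when few symbols are predicted incorrect, and the full fixed cutoff when many are.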
The DL-aided TS algorithm is presented in Algorithm 2. In step 1, and are obtained by the FSNet scheme in Algorithm 1, allowing e to be computed in step 2 as the difference between and . In step 4, the list including the positions of the predicted incorrect elements of is found based on , which is set in step 3. Then, is set to the size of in step 5. The DL-aided TS algorithm is initialized in steps 6–11. Specifically, step 6 assigns to the current candidate c and pushes it to the tabu list. Then, the best solution and its metric are initialized as c and , respectively, in step 7. The adaptive cutoff factor and are computed in steps 8 and 9, respectively.
In steps 12–30, is searched over iterations. The first iterations are used to correct the predicted incorrect symbols in . In particular, in the th iteration, , the best candidate is found in step 15. In contrast, in the remaining iterations, the conventional searching manner is used, where all the neighbors in are examined to find , as in steps 17 and 18. Comparing steps 15 and 18, it is observed that the proposed searching approach based on the predicted incorrect symbols requires lower complexity than the conventional searching approach because . In steps 21–26, if a better solution is found, is updated, and at the same time, is set to zero to allow further moves to find a better solution. Otherwise, increases by one until it reaches . The following steps update the best solution and the tabu list, and step 31 returns the final solution after the searching phase is finished.
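Putting the steps of Algorithm 2 together, here is a hedged end-to-end sketch. The FSNet initial solution is replaced by a simple zero-forcing estimate (FSNet itself is a trained network and is not reproduced here), and the threshold `delta`, cutoff `beta_max`, and the linear cutoff rule are illustrative stand-ins for the paper's tuned values and for equation (28):

```python
import numpy as np

def dl_aided_ts(H, y, alphabet, delta=0.3, beta_max=4.0):
    """Sketch of the DL-aided TS algorithm (Algorithm 2) with a
    zero-forcing stand-in for the FSNet initial solution."""
    n = H.shape[1]
    alphabet = np.asarray(alphabet, dtype=float)

    # Steps 1-2: soft estimate (FSNet stand-in), hard decision, and error e
    x_soft = np.linalg.pinv(H) @ y
    x0 = alphabet[np.argmin(np.abs(x_soft[:, None] - alphabet[None, :]), axis=1)]
    e = np.abs(x_soft - x0)

    # Steps 3-5: predicted incorrect positions and their count
    wrong_pos = [i for i in range(n) if e[i] > delta]
    n_e = len(wrong_pos)

    # Steps 6-11: initialization (current candidate, tabu list, best solution,
    # adaptive cutoff factor, and iteration budget)
    metric = lambda x: np.linalg.norm(y - H @ x) ** 2
    c = x0.copy()
    tabu = {tuple(c)}
    best_x, best_metric = c.copy(), metric(c)
    beta = 0.0 if n_e == 0 else beta_max * min(1.0, 2.0 * n_e / n)
    t_max = n_e + int(beta * n)
    stall = 0

    # Steps 12-30: searching phase; restricted moves first, then conventional
    for t in range(t_max):
        positions = wrong_pos if t < n_e else range(n)
        cand_x, cand_metric = None, np.inf
        for p in positions:
            for s in alphabet:
                if s == c[p]:
                    continue
                x_nb = c.copy()
                x_nb[p] = s
                if tuple(x_nb) in tabu:
                    continue
                m = metric(x_nb)
                if m < cand_metric:
                    cand_x, cand_metric = x_nb, m
        if cand_x is None:          # all neighbors tabu-listed
            break
        c = cand_x
        tabu.add(tuple(c))
        if cand_metric < best_metric:
            best_x, best_metric = c.copy(), cand_metric
            stall = 0               # steps 21-26: reset the ET counter
        else:
            stall += 1
            if stall > int(beta * n):   # adaptive early termination
                break
    return best_x                   # step 31
```

Note that when `n_e == 0` the budget is zero and the initial solution is returned immediately, matching the first observation above; a real implementation would use the trained FSNet output and the simulation-optimized constants in place of these stand-ins.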
V Simulation Results
In this section, we numerically evaluate the BER performance and computational complexities of the proposed FSNet detection architecture and the DLaided TS algorithm. In our simulations, each channel coefficient is assumed to be an i.i.d. zeromean complex Gaussian random variable with a variance of per dimension. The SNR is defined as the ratio of the average symbol power to the noise power .
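The channel and SNR conventions above can be reproduced as follows, assuming unit variance per complex coefficient (i.e., 1/2 per real/imaginary dimension), which matches the stated i.i.d. zero-mean complex Gaussian model; function names are illustrative:

```python
import numpy as np

def rayleigh_channel(n_r, n_t, seed=None):
    """i.i.d. zero-mean complex Gaussian channel matrix with variance
    1/2 per real/imaginary dimension (unit variance per coefficient)."""
    rng = np.random.default_rng(seed)
    return (rng.standard_normal((n_r, n_t)) +
            1j * rng.standard_normal((n_r, n_t))) * np.sqrt(0.5)

def snr_db(symbol_power, noise_power):
    """SNR defined as the ratio of average symbol power to noise power."""
    return 10 * np.log10(symbol_power / noise_power)
```

For instance, a unit-power constellation with noise power 0.1 corresponds to an SNR of 10 dB under this definition.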
Va Training DNNs
We follow the training model in [samuel2019learning] and [gao2018sparsely]. Specifically, we use the Adam optimizer [kingma2014adam], which is a variant of the stochastic gradient descent method [rumelhart1988learning, bottou2010large], for optimizing the DNNs. The DNNs are implemented by using Python with the TensorFlow library [abadi2016tensorflow] and a standard Intel i7-6700 processor. For the training phase, a decaying learni