Statistical Learning in Automated Troubleshooting: Application to LTE Interference Mitigation
This paper presents a method for automated healing as part of off-line automated troubleshooting. The method combines statistical learning with constraint optimization. The automated healing aims at iteratively and locally optimizing radio resource management (RRM) or system parameters of cells with poor performance. The statistical learning processes the data using Logistic Regression (LR) to extract closed-form (functional) relations between Key Performance Indicators (KPIs) and RRM parameters. These functional relations are then processed by an optimization engine, which proposes new parameter values. The advantage of the proposed formulation is the small number of iterations the automated healing method requires to converge, which makes it suitable for off-line implementation. The proposed method is applied to heal an Inter-Cell Interference Coordination (ICIC) process, based on a soft-frequency reuse scheme, in a 3GPP Long Term Evolution (LTE) network. Numerical simulations illustrate the benefits of the proposed approach.
Keywords: Statistical learning, Logistic Regression, automated healing, troubleshooting, Inter-Cell Interference Coordination, LTE.
Efficient management of future Beyond 3G (B3G) and 4G networks is a major challenge for network operators. The wireless ecosystem is becoming increasingly heterogeneous, with co-existing and co-operating technologies and deployment scenarios (e.g. macro, micro, pico and femto cell structures). Fault management, or troubleshooting, is an important building block of network operation.
Troubleshooting comprises three functionalities: fault detection (i.e. detecting failures or poor performance as soon as they occur), fault diagnosis (i.e. determining the cause of failure or of poor performance), and fault recovery or healing (i.e. repairing the problem).
The importance of efficient fault management has motivated the development of automated methods and tools for diagnosis and healing. This work focuses on automated healing. It is assumed that a given cell with poor performance has been identified (fault detection) and that the cause of the degraded performance has been diagnosed as a bad setting of a specific Radio Resource Management (RRM) or system parameter (fault diagnosis). The automated healing process aims at locally optimizing the value of this parameter, taking into account the Key Performance Indicators (KPIs) of the faulty cell and those of its neighbours. In other words, the RRM parameter that the diagnosis process identifies as the fault cause is optimized by the automated healing module.
Local optimization, or "steered optimization", has been studied in the literature; it relies on combinatorial optimization in conjunction with the interference matrix to tackle local problems detected in the network. This approach uses a network simulator and can be implemented as an advanced functionality of a cell planning tool. The focus of this paper is an automated healing method based on measurements. More precisely, our aim is to conceive an automated healing method that combines statistical learning of measured data with constraint optimization. The method is denoted as Statistical Learning Automated Healing (SLAH). The SLAH module can be located in the management plane, e.g. in the Operation and Maintenance Centre (OMC), where abundant data is available. The method is iterative with a time resolution of one day, and should therefore converge in a few iterations. To meet this requirement, a statistical learning approach based on Logistic Regression (LR) is proposed; it extracts the functional relations between KPIs and RRM parameters, which constitute the statistical model. The data are noisy because of the random character of the traffic and of the radio channel, and also because of measurement imprecision. After each iteration, the statistical model is updated with the additional data and its precision improves. The model is then introduced into the optimization engine and processed directly to derive the next RRM parameter value. This approach has the merit of converging rapidly. The performance of the SLAH is evaluated on an interference mitigation use case, namely an Inter-Cell Interference Coordination (ICIC) problem in an LTE network. The choice of this use case is motivated by the importance of interference mitigation in OFDMA (LTE/WiMAX) networks: it improves system performance and, in particular, helps reach the strict cell-edge bit rate requirements defined for B3G and 4G networks.
In this context, ICIC is an efficient approach to mitigate interference. Interference mitigation techniques such as ICIC can considerably improve the Signal to Interference plus Noise Ratio (SINR), and hence bit rates, particularly at the cell edge. As a result, better network performance and user Quality of Service (QoS) are achieved, including reduced File Transfer Time (FTT), Block Call Rate (BCR) and Drop Call Rate (DCR). Different interference mitigation methods have been proposed for OFDMA systems, such as fractional reuse and soft reuse schemes. When different power allocations for the mobile users are associated with different portions of the frequency bandwidth, the frequency reuse is called a soft reuse scheme; this scheme is considered here in the context of automated healing.
The paper is organized as follows: Section II introduces the concept and the system model for the SLAH and explains its different building blocks. Section III describes the LTE ICIC model that is used in the SLAH case study. The adaptation of the SLAH to heal the ICIC process is developed in Section IV. Numerical results are presented in Section V followed by concluding remarks in Section VI.
II System Model for Automated Healing
It is assumed that the fault cause has been diagnosed as a specific RRM parameter (such as handover/mobility, admission or congestion control thresholds) whose value has degraded the performance of the eNodeB (eNB). One example of such a case is the diagnosis of a bad setting of the add/drop window of a NodeB in a UMTS system. The purpose of the SLAH is to iteratively optimize the value of this RRM parameter using local information from the eNB and its neighbours. Hence, automated healing is a local optimization process. The SLAH block diagram is presented in Figure 1.
The system model comprises four blocks:
Initialization block: The Initialization block provides the initial RRM parameter to the faulty eNB in the Network/Simulator block and to the Statistical Learning block.
Network/Simulator block: The Network/Simulator block represents the real network or the network simulator. It measures (in the case of a real network) or calculates (in the case of a network simulator) a set of KPIs of an eNB and of its neighbours for each new RRM parameter value introduced by the Initialization or the Optimization block.
Statistical Learning block: The Statistical Learning block extracts the functional relations between the KPIs and the RRM parameter, known as the statistical model, through Logistic Regression (LR). LR fits the data to the logistic function $g(z) = \frac{1}{1+e^{-z}}$. The logistic function can describe the saturation effects at its extremities that are often encountered in KPIs of communication networks.
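Let $y_i$ denote the $i$-th sample value of the dependent variable (i.e. the KPI) corresponding to the sample value $x_i$ of the explanatory variable (i.e. the RRM parameter). LR models $y_i$ as follows:
$$y_i = \frac{1}{1+e^{-z_i}} + \varepsilon_i, \qquad z_i = \beta_0 + \beta_1 x_i, \qquad (1)$$
where $z_i$ is the linear predictor representing the contribution of the explanatory variable sample $x_i$, and $\varepsilon_i$ is the residual error. The $\beta$s are the regression coefficients, whose values are estimated by maximum likelihood. Hence, from (1), the functional relation between $\hat{y}$, i.e. $y$ as estimated by LR, and $x$ can be written as:
$$\hat{y} = \frac{1}{1+e^{-(\hat{\beta}_0 + \hat{\beta}_1 x)}}, \qquad (2)$$
where $\hat{\beta}_0$ and $\hat{\beta}_1$ are the estimated coefficients.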
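The LR fit of (1) and (2) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a single RRM parameter and a KPI already scaled to [0, 1] (such as the BCR), and it estimates $\beta_0$ and $\beta_1$ by Newton's method (iteratively reweighted least squares) on the Bernoulli quasi-likelihood, which is one common way of carrying out the maximum likelihood step for fractional responses. The names `logistic` and `fit_lr` are hypothetical.

```python
import math

def logistic(z):
    """Logistic function 1 / (1 + exp(-z)) used in (1) and (2)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_lr(xs, ys, max_iter=50, tol=1e-10):
    """Estimate (beta0, beta1) of y ~ logistic(beta0 + beta1 * x).

    Newton's method (IRLS) on the Bernoulli quasi-log-likelihood
    sum_i [y_i log p_i + (1 - y_i) log(1 - p_i)], which assumes 0 <= y_i <= 1.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0            # score (gradient) components
        h00 = h01 = h11 = 0.0    # Fisher information X^T W X
        for x, y in zip(xs, ys):
            p = logistic(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det  # solve the 2x2 Newton system
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1
```

Because this objective is concave in $(\beta_0, \beta_1)$, a handful of Newton steps usually suffices, so refitting the model after each new daily data point is cheap.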
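Optimization block: The Optimization block calculates the optimal RRM value using the current statistical model. It determines $x^*$, i.e. the value of the RRM parameter that minimizes a cost function of a set of KPIs denoted as the optimization set $\mathcal{O}$, subject to constraints on a second set of KPIs denoted as the constraint set $\mathcal{C}$. Considering that each $\hat{y}_k$ has the functional form (2), the optimization problem can be formulated as:
$$x^* = \arg\min_{x} F\big(w_k\,\hat{y}_k(x),\ k \in \mathcal{O}\big) \quad \text{subject to} \quad \hat{y}_l(x) \le \theta_l,\ l \in \mathcal{C}, \qquad (3)$$
where $F$ is the cost function, $w_k$ is the weight given to $\hat{y}_k$, and $\theta_l$ is the threshold imposed on $\hat{y}_l$.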
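Once each KPI has an LR model of the form (2), problem (3) reduces to a one-dimensional search over $x$. The sketch below assumes, for illustration only, that the cost function $F$ is the weighted sum of the predicted optimization-set KPIs and that $x$ is searched on a uniform grid over a bounded range; `lr_predict` and `optimize` are hypothetical names, and the solver actually used is not specified here.

```python
import math

def lr_predict(coeffs, x):
    """Evaluate an LR relation of the form (2) with coefficients (beta0, beta1)."""
    b0, b1 = coeffs
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def optimize(opt_set, con_set, lo, hi, steps=1000):
    """Grid-search sketch of problem (3) for a scalar RRM parameter x.

    opt_set: list of (coeffs, weight) pairs for the KPIs in the optimization set.
    con_set: list of (coeffs, threshold) pairs for the KPIs in the constraint set.
    Assumes the cost is the weighted sum of the predicted KPIs.
    Returns the feasible grid value with the lowest cost, or None if none is feasible.
    """
    best_x, best_cost = None, math.inf
    for k in range(steps + 1):
        x = lo + (hi - lo) * k / steps
        if any(lr_predict(c, x) > th for c, th in con_set):
            continue  # a constraint KPI exceeds its threshold
        cost = sum(w * lr_predict(c, x) for c, w in opt_set)
        if cost < best_cost:
            best_x, best_cost = x, cost
    return best_x
```

A bounded scalar solver could replace the grid; with one parameter and closed-form models, the grid is already fast and never leaves the feasible region.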
The automated healing process is iterative. At each iteration, a new RRM parameter value is proposed (by the Initialization block during the initialization iterations and by the Optimization block during the optimization iterations) to update the RRM setting of the faulty eNB in the Network/Simulator block. The performance of the faulty eNB and of its neighbours with this new RRM value is assessed by the Network/Simulator block through a set of KPI values obtained at the end of the measurement period (typically one day). Thus, a data point comprising an RRM parameter value and the corresponding KPIs is obtained. This data point, together with the previously obtained data points, is used by the Statistical Learning block to refine the statistical model, which the Optimization block then uses to generate the RRM parameter value for the next iteration. Thus, as the iterations progress, the model precision improves on average, allowing the Optimization block to find a better value for the RRM parameter.
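The iterative loop described above can be sketched as follows. Everything in it is hypothetical: `toy_network` stands in for the Network/Simulator block with two made-up noisy logistic KPIs of one parameter $x$, the cost is a single KPI to be minimized under one KPI constraint with threshold 0.4, and the Optimization block is a plain grid search. Only the structure (initialization points, LR refit on all data points, optimization, new data point) follows the text.

```python
import math
import random

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_lr(xs, ys, iters=100):
    """Newton/IRLS fit of y ~ logistic(b0 + b1 * x), for y in [0, 1]."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = logistic(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def toy_network(x, rng):
    """Hypothetical Network/Simulator block: noisy (cost KPI, constraint KPI)."""
    cost_kpi = logistic(1.0 - 4.0 * x) + rng.gauss(0.0, 0.005)
    con_kpi = logistic(-3.0 + 4.0 * x) + rng.gauss(0.0, 0.005)
    return cost_kpi, con_kpi

def slah_loop(init_xs, threshold=0.4, max_iter=10, tol=1e-3, seed=1):
    rng = random.Random(seed)
    # Initialization phase: one data point (x, cost KPI, constraint KPI) per initial value
    data = [(x, *toy_network(x, rng)) for x in init_xs]
    grid = [k / 1000 for k in range(1001)]
    x_prev = None
    for _ in range(max_iter):
        # Statistical Learning block: refit both LR models on all data points so far
        xs = [d[0] for d in data]
        b_cost = fit_lr(xs, [d[1] for d in data])
        b_con = fit_lr(xs, [d[2] for d in data])
        # Optimization block: lowest predicted cost among grid values meeting the constraint
        feasible = [x for x in grid if logistic(b_con[0] + b_con[1] * x) <= threshold]
        if not feasible:
            feasible = [min(grid, key=lambda x: b_con[0] + b_con[1] * x)]
        x_new = min(feasible, key=lambda x: logistic(b_cost[0] + b_cost[1] * x))
        # Network/Simulator block: measure the KPIs for the new parameter value
        data.append((x_new, *toy_network(x_new, rng)))
        if x_prev is not None and abs(x_new - x_prev) < tol:
            break
        x_prev = x_new
    return x_new, data
```

For this toy model the true constrained optimum lies on the constraint boundary, $x^* = (3 + \ln(0.4/0.6))/4 \approx 0.649$, and the loop settles near it after a few optimization iterations despite the noise.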
III System Model for Interference Mitigation
The performance of the proposed automated healing method is evaluated on an ICIC scheme that uses soft-frequency reuse. Consider a downlink ICIC scheme that combines two resource allocation mechanisms: Physical Resource Block (PRB) allocation to frequency subbands and coordinated power allocation. In the soft-reuse-one scheme, the total available bandwidth is reused in all cells, while the transmit power on a portion of a cell's bandwidth can be adapted to solve interference-related QoS problems. Figure 2 presents the power-frequency allocation model in a layout of seven adjacent cells.
The frequency band is divided into three disjoint subbands.
One subband is allocated to the mobiles with the worst signal quality and is denoted interchangeably as the protected band or the edge band, with transmit power $P_{\max}$.
A user with poor radio conditions is often situated at the cell edge, but could also be closer to the base station and experience deep shadow fading.
The remaining two frequency subbands are denoted as centre bands, with transmit power reduced by a factor $\alpha$, namely $\alpha P_{\max}$.
The interference produced by an eNB on its neighbours can be controlled through the $\alpha$ parameter of this eNB.
The main downlink interference in the system originates from eNB transmissions on the centre bands (to centre-cell users), which interfere with neighbouring cell-edge users served on their edge (protected) band.
When an eNB strongly interferes with the users of its neighbours, the ICIC mechanism allows it to reduce its transmit power on the centre band.
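The power-frequency profile described in this list can be illustrated with a short sketch. It assumes, for illustration, that $P_{\max}$ is expressed per PRB, that each of the three subbands holds 8 PRBs (as in the case study of Section V), and that the index of each cell's protected subband is given as an input. The function name is hypothetical.

```python
def prb_power_profile(alpha, p_max, edge_subband, n_subbands=3, prbs_per_subband=8):
    """Per-PRB transmit power of one eNB under soft-frequency reuse (sketch).

    The protected (edge) subband is transmitted at p_max; the remaining
    centre subbands are transmitted at alpha * p_max.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if not 0 <= edge_subband < n_subbands:
        raise ValueError("edge_subband out of range")
    profile = []
    for sb in range(n_subbands):
        power = p_max if sb == edge_subband else alpha * p_max
        profile.extend([power] * prbs_per_subband)
    return profile
```

Lowering $\alpha$ reduces the power on 16 of the 24 PRBs, i.e. on the centre bands that interfere with neighbouring cell-edge users, while the protected subband keeps full power.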
Resource block allocation is performed based on a priority scheme for accessing the protected subband. A quality metric is calculated from pilot channel signal strengths as $Q_u = \frac{P_{s(u),u}}{\sum_{j \neq s(u)} P_{j,u} + N_0}$. Here $s(u)$ stands for the serving eNB of user $u$, $P_{j,u}$ denotes the mean pilot power received by user $u$ from a signal transmitted by eNB $j$, and $N_0$ is the noise power spectral density. $Q_u$ is similar to the SINR, the difference being that, in the present ICIC scheme, the data channels used to calculate the SINR are subject to power control. The metric is calculated for all users, who are then sorted according to it. Users with the worst $Q_u$ are allocated resources from the protected band and benefit from the maximal transmit power of the eNB. When the protected subband is full, resource block allocation continues from the centre band.
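The priority-based allocation can be sketched as below. It is a simplification made for illustration: pilot powers and noise are given in linear units (mW), each user requests a fixed number of PRBs that must fit within a single band, and the band sizes default to the 8 protected and 16 centre PRBs of the case study in Section V. `quality_metric` and `allocate_prbs` are hypothetical names.

```python
def quality_metric(pilot_mw, serving, noise_mw):
    """Q_u: serving pilot power over (other pilots + noise), linear units."""
    interference = sum(p for j, p in enumerate(pilot_mw) if j != serving)
    return pilot_mw[serving] / (interference + noise_mw)

def allocate_prbs(users, edge_prbs=8, centre_prbs=16):
    """Priority allocation: worst Q_u first on the protected (edge) band.

    users: list of (user_id, q_u, n_prbs) tuples.
    Returns {user_id: (band, n_prbs)}; users that fit in neither band are omitted.
    """
    free = {"edge": edge_prbs, "centre": centre_prbs}
    alloc = {}
    for uid, _q, n in sorted(users, key=lambda u: u[1]):  # ascending Q_u
        for band in ("edge", "centre"):  # protected band first, then centre
            if free[band] >= n:
                free[band] -= n
                alloc[uid] = (band, n)
                break
    return alloc
```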
Note that the soft-reuse ICIC scheme is characterized by two other parameters in addition to $\alpha$: 1) the number of PRBs assigned to the centre and edge bands; 2) the threshold that determines the boundary between centre and cell-edge users. In this work, for simplicity, we deal with only one parameter. The proposed algorithm can easily be generalized to the case of multiple RRM parameters, albeit with increased complexity. The choice of the $\alpha$ parameter is motivated by its simple implementation, which consists of a simple power control on a predefined set of subcarriers, whereas the other two parameters require modifications of the PRB scheduling strategy. Nevertheless, the proposed algorithm is equally applicable to the other two parameters without any major alterations.
IV SLAH for Interference Mitigation
This section describes the adaptation of the SLAH to interference mitigation in an LTE network by locally optimizing the $\alpha$ parameter of the interfering eNBs.
Denote by $eNB_P$ ($P$ standing for problematic) an eNB with degraded performance.
It is assumed that the cause of the degraded performance has been diagnosed and is related to excessive inter-cell interference, which can be effectively mitigated by a soft-reuse ICIC scheme.
The first-tier neighbours of $eNB_P$ are denoted by $eNB_i$, $i \in \mathcal{N}_1$, where $\mathcal{N}_1$ is the index set of the first-tier neighbours of $eNB_P$.
The specificity of the interference mitigation use case is the following: to heal $eNB_P$, the parameters $\alpha_i$ of $eNB_i$, $i \in \mathcal{N}_1$, are optimized, while $\alpha_P$ of $eNB_P$ remains unchanged.
We use the notion of coupling between $eNB_i$ and $eNB_P$, expressed in terms of the interference that $eNB_i$ produces on the users connected to $eNB_P$; it can be written in terms of the downlink interference matrix element $M_{P,i}$. Hence, the bigger $M_{P,i}$, the stronger the coupling between the two eNBs. Note that $M_{P,i}$ equals the time average of the sum of the interferences perceived by the mobiles attached to $eNB_P$ and generated by downlink transmissions to the mobiles of $eNB_i$. Denote by $m$, $m \in \mathcal{N}_1$, the index of the eNB most coupled with $eNB_P$, namely $M_{P,m} \ge M_{P,i}$, $\forall i \in \mathcal{N}_1$. To reduce the complexity of the SLAH process, we propose to adjust the parameter $\alpha_i$ according to the degree of coupling between $eNB_i$ and $eNB_P$. Hence, we define a functional relation between $\alpha_i$ and $\alpha_m$ that accounts for this coupling, given by equation (4).
Note that the smaller the coupling between $eNB_i$ and $eNB_P$, the smaller the power reduction applied to $eNB_i$.
Thus, by using (4), only $\alpha_m$ needs to be optimized instead of all the first-tier $\alpha_i$.
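The coupling measure and the choice of $eNB_m$ can be sketched as follows. The input format is an assumption made for illustration: for each time step, a dictionary maps each neighbour index $i$ to the interference powers (linear units) that transmissions to the mobiles of $eNB_i$ cause at the mobiles attached to $eNB_P$. The function names are hypothetical.

```python
def interference_row(samples, neighbours):
    """Estimate M_{P,i} for each first-tier neighbour i.

    samples: one dict per time step, mapping neighbour index i to the list of
    interference powers (linear units) that transmissions to eNB_i's mobiles
    cause at the mobiles attached to eNB_P during that step.
    Returns the time average of the per-step sum of those interferences.
    """
    totals = {i: 0.0 for i in neighbours}
    for step in samples:
        for i in neighbours:
            totals[i] += sum(step.get(i, []))
    return {i: total / len(samples) for i, total in totals.items()}

def most_coupled(row):
    """Index m such that M_{P,m} >= M_{P,i} for every first-tier neighbour i."""
    return max(row, key=row.get)
```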
The self-healing process can be performed simultaneously on any number of eNBs provided they are not direct neighbours.
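The SLAH aims at minimizing the FTT of $eNB_P$ and of its first-tier neighbours while satisfying constraints on their BCRs ($BCR_P$, $BCR_i$, $i \in \mathcal{N}_1$). We define the cost function $F(\alpha_m)$ of the optimization, equation (5), as a weighted combination of these FTTs.
It is noted that $FTT_i$ is a function of $\alpha_i$ and hence, via equation (4), of $\alpha_m$. The weighting coefficients $w_i$ depend on the relative contribution of $M_{P,i}$ with respect to the sum over all eNBs in $\mathcal{N}_1$ and are given by:
$$w_i = \frac{M_{P,i}}{\sum_{j \in \mathcal{N}_1} M_{P,j}}, \quad i \in \mathcal{N}_1, \qquad (6)$$
satisfying the condition $\sum_{i \in \mathcal{N}_1} w_i = 1$. The optimization problem can now be formulated as follows:
$$\alpha_m^* = \arg\min_{\alpha_m} F(\alpha_m) \quad \text{subject to} \quad BCR_i(\alpha_m) \le BCR_{th},\ i \in \{P\} \cup \mathcal{N}_1, \qquad (7)$$
where $F$ is the cost function (5) and $BCR_{th}$ is the BCR threshold above which communication quality is unacceptably poor.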
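The weights (6) and the feasibility check of the constraint in (7) are simple to compute. The sketch below assumes that the interference matrix row $M_{P,i}$ is available as a dictionary keyed by neighbour index and that the predicted BCRs come from the fitted LR models; the function names are hypothetical.

```python
def coupling_weights(row):
    """Equation (6): w_i = M_{P,i} / sum_j M_{P,j}, over j in N1."""
    total = sum(row.values())
    if total <= 0.0:
        raise ValueError("interference row must have a positive sum")
    return {i: m / total for i, m in row.items()}

def bcr_constraints_met(predicted_bcr, bcr_th):
    """Constraint of (7): each predicted BCR_i, i in {P} U N1, is <= BCR_th."""
    return all(b <= bcr_th for b in predicted_bcr.values())
```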
The $FTT_i$ and $BCR_i$ indicators in equations (5) and (7) take the form of the LR function (2).
The SLAH can be further improved by using, in equation (4), a generalized interference matrix element $\tilde{M}_{P,i}$, defined in equation (8), which incorporates an additional KPI, namely the BCR.
One can see that the higher $BCR_i$ (in its normalized form), the smaller $\tilde{M}_{P,i}$ and, consequently, the smaller the modification of $\alpha_i$.
The significance of equation (8) is that, in order to improve the performance of $eNB_P$, the decrease in $\alpha_i$ due to $M_{P,i}$ is limited by the degradation of $BCR_i$.
Note that the constant $\gamma$ allows tuning the effect of $BCR_i$.
Denote a data point as the vector $\mathbf{d}_k = \big(\alpha_m^{(k)}, FTT_i^{(k)}, BCR_i^{(k)}\big)$, where $i \in \{P\} \cup \mathcal{N}_1$ and $k$ denotes the iteration index. Since the SLAH starts with an initial data point and generates a new data point at each iteration, the iteration index equals the total number of generated data points. The set of data points $\mathbf{d}_k$, $k = 1, \dots, K$, is denoted by $\mathcal{D}_K$. The SLAH algorithm is given in Table I.
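|Table I. The SLAH algorithm|
|1. Identify the eNB $eNB_m$ most coupled with $eNB_P$ among the first-tier neighbours $eNB_i$, $i \in \mathcal{N}_1$.|
|2. Generate the initial set of data points $\mathbf{d}_k$, $k = 1, \dots, K_0$, by applying different $\alpha_m$ values (together with the associated $\alpha_i$ values) to the network/simulator one by one and obtaining the corresponding KPIs.|
|Repeat until convergence:|
|3. For each $i \in \{P\} \cup \mathcal{N}_1$, compute the statistical model using LR for $FTT_i$ and $BCR_i$ using the corresponding data points in $\mathcal{D}_K$.|
|4. Compute a new vector containing the new values of $\alpha_i$, $i \in \mathcal{N}_1$ (using equations (4) and (7)).|
|5. Apply the new $\alpha_i$ values to the network/simulator and observe $FTT_i$ and $BCR_i$, $i \in \{P\} \cup \mathcal{N}_1$. Compute the new data point $\mathbf{d}_{K+1}$.|
|6. Update $\mathcal{D}_{K+1} = \mathcal{D}_K \cup \{\mathbf{d}_{K+1}\}$ and set $K \leftarrow K+1$.|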
V Case Study
V-A Simulation Scenario
An LTE network comprising 45 eNBs in a dense urban environment, with a bandwidth of 5 MHz, is simulated using a MATLAB simulator. The simulator performs correlated Monte Carlo snapshots with a time resolution of 1 second to account for the time evolution of the network. FTP traffic with a file size of 6300 kbit is considered. Call arrivals follow a Poisson process, and the communication duration of each user depends on its bit rate. The maximum number of PRBs in an eNB, i.e. its capacity, is fixed to 24 PRBs, with 8 PRBs in each subband. The number of PRBs allocated to a user can vary from 1 to 4, on a first-come first-served basis. User mobility is not taken into account.
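The traffic model can be sketched as follows: call arrivals form a Poisson process (exponential inter-arrival times) and, at a constant bit rate, transferring one 6300 kbit file takes the file size divided by the bit rate. The arrival rate and bit rates used here are hypothetical, and the constant-rate transfer time is a simplification, since the simulator updates each mobile's bit rate at every 1 s time step.

```python
import random

FILE_SIZE_KBIT = 6300.0  # FTP file size used in the case study

def poisson_arrivals(rate_per_s, horizon_s, rng):
    """Arrival instants of a Poisson process on [0, horizon_s]."""
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate_per_s)  # exponential inter-arrival time
        if t > horizon_s:
            return times
        times.append(t)

def transfer_time_s(bitrate_kbps):
    """Transfer time of one file at a constant bit rate (simplification)."""
    return FILE_SIZE_KBIT / bitrate_kbps
```

With a hypothetical mean rate of 0.5 calls per second, about 500 calls arrive over 1000 s, and a user served at a constant 1260 kbit/s would finish its file in 5 s.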
For each new value of $\alpha$, the simulator runs for 2500 time steps (seconds) to allow the processed KPIs to converge. The BCR and FTT KPIs used by the SLAH algorithm are averaged over the interval from 500 to 2500 seconds, discarding the samples of the first 500 seconds, during which the network reaches a steady state. Note that, for a given traffic demand, the BCR provides a capacity indicator while the FTT is more closely related to the user-perceived QoS. The simulated LTE system includes a simple admission control based on signal strength: a mobile selects the eNB with the highest Reference Signal Received Power (RSRP) and is admitted if this RSRP is above -104 dBm and at least one PRB is available. The mobile throughput is calculated from the SINR using quality tables obtained from link-level simulations. The SINR, and consequently the bit rate, of a mobile are updated after each simulation time step. The interference matrix elements used in equations (4), (6) and (8) are calculated only once, for the reference solution (see the paragraph below), over a longer time interval from 500 to 7000 seconds to obtain accurate averages.
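The admission rule and the KPI averaging window can be sketched as follows, assuming one KPI sample per 1 s time step and RSRP values in dBm keyed by eNB index. The function names are hypothetical.

```python
RSRP_MIN_DBM = -104.0  # admission threshold used in the case study

def admit(rsrp_dbm, free_prbs):
    """Return the selected eNB index if the call is admitted, else None.

    rsrp_dbm: dict eNB index -> RSRP (dBm); free_prbs: dict eNB index -> free PRBs.
    """
    best = max(rsrp_dbm, key=rsrp_dbm.get)  # eNB with the highest RSRP
    if rsrp_dbm[best] > RSRP_MIN_DBM and free_prbs.get(best, 0) >= 1:
        return best
    return None  # blocked call

def windowed_mean(samples, start_s=500, end_s=2500):
    """Mean of per-second KPI samples over [start_s, end_s), discarding warm-up."""
    window = samples[start_s:end_s]
    if not window:
        raise ValueError("no samples in the averaging window")
    return sum(window) / len(window)
```

Note that a call is blocked when the strongest eNB has no free PRB, even if a weaker eNB does, which matches the selection-then-admission order described above.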
An optimal default value for $\alpha$, known as the reference solution, is found to be 0.5 for all eNBs in the network. The default value is determined by varying $\alpha$ simultaneously for all eNBs from 0.0125 to 1 in steps of 0.0125. For each value of $\alpha$, the network performance is assessed in terms of the mean BCR and mean FTT. The minimum values of both BCR and FTT are obtained over a common interval of $\alpha$ values. Within this interval, $\alpha = 0.5$ is selected as the default value because it yields lower inter-cell interference and lower energy consumption in the network.
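The reference-solution sweep can be sketched as follows. The grid reproduces the 80 values from 0.0125 to 1; the selection rule, which picks the smallest $\alpha$ whose mean BCR and mean FTT are both within a relative tolerance of their minima, is an assumption that mirrors the stated preference for lower interference and energy consumption, not the exact procedure used in the paper.

```python
def alpha_grid(step=0.0125, upper=1.0):
    """The values step, 2*step, ..., upper swept network-wide (80 values by default)."""
    n = round(upper / step)
    return [k * step for k in range(1, n + 1)]

def select_default(mean_bcr, mean_ftt, rel_tol=0.01):
    """Smallest alpha whose mean BCR and mean FTT are both within rel_tol of
    their minima over the sweep (assumed selection rule)."""
    bcr_min = min(mean_bcr.values())
    ftt_min = min(mean_ftt.values())
    near_opt = [a for a in mean_bcr
                if mean_bcr[a] <= bcr_min * (1 + rel_tol)
                and mean_ftt[a] <= ftt_min * (1 + rel_tol)]
    return min(near_opt) if near_opt else None
```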
V-B Automated Healing Scenario
A problematic eNB with the worst performance in the simulated network (in terms of BCR and FTT), namely $eNB_P$, is selected for automated healing using the SLAH algorithm. Its six first-tier neighbours are $eNB_i$, $i \in \mathcal{N}_1$. $\alpha_P$ is fixed to the reference default value of 0.5. The index set of the second-tier neighbours of the problematic eNB is denoted by $\mathcal{N}_2$. We call optimization zone the subnetwork comprising $eNB_P$ and its first-tier neighbours $eNB_i$, $i \in \mathcal{N}_1$, and evaluation zone the subnetwork comprising $eNB_P$ and its first two tiers of neighbours, $eNB_i$, $i \in \mathcal{N}_1$, and $eNB_j$, $j \in \mathcal{N}_2$. $eNB_m$ is the eNB most coupled with $eNB_P$.
The SLAH algorithm is applied using the generalized interference matrix (8) in (4), with $\gamma = -0.3$. The first five values of $\alpha_m$ in Table II are chosen for the initialization phase (Phase-I in the table) of the SLAH. The next seven values are calculated iteratively by the SLAH algorithm during the optimization phase (Phase-II in the table). The values of the other five first-tier parameters $\alpha_i$, $i \in \mathcal{N}_1 \setminus \{m\}$, are calculated using equation (4). Despite the inherent noise in the generated data, the Phase-II values show that $\alpha_m$ converges within a few iterations. The converged value of $\alpha_m$ is chosen as the optimized solution.
The mean BCR and FTT data points are plotted, respectively, as a function of $\alpha_m$, together with the LR curves for $eNB_P$ and two of its first-tier neighbours after convergence (at the end of the optimization iterations).
The KPI curves for the remaining four first-tier neighbours are not shown, as they follow a similar trend.
The concentration of KPI data points around the optimized value of $\alpha_m$ indicates the convergence of the SLAH algorithm.
The SLAH algorithm yields the following BCR gains in the optimization zone (the set of eNBs $eNB_P$ and $eNB_i$, $i \in \mathcal{N}_1$):
The mean BCR of the problematic $eNB_P$ is reduced by 45% with respect to the reference solution, from 5.28% to 2.9%.
The average improvement of the mean BCR in the first tier ($i \in \mathcal{N}_1$) is 44% with respect to the reference solution.
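The 45% figure for the problematic eNB is a relative improvement with respect to the reference solution, and follows directly from the two mean BCR values:

$$\frac{\overline{BCR}_{\mathrm{ref}} - \overline{BCR}_{\mathrm{opt}}}{\overline{BCR}_{\mathrm{ref}}} = \frac{5.28\% - 2.90\%}{5.28\%} = \frac{2.38}{5.28} \approx 0.451 \approx 45\%$$

The other percentages in this section are presumably relative improvements of the same kind, computed per eNB and then averaged over the relevant tier.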
In terms of mean FTT, the improvement brought about by the SLAH algorithm in the optimization zone with respect to the reference solution is shown in Figure 4. The mean FTT of $eNB_P$ is reduced by 6.31%, and the average improvement of the mean FTT in the first tier is 26.6%. This improvement stems from the optimized interference management in the first tier of the problematic eNB: the decrease in interference improves the SINR values and consequently the bit rates and the FTT values. Furthermore, the improved power resource allocation decreases the sojourn time of users that monopolize scarce radio resources, which results in a lower BCR.
Figure 5 shows, in descending order, the mean BCR and the mean FTT, respectively, for the reference (square) and the optimized (circle) eNBs in the evaluation zone ($eNB_P$, $eNB_i$, $i \in \mathcal{N}_1$, and $eNB_j$, $j \in \mathcal{N}_2$). Note that the order of the stations in the two curves of each panel may differ. On average, both the mean BCR and the mean FTT in the evaluation zone are improved; the average FTT improvement in the evaluation zone is 13%.
This paper has presented a new approach, the SLAH, for automated healing of cells with poor performance. The SLAH is an iterative optimization algorithm that uses statistical learning in conjunction with a simple optimization module. At each iteration, the RRM solution computed by the optimization block improves jointly with the statistical model. The SLAH can be implemented in the management plane, e.g. in the OMC, in an off-line mode. It has been successfully applied to heal a downlink ICIC parameter of an eNB whose performance was degraded by excessive downlink inter-cell interference in an LTE network. The proposed approach has several attractive features: it is generic and can easily be adapted to different types of faulty parameters; it performs well in the presence of noisy data; and it converges in a very small number of iterations. The SLAH method provides a basis for designing self-healing algorithms.
- G. Dimitrakopoulos, K. Tsagkaris, V. Stavroulaki, A. Katidiotis, N. Koutsouris, P. Demestichas, V. Merat, and S. Walter, "A Management Framework for Ambient Systems Operating in Wireless B3G Environments," Mobile Networks and Applications, vol. 13, no. 6, Dec. 2008, pp. 555-568.
- R. Barco, V. Wille, and L. Diez, "System for automatic diagnosis in cellular networks based on performance indicators," European Transactions on Telecommunications, vol. 16, no. 5, Oct. 2005, pp. 399-409.
- R. Barco, L. Nielsen, R. Guerrero, G. Hylander, and S. Patel, "Automated troubleshooting of a mobile communication network using Bayesian networks," in Proc. IEEE International Workshop on Mobile and Wireless Communications Networks (MWCN'02), Stockholm, Sweden, Sept. 2002, pp. 606-610.
- S. Ben Jamaa, Z. Altman, J.-M. Picard, and A. Ortega, "Steered Optimization Strategy for Automatic Cell Planning of UMTS Networks," in Proc. IEEE 61st Vehicular Technology Conference (VTC 2005-Spring), 2005.
- 3GPP TR 36.913 v8.0.1, "Requirements for Further Advancements for E-UTRA (LTE-Advanced)," March 2009.
- IST WINNER II project, "Interference avoidance concept," Deliverable D4.7.2, June 2007, http://www.ist-winner.org/WINNER2-Deliverables/D4.7.2.pdf.
- G. Fodor, "Performance analysis of a reuse partitioning technique for OFDM based evolved UTRA," in Proc. IEEE International Workshop on Quality of Service, June 2006, pp. 112-120.
- R. Y. Chang, Z. Tao, J. Zhang, and C.-C. J. Kuo, "Multicell OFDMA Downlink Resource Allocation Using a Graphic Framework," IEEE Transactions on Vehicular Technology, vol. 58, no. 7, Sept. 2009, pp. 3494-3507.
- D. W. Hosmer and S. Lemeshow, Applied Logistic Regression, 2nd ed., Wiley, New York, 2000, pp. 1-30.
- A. J. Dobson, An Introduction to Generalized Linear Models, 2nd ed., Chapman and Hall/CRC, London, 2001, pp. 57-67.
- R. Nasri and Z. Altman, "Handover Adaptation for Dynamic Load Balancing in 3GPP Long Term Evolution Systems," in Proc. MoMM 2007, The Fifth International Conference on Advances in Mobile Computing and Multimedia, 3-5 Dec. 2007, pp. 145-154.