Detecting Radio Frequency Interference in radio antenna arrays with the Recurrent Neural Network algorithm
Abstract
Signal artefacts due to Radio Frequency Interference (RFI) are a common nuisance in radio astronomy. Conventionally, the RFI-affected data are tagged by an expert data analyst in order to warrant data quality. In view of the increasing data rates obtained with interferometric radio telescope arrays, automatic data filtering procedures are mandatory. Here, we present results from the implementation of an RFI-detecting recurrent neural network (RNN) employing long short-term memory (LSTM) cells. For the training of the algorithm, a discrete model was used that distinguishes RFI and non-RFI data based on the amplitude information from radio interferometric observations with the GMRT. The performance of the RNN is evaluated by analyzing a confusion matrix. The true positive and true negative rates of the network are 0.9986 and 0.9791, respectively. However, the overall efficiency of the network is limited by the fact that a large amount of non-RFI data is classified as being contaminated by RFI. The Matthews correlation coefficient is 0.42, suggesting that a still more refined training model is required.
P.R. Burd, K. Mannheim, T. März, J. Ringholz, A. Kappes, M. Kadler
Emil-Fischer-Strasse 31, 97074 Würzburg, Germany
1 RFI mitigation and the machine-learning approach
Radio Frequency Interference (RFI) collectively denominates artefacts in the data of radio telescopes due to GPS transmitters, cell phones, microwave ovens, pasture fences, power supply lines, thunderstorms, or similar radio emitters in the vicinity of the telescopes with their high-sensitivity receivers. RFI can spoil the data quality, impede calibration efforts, or mimic false astrophysical sources. It is therefore imperative to filter those signals before the calibration and imaging analysis can proceed, see (Fridman & Baan, 2001) and (Offringa, de Bruyn, Biehl, et al., 2010). The so-called “SumThreshold” method is a threshold-based and widely used hard-coded algorithm for mitigating RFI, see (Offringa, de Bruyn, Biehl, et al., 2010), (Offringa, de Bruyn, Zaroubi, & Biehl, 2010), and (Peck & Fenech, 2013).
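To illustrate the thresholding idea, the following is a simplified, single-pass sketch of the SumThreshold principle; the published algorithm (Offringa, de Bruyn, Biehl, et al., 2010) is iterative and replaces already-flagged samples between passes, a refinement omitted here. The parameters `chi1` and `rho` are illustrative assumptions, not values used in this work.

```python
import numpy as np

def sumthreshold_1d(power, chi1=5.0, rho=1.5, max_log2_m=3):
    """Simplified single-pass SumThreshold sketch for a 1-D spectrum.

    Flags every sample inside any window of M consecutive samples whose
    mean exceeds chi_M = chi1 / rho**log2(M), so that weaker but broader
    features are caught by the wider windows.
    """
    power = np.asarray(power, dtype=float)
    flags = np.zeros(power.size, dtype=bool)
    for k in range(max_log2_m + 1):
        m = 2 ** k
        chi_m = chi1 / rho ** k          # threshold shrinks for wider windows
        for start in range(power.size - m + 1):
            if power[start:start + m].mean() > chi_m:
                flags[start:start + m] = True
    return flags

# Flat spectrum around 1.0 with a narrow RFI spike in channel 10:
spectrum = np.ones(32)
spectrum[10] = 50.0
print(np.flatnonzero(sumthreshold_1d(spectrum)))  # spike channel and its neighbourhood
```

The widest window flags the neighbourhood of the spike as well, which mimics the behaviour of the full algorithm on broadband RFI.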
Due to the random nature and diversity of RFI signal shapes in the spatial and frequency domains, applying a fixed set of rules and cuts in data space generally does not suffice to eliminate RFI. Instead, the time-consuming effort of an expert data analyst is conventionally required to deal with the observed complexity. Machine-learning (ML) algorithms may be superior in providing the required flexibility and efficiency.
As a matter of fact, Akeret et al. (2017) and Czech et al. (2018) have recently successfully applied different models of deep neural networks (DNNs) to identify RFI in data from single-dish radio telescopes.
In this paper, we employ the Recurrent Neural Network (RNN) algorithm for RFI detection in data from an interferometric array of radio telescopes.
The RNN makes best use of data in which some kind of order is relevant, when equipped with a long short-term memory (LSTM) cell, cf. (Hochreiter & Schmidhuber, 1997). In this context, the order can be a frequency (channel) order, a time order, or a baseline order. For the training of the algorithm, we used data obtained with the Giant Metrewave Radio Telescope (GMRT; Ananthakrishnan, 1995) that are heavily polluted by RFI. In Sec. 2, the data processing and the training model are described and discussed. Sec. 3 briefly describes the RNN architecture. The performance is discussed in Sec. 4 with respect to the implications for the chosen RNN architecture and data modeling.
2 Data Processing and Training Model
The GMRT data, recorded at , with a bandwidth of and divided into 256 channels, are available in the GMRT data archive (https://naps.ncra.tifr.res.in/goa/mt/search/basicSearch) under the project code and the observation numbers and . The data from the GMRT data archive are provided in FITS format. The GMRT consists of 30 antennas, leading to 435 baselines and thus 435 visibilities at any given time step and channel.
Using CASA’s python application programming interface (API), casacore, see (McMullin et al., 2007), data blocks with the dimensions listed in Table 1, containing the amplitude information of the observations, are created. Of these data blocks, 95% (with respect to the time step) are randomly selected and used to train the RNN. The remaining 5% of the data are used to test the performance of the RNN.
It is worth mentioning that the phase information derived from the visibilities, as well as the differences of phases (or amplitudes) with respect to the channel, baseline, and/or time order, can also be used to find RFI and to train the RNN. However, this study focuses only on the amplitudes as a first step to assess the method’s potential.
We train the RNN to be sensitive to the sequence of data with respect to the channels, i.e., the input sequences run along the channel axis. For this first approach, we thus feed each time step, baseline, and polarization per channel into the RNN. This becomes important when interpreting the resulting classifications of the RNN. Table 1 lists the dimensions of all axes of the data block. The training and test data sets have the same dimensions along the channel (CHAN) and baseline (BL) axes; the time-step × polarization (TS × POL) axis, however, as mentioned above, is split into 95% and 5% of the total available dimension, respectively.
Table 1: Dimensions of the data-block axes for the training and test data sets.

              training data   test data
    TS × POL  3395            179
    CHAN      256             256
    BL        435             435
Before feeding the data into the RNN, the amplitudes are rescaled to the range between zero and one. This procedure results in a number of data blocks corresponding to the number of time steps multiplied by the number of polarizations, where each amplitude per channel and baseline can be fed into the RNN.
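The rescaling and the random split described above can be sketched as follows; the random amplitude cube is a placeholder (shrunk along the TS × POL axis), not the actual GMRT data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical amplitude cube (TS x POL, CHAN, BL); CHAN and BL as in
# Table 1, TS x POL shrunk from 3574 to 20 for this example.
amps = rng.gamma(shape=2.0, scale=1.0, size=(20, 256, 435))

# Min-max rescaling of the amplitudes to [0, 1] over the whole data set.
amps_scaled = (amps - amps.min()) / (amps.max() - amps.min())

# Random 95 % / 5 % split along the TS x POL axis into training and test blocks.
n_blocks = amps_scaled.shape[0]
idx = rng.permutation(n_blocks)
n_train = int(0.95 * n_blocks)
train, test = amps_scaled[idx[:n_train]], amps_scaled[idx[n_train:]]
print(train.shape, test.shape)  # (19, 256, 435) (1, 256, 435)
```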
To train the network, a simple model is built to label certain channels as RFI contaminated. The algorithm scans the amplitudes in each channel. Within a channel interval around the channel under consideration, the median is calculated. If the amplitude value in a channel is larger than five times the median within the neighboring range, the channel is labeled as RFI contaminated. In this way, an array of zeros and ones is created, where a zero denotes an RFI-free channel and a one denotes an RFI-contaminated channel. These labels serve as the training targets for the RNN to find RFI in certain channels.
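A minimal sketch of this labelling model for a single channel sequence follows; since the exact channel interval is not specified above, a symmetric window half-width (`half_window`) is assumed here purely for illustration.

```python
import numpy as np

def label_rfi_channels(amps, half_window=8, factor=5.0):
    """Label channels as RFI contaminated (1) or RFI-free (0).

    amps: 1-D array of amplitudes per channel (one baseline / time step).
    A channel is flagged when its amplitude exceeds `factor` times the
    median of the neighbouring +-`half_window` channels.  The window
    half-width is an assumption; the original interval is not specified.
    """
    amps = np.asarray(amps, dtype=float)
    labels = np.zeros(amps.size, dtype=int)
    for c in range(amps.size):
        lo, hi = max(0, c - half_window), min(amps.size, c + half_window + 1)
        if amps[c] > factor * np.median(amps[lo:hi]):
            labels[c] = 1
    return labels

channels = np.ones(256)
channels[100] = 12.0          # injected narrow-band RFI spike
labels = label_rfi_channels(channels)
print(labels[100], labels.sum())  # prints: 1 1
```

Only the contaminated channel receives a one-label, which is exactly the binary target array the RNN is trained on.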
3 RNN architecture
The network is coded using the software package TensorFlow-GPU 1.4.0 (TF), see (Abadi et al., 2015). The implementation addresses the CUDA cores of two GeForce GTX 1080 Ti boards, which were used to train the RNN with CUDA 8.0. We utilize TF’s LSTM cell as described by (Hochreiter & Schmidhuber, 1997) to implement the RNN. The RNN as a whole consists of 1024 such LSTM cells. The sigmoid function is used as the activation function within each LSTM cell. To measure how well the RNN’s model fits the data, we deploy one of TF’s built-in cost functions, which is minimized using the Adam optimizer (Kingma & Ba, 2014). Fig. 1 illustrates the losses during the training process. The Adam optimizer tries to find a global minimum of the cost function. As can be seen in Fig. 1, a shallow minimum is found between 40 and 60 epochs. However, the result must be handled with some caution: due to the nonlinearity of the problem, the global minimum might still lie outside the range of the numerical accuracy reached after 100 iterations, where we stopped for practical reasons.
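For reference, the gating equations of the LSTM cell (Hochreiter & Schmidhuber, 1997) can be written out in plain NumPy. This is an illustrative single-cell forward pass with random weights and toy dimensions, not the TF implementation or the trained 1024-cell network.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One step of a standard LSTM cell (illustrative NumPy version).

    x: input vector; h_prev, c_prev: previous hidden and cell state.
    W: weights of shape (4 * n_hidden, n_in + n_hidden); b: bias.
    The four gate blocks are input (i), forget (f), cell candidate (g),
    and output (o), as in the usual LSTM formulation.
    """
    n = h_prev.size
    z = W @ np.concatenate([x, h_prev]) + b
    i, f, g, o = z[:n], z[n:2*n], z[2*n:3*n], z[3*n:]
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)  # new cell state
    h = sigmoid(o) * np.tanh(c)                        # new hidden state
    return h, c

rng = np.random.default_rng(1)
n_in, n_hidden = 3, 5                      # the paper uses 1024 hidden cells
W = rng.normal(scale=0.1, size=(4 * n_hidden, n_in + n_hidden))
b = np.zeros(4 * n_hidden)
h = c = np.zeros(n_hidden)
for x in rng.normal(size=(256, n_in)):     # e.g. one 256-channel sequence
    h, c = lstm_step(x, h, c, W, b)
print(h.shape)
```

Feeding a 256-element sequence through the cell mirrors how the channel-ordered data blocks enter the network.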
4 Performance
The result of the RFI classification capability of the RNN is illustrated in Fig. 2. The different sections in the plot can be interpreted as follows:

- true positive (TP) classifications (magenta): correctly classified RFI
- true negative (TN) classifications (red): correctly classified non-RFI
- false negative (FN) classifications (blue): incorrectly classified RFI
- false positive (FP) classifications (black): incorrectly classified non-RFI
The number of data points in each category is summarized in the confusion matrix:

\[ C = \begin{pmatrix} \mathrm{TP} & \mathrm{FN} \\ \mathrm{FP} & \mathrm{TN} \end{pmatrix} \tag{1} \]
The confusion matrix is evaluated following (Boughorbel et al., 2017), (Fawcett, 2006), and (Powers, 2011). The results are summarized in Tab. 2. In the following, we discuss the RNN efficiency for RFI detection:
Accuracy
The accuracy with which the network separates RFI and non-RFI signals, Eq. 2, is 0.9792:

\[ \mathrm{ACC} = \frac{\mathrm{TP}+\mathrm{TN}}{\mathrm{TP}+\mathrm{TN}+\mathrm{FP}+\mathrm{FN}} \tag{2} \]
Positive predictive value and false discovery rate
The positive predictive value (PPV) is 0.1800. The PPV is defined as the fraction of data correctly identified by the network as RFI compared to all data identified as RFI, including incorrectly identified non-RFI data (black and magenta data points in Fig. 2), see Eq. 3:

\[ \mathrm{PPV} = \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}} \tag{3} \]
The relatively low value of the PPV and the relatively high value of the false discovery rate, FDR (0.8200), which is the rate at which non-RFI signals are classified as RFI, see Eq. 4,

\[ \mathrm{FDR} = \frac{\mathrm{FP}}{\mathrm{TP}+\mathrm{FP}} = 1 - \mathrm{PPV} \tag{4} \]

are due to the fact that for each channel all baselines, time steps, and polarizations are considered: if any of those is classified as RFI polluted, the entire channel is flagged as such. The method can obviously be further refined to improve its overall efficiency by employing a less simplistic approach.
Negative predictive value and false omission rate
The negative predictive value (NPV) is 0.9999. The NPV states which fraction of the data points classified as non-RFI are indeed non-RFI (red and blue data points in Fig. 2), see Eq. 5:

\[ \mathrm{NPV} = \frac{\mathrm{TN}}{\mathrm{TN}+\mathrm{FN}} \tag{5} \]
Since most data points are in the TN category, compared to 128 in the FN category, the NPV converges to one, while the false omission rate (FOR), Eq. 6, which is the rate at which RFI signals are not identified as such, goes to zero:

\[ \mathrm{FOR} = \frac{\mathrm{FN}}{\mathrm{TN}+\mathrm{FN}} = 1 - \mathrm{NPV} \tag{6} \]
True positive and false negative rate
The true positive rate (TPR), Eq. 7, is the fraction of RFI data points correctly identified as RFI; for the RNN it is 0.9986. Conversely, the false negative rate (FNR), Eq. 8, is 0.0014:

\[ \mathrm{TPR} = \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}} \tag{7} \]
\[ \mathrm{FNR} = \frac{\mathrm{FN}}{\mathrm{TP}+\mathrm{FN}} = 1 - \mathrm{TPR} \tag{8} \]
True negative rate and false positive rate
The true negative rate (TNR), Eq. 9, is the fraction of non-RFI data points correctly identified as non-RFI; for the RNN it is 0.9791. Conversely, the false positive rate (FPR), Eq. 10, is 0.0209:

\[ \mathrm{TNR} = \frac{\mathrm{TN}}{\mathrm{TN}+\mathrm{FP}} \tag{9} \]
\[ \mathrm{FPR} = \frac{\mathrm{FP}}{\mathrm{TN}+\mathrm{FP}} = 1 - \mathrm{TNR} \tag{10} \]
Matthews correlation coefficient and F1 score
The Matthews correlation coefficient (MCC), see Eq. 11, evaluates the network’s performance when dealing with sample sizes that differ widely in range, see (Matthews, 1975) and (Boughorbel et al., 2017). Here, these samples are TP, FN, FP, TN.

\[ \mathrm{MCC} = \frac{\mathrm{TP}\cdot\mathrm{TN} - \mathrm{FP}\cdot\mathrm{FN}}{\sqrt{(\mathrm{TP}+\mathrm{FP})(\mathrm{TP}+\mathrm{FN})(\mathrm{TN}+\mathrm{FP})(\mathrm{TN}+\mathrm{FN})}} \tag{11} \]

A value of −1 would indicate that classification and data are totally anticorrelated, a value of 0 would correspond to a totally random classification with respect to the data, while a value of +1 would indicate a total correlation between classification and data. The MCC for the RNN is 0.4195. Comparing this to the accuracy, it becomes clear that the MCC is a more robust way to evaluate the RNN’s performance than the accuracy by itself. In this context, the F1 score, see Eq. 12, can also be used to evaluate the accuracy of the RNN, because the model is binary, see (Blair, n.d.) and (Powers, 2011). At a value of 0, the F1 score indicates worst precision, while perfect precision is indicated at a value of 1. The F1 value of this RNN’s capability to distinguish between RFI and non-RFI is 0.3051:

\[ F_1 = \frac{2\,\mathrm{TP}}{2\,\mathrm{TP}+\mathrm{FP}+\mathrm{FN}} \tag{12} \]
Table 2: Performance metrics of the RNN.

    Parameter                           Value    Eq.
    accuracy                            0.9792   (2)
    positive predictive value           0.1800   (3)
    false discovery rate                0.8200   (4)
    negative predictive value           0.9999   (5)
    false omission rate                 0.0001   (6)
    true positive rate                  0.9986   (7)
    false negative rate                 0.0014   (8)
    true negative rate                  0.9791   (9)
    false positive rate                 0.0209   (10)
    Matthews correlation coefficient    0.4195   (11)
    F1 score                            0.3051   (12)
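All metrics of Eqs. 2–12 follow directly from the four confusion-matrix counts, as the following helper illustrates. The counts used here are hypothetical (only FN = 128 is quoted in the text above), so the printed values do not reproduce Tab. 2.

```python
import math

def confusion_metrics(tp, fn, fp, tn):
    """Metrics of Eqs. 2-12 computed from the confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),   # Eq. 2
        "ppv": tp / (tp + fp),                         # Eq. 3
        "fdr": fp / (tp + fp),                         # Eq. 4
        "npv": tn / (tn + fn),                         # Eq. 5
        "for": fn / (tn + fn),                         # Eq. 6
        "tpr": tp / (tp + fn),                         # Eq. 7
        "fnr": fn / (tp + fn),                         # Eq. 8
        "tnr": tn / (tn + fp),                         # Eq. 9
        "fpr": fp / (tn + fp),                         # Eq. 10
        "mcc": (tp * tn - fp * fn) / math.sqrt(        # Eq. 11
            (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        "f1": 2 * tp / (2 * tp + fp + fn),             # Eq. 12
    }

# Hypothetical counts; only FN = 128 is taken from the text.
m = confusion_metrics(tp=900, fn=128, fp=4100, tn=190000)
print(round(m["accuracy"], 4), round(m["mcc"], 4))
```

The complementary pairs (PPV/FDR, NPV/FOR, TPR/FNR, TNR/FPR) each sum to one, which is a quick consistency check on any reported confusion-matrix table.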
5 Conclusion and Outlook
In Sec. 4, we show that the RNN reaches an accuracy of 0.9792 after sufficient training. However, this seemingly high accuracy is due to the large number of data points in the TN category, see Eq. 1. When studying the PPV and the FDR, a weakness of the chosen method becomes apparent which lowers its overall efficiency: a large amount of data in the FP category, which are actually non-RFI, are classified as RFI, resulting in a PPV of only 0.18. This also becomes evident when taking the MCC into account, which is 0.42, meaning the classification is not random with respect to the data, but the correlation is not strong either. The F1 score of the network is 0.31. An improvement of the efficiency of the method can be expected from the following refinements:

- Data usage: In this study, we used only the amplitude information in the data. However, the amplitude differences with respect to the channels, baselines, and time steps should also be used, adding four more axes to the training data cube. In addition, the phase (spatial) information in the data could be further utilized.

- Model complexity: The discrete amplitude-based model to distinguish RFI and non-RFI may be adjusted to cope with more complex signal shapes and strength patterns.

- Network architecture: The network training could be extended to consider the time-step, polarization, and baseline sequences instead of the channel sequence only. Thus, the amount of data in the FP category would be reduced. By also adding information on the image level, it is possible to combine the RNN with the advantages of a convolutional neural network (CNN), which would yield information on prominent features in an image, giving a hierarchy of dominant features such as RFI. This would change the architecture into a recurrent convolutional neural network (RCNN).
The results of this study mark an encouraging milestone on the path towards a highly dynamical RFI filter meeting the challenges of future radio antenna arrays.
Acknowledgments
This research was supported by the Bayerisch-Tschechische Hochschulagentur (BTHA) under grant number BTHA-AP-2018-18.
Author contributions
All authors contributed to the scientific content, the writing and editing of the manuscript. The original analysis was done by PRB, TM, and JR.
Financial disclosure
None reported.
Conflict of interest
The authors declare no potential conflict of interests.
References
Abadi, M., Agarwal, A., Barham, P., et al. (2015). TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems.
Akeret, J., Chang, C., Lucchi, A., & Refregier, A. (2017). Astronomy and Computing, 18, 35–39.
Ananthakrishnan, S. (1995). Journal of Astrophysics and Astronomy Supplement, 16, 427.
Blair, D. C. (n.d.). Journal of the American Society for Information Science, 30(6), 374. doi:10.1002/asi.4630300621
Boughorbel, S., Jarray, F., & El Anbari, M. (2017). PLOS ONE, 12(6), 1–17.
Czech, D., Mishra, A., & Inggs, M. (2018). arXiv:1803.02684.
Fawcett, T. (2006). Pattern Recognition Letters, 27(8), 861. (ROC Analysis in Pattern Recognition)
Fridman, P. A., & Baan, W. A. (2001). A&A, 378, 327.
Hochreiter, S., & Schmidhuber, J. (1997). Neural Computation, 9(8), 1735–1780.
Kingma, D. P., & Ba, J. (2014). ArXiv e-prints.
Matthews, B. (1975). Biochimica et Biophysica Acta (BBA) – Protein Structure, 405(2), 442.
McMullin, J. P., Waters, B., Schiebel, D., Young, W., & Golap, K. (2007). CASA Architecture and Applications. In R. A. Shaw, F. Hill, & D. J. Bell (Eds.), Astronomical Data Analysis Software and Systems XVI (Vol. 376, p. 127).
Offringa, A. R., de Bruyn, A. G., Biehl, M., Zaroubi, S., Bernardi, G., & Pandey, V. N. (2010). MNRAS, 405, 155.
Offringa, A. R., de Bruyn, A. G., Zaroubi, S., & Biehl, M. (2010). arXiv:1007.2089.
Peck, L. W., & Fenech, D. M. (2013). Astronomy and Computing, 2, 54–66. doi:10.1016/j.ascom.2013.09.001
Powers, D. M. W. (2011). Journal of Machine Learning Technologies, 2(1), 37.