Detecting Radio Frequency Interference in radio-antenna arrays with the Recurrent Neural Network algorithm


P.R. Burd, K. Mannheim, T. März, J. Ringholz, A. Kappes, M. Kadler
ITPA Würzburg, Bavaria, Germany
paul.r.burd@astro.uni-wuerzburg.de
Abstract

Signal artefacts due to Radio Frequency Interference (RFI) are a common nuisance in radio astronomy. Conventionally, RFI-affected data are flagged by an expert data analyst in order to warrant data quality. In view of the increasing data rates obtained with interferometric radio telescope arrays, automatic data-filtering procedures are mandatory. Here, we present results from the implementation of an RFI-detecting recurrent neural network (RNN) employing long short-term memory (LSTM) cells. For the training of the algorithm, a discrete model was used that distinguishes RFI from non-RFI data based on the amplitude information from radio interferometric observations with the GMRT. The performance of the RNN is evaluated by analyzing a confusion matrix. The true positive and true negative rates of the network are 0.9986 and 0.9791, respectively. However, the overall efficiency of the network is limited by the fact that a large amount of non-RFI data is classified as being contaminated by RFI. The Matthews correlation coefficient is 0.42, suggesting that a more refined training model is required.

methods: data analysis – methods: numerical – radio continuum: general – radio lines: general – techniques: interferometric


\authormark{P.R. Burd et al.}
\corres{P.R. Burd}
\presentaddress{Emil-Fischer-Strasse 31, 97074 Würzburg, Germany}
\jnlcitation{\cname{Burd P.R., Mannheim K., März T., Ringholz J., Kappes A., and Kadler M.} (\cyear{2018}), \ctitle{RFI detection with a Recurrent Neural Network}, \cjournal{Q.J.R. Meteorol. Soc.}, \cvol{TBA}.}

1 RFI mitigation and the machine-learning approach

Radio Frequency Interference (RFI) collectively denominates artefacts in the data of radio telescopes caused by GPS transmitters, cell phones, microwave ovens, pasture fences, power-supply lines, thunderstorms, or similar radio emitters in the vicinity of the telescopes and their high-sensitivity receivers. RFI can spoil the data quality and impede calibration efforts, or mimic false astrophysical sources. It is therefore imperative to filter out these signals before the calibration and imaging analysis can proceed, see (Fridman \BBA Baan, \APACyear2001) and (Offringa, de Bruyn, Biehl\BCBL \BOthers., \APACyear2010). The so-called “SumThreshold” method is a widely used, hard-coded, threshold-based algorithm for mitigating RFI, see (Offringa, de Bruyn, Biehl\BCBL \BOthers., \APACyear2010), (Offringa, de Bruyn, Zaroubi\BCBL \BBA Biehl, \APACyear2010), and (Peck \BBA Fenech, \APACyear2013).
Due to the random nature and diversity of RFI signal shapes in the spatial and frequency domains, applying a fixed set of rules and cuts in data space generally does not suffice to eliminate RFI. Instead, the time-consuming effort of an expert data analyst is conventionally required to deal with the observed complexity. Machine-learning (ML) algorithms may be superior in providing the required flexibility and efficiency. Indeed, (Akeret \BOthers., \APACyear2017) and (Czech \BOthers., \APACyear2018) have recently successfully applied different models of deep neural networks (DNNs) to identify RFI in data from single-dish radio telescopes.
In this paper, we employ the recurrent neural network (RNN) algorithm for RFI detection in data from an interferometric array of radio telescopes. An RNN makes best use of data in which some kind of order is relevant, particularly when equipped with a long short-term memory (LSTM) cell, cf. (Hochreiter \BBA Schmidhuber, \APACyear1997). In this context, the order can be a frequency (channel) order, a time order, or a baseline order. For the training of the algorithm, we used data obtained with the Giant Metre-Wave Radio Telescope (GMRT), see (Ananthakrishnan, \APACyear1995), which are heavily polluted by RFI. In Sec. 2, the data processing and the training model are described and discussed. Sec. 3 briefly describes the RNN architecture. The performance is discussed in Sec. 4 with respect to the implications for the chosen RNN architecture and data modeling.

2 Data Processing and Training Model

The GMRT data, divided into 256 channels, are available in the GMRT data archive (https://naps.ncra.tifr.res.in/goa/mt/search/basicSearch) under the corresponding project code and observation numbers. The data from the GMRT data archive are provided in FITS format. The GMRT consists of 30 antennas, leading to 435 baselines and thus 435 visibilities at any given time step and channel. Using CASA’s python application programming interface (API), casacore, see (McMullin \BOthers., \APACyear2007), data blocks containing the amplitude information of the observations are created, with the dimensions listed in Table 1. At this point, about 95% of the data blocks, with respect to the time step, are randomly selected and used to train the RNN. The remaining 5% of the data are used to test the performance of the RNN.
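For illustration, a minimal python-casacore sketch of this extraction step is given below; it assumes the archival data have already been converted to a MeasurementSet, and the file name gmrt_obs.ms as well as the explicit 95/5 split are placeholders inferred from Table 1, not the exact script used here:

```python
import numpy as np
from casacore.tables import table

# Open the MeasurementSet (hypothetical file name) and read the complex
# visibilities; the DATA column has shape (rows, channels, polarizations).
ms = table("gmrt_obs.ms")
amp = np.abs(ms.getcol("DATA"))      # keep only the amplitude information
a1, a2 = ms.getcol("ANTENNA1"), ms.getcol("ANTENNA2")  # baseline indices
ms.close()

# Randomly split the blocks along the time-step axis into a training
# and a test set (about 95% / 5%, cf. Table 1).
n_blocks = amp.shape[0]
idx = np.random.permutation(n_blocks)
n_train = int(0.95 * n_blocks)
train, test = amp[idx[:n_train]], amp[idx[n_train:]]
```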
It is worth mentioning that the phase information derived from the visibilities, as well as the differences of phases (or amplitudes) with respect to the channel, baseline, and/or time order, can also be used to find RFI and to train the RNN. However, this study focuses only on the amplitudes, as a first step to assess the method’s potential at this level.
We train the RNN to be sensitive to the sequence of the data with respect to the channels, meaning the training block has the form

\[
(\mathrm{TS} \times \mathrm{POL},\ \mathrm{CHAN},\ \mathrm{BL}),
\]

where the channel (CHAN) axis is the sequence dimension. This means that in this first approach we feed each time step, baseline, and polarization per channel into the RNN. This becomes important when interpreting the resulting classifications of the RNN. Table 1 lists the dimensions of all axes in the data block. The training and test data sets have the same dimensions along the channel (CHAN) and baseline (BL) axes; the time-step-times-polarization axis, however, is split into 95% and 5% of the total available dimension along this axis, as mentioned above.

            training data   test data
TS × POL    3395            179
CHAN        256             256
BL          435             435

Table 1: The dimensions of each axis in the data block. The training and test data sets have the same dimensions along the channel (CHAN) and baseline (BL) axes; the time-step-times-polarization axis, however, is split into 95% and 5% of the entire data set, respectively.

Before feeding the data into the RNN, the amplitudes are rescaled to the interval between zero and one. This procedure results in a number of data blocks corresponding to the number of time steps multiplied by the number of polarizations, where each amplitude per channel and baseline can be fed into the RNN.
To train the network, a simple model is built to label certain channels as RFI contaminated. The algorithm scans the amplitudes in each channel. Within an interval of neighboring channels, the median is calculated. If the amplitude value in a channel is larger than five times the median within this neighboring range, the channel is labeled as RFI contaminated. In this way, an array of zeros and ones is created, where a zero denotes an RFI-free channel and a one denotes an RFI-contaminated channel. The RNN is trained on this model to find RFI in certain channels.
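A minimal numpy sketch of the rescaling and of this labeling model is shown below; the window half-width w is an assumption, since the exact channel interval is not specified here:

```python
import numpy as np

def rescale(amp):
    """Min-max rescaling of the amplitudes to the interval [0, 1]."""
    return (amp - amp.min()) / (amp.max() - amp.min())

def label_rfi(amp, w=5, factor=5.0):
    """Label each channel as RFI (1) or RFI-free (0).

    amp    : 1-D array of rescaled amplitudes per channel (one time step,
             one polarization, one baseline).
    w      : half-width of the neighboring-channel window (assumed value).
    factor : threshold multiplier (five times the median, as in the text).
    """
    labels = np.zeros(amp.size, dtype=np.int8)
    for c in range(amp.size):
        lo, hi = max(0, c - w), min(amp.size, c + w + 1)
        if amp[c] > factor * np.median(amp[lo:hi]):
            labels[c] = 1  # RFI contaminated
    return labels
```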

3 RNN architecture

The network is coded using the software package TensorFlow-GPU 1.4.0 (TF), see (Abadi \BOthers., \APACyear2015). The implementation addresses the CUDA cores on two GeForce GTX 1080 Ti boards, which were used to train the RNN with CUDA 8.0. We utilize TF’s LSTM cell as described by (Hochreiter \BBA Schmidhuber, \APACyear1997) to implement the RNN. The RNN as a whole consists of 1024 such LSTM cells. The sigmoid function is used as the activation function within each LSTM cell. To measure how well the RNN’s model fits the data, we deploy one of TF’s built-in cost functions, which is minimized using the Adam optimizer (Kingma \BBA Ba, \APACyear2014). Figure 1 illustrates the losses during the training process. The Adam optimizer tries to find a global minimum of the cost function. As can be seen in Fig. 1, a shallow minimum is found between 40 and 60 epochs. However, the result must be handled with some caution: due to the non-linearity of the problem, the global minimum might still be outside the range of the numerical accuracy reached after 100 iterations, where we stopped for practical reasons.
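The following is a minimal TF 1.x sketch of such a setup, not the exact training script: the tensor shapes follow Table 1, while the placeholder names, the output projection, and the choice of sigmoid cross-entropy as the cost function are assumptions made for illustration:

```python
import tensorflow as tf  # TensorFlow 1.x API, as used for this work

N_CHAN, N_BL, N_UNITS = 256, 435, 1024

# One batch of training blocks: the channel axis is the sequence
# dimension; the baselines act as per-channel input features.
x = tf.placeholder(tf.float32, [None, N_CHAN, N_BL])  # rescaled amplitudes
y = tf.placeholder(tf.float32, [None, N_CHAN, 1])     # 0 = clean, 1 = RFI

# LSTM cell (Hochreiter & Schmidhuber 1997) with 1024 units and a
# sigmoid activation, unrolled over the 256 channels.
cell = tf.nn.rnn_cell.LSTMCell(N_UNITS, activation=tf.sigmoid)
outputs, _ = tf.nn.dynamic_rnn(cell, x, dtype=tf.float32)

# Project each channel's hidden state onto a single RFI logit.
logits = tf.layers.dense(outputs, 1)

# Cost function (assumed: sigmoid cross-entropy), minimized with Adam.
loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=logits))
train_op = tf.train.AdamOptimizer().minimize(loss)
```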

Figure 1: Plot of the loss function versus training epoch number. After the initial rapid drop of the losses within about 20 epochs, the minimum of the loss function is readily reached between 30 and 60 iterations, indicating successful training.

4 Performance

The RFI classification capability of the RNN is illustrated in Fig. 2. The different categories in the plot can be interpreted as follows:

  • true positive (TP) classifications (magenta): RFI correctly classified as RFI

  • true negative (TN) classifications (red): non-RFI correctly classified as non-RFI

  • false negative (FN) classifications (blue): RFI incorrectly classified as non-RFI

  • false positive (FP) classifications (black): non-RFI incorrectly classified as RFI

The number of data points in each category is summarized in the confusion matrix

\[
\mathbf{C} =
\begin{pmatrix}
\mathrm{TP} & \mathrm{FN} \\
\mathrm{FP} & \mathrm{TN}
\end{pmatrix}
\qquad (1)
\]

The confusion matrix is evaluated according to (Boughorbel \BOthers., \APACyear2017), (Fawcett, \APACyear2006), and (Powers, \APACyear2011). The results are summarized in Tab. 2. In the following, we discuss the RNN’s efficiency for RFI detection:

Accuracy

The accuracy with which the network separates RFI and non-RFI signals, defined in Eq. 2,

\[
\mathrm{ACC} = \frac{\mathrm{TP} + \mathrm{TN}}{\mathrm{TP} + \mathrm{TN} + \mathrm{FP} + \mathrm{FN}} \qquad (2)
\]

amounts to 0.9792. The high value reflects the fact that most data points are within the TN category (see Fig. 2 and Eq. 1). However, this value alone is not sufficient to assess the full performance.

Positive predictive value and false discovery rate

The positive predictive value (PPV) is 0.18. The PPV is defined as the fraction of the data correctly identified by the network as RFI compared to all data identified as RFI, including incorrectly identified non-RFI data (black and magenta data points in Fig. 2), see Eq. 3:

\[
\mathrm{PPV} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}} \qquad (3)
\]

The relatively low value of the PPV and the relatively high value of the false discovery rate, FDR (0.82), which is the rate at which non-RFI signals are classified as RFI, see Eq. 4,

\[
\mathrm{FDR} = \frac{\mathrm{FP}}{\mathrm{TP} + \mathrm{FP}} = 1 - \mathrm{PPV}, \qquad (4)
\]

are due to the fact that for each channel all baselines, time steps, and polarizations are considered: if any of those is classified as RFI polluted, the entire channel is flagged as such. The method can obviously be refined further to improve its overall efficiency by employing a less simplistic approach.
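This per-channel aggregation amounts to a single reduction over the non-channel axes; a minimal sketch, where the labels array and its contents are illustrative placeholders with the block shape of Table 1:

```python
import numpy as np

# Hypothetical per-sample classifications, shape (TS x POL, CHAN, BL).
labels = np.random.rand(3395, 256, 435) > 0.99  # placeholder data

# A channel is flagged as RFI as soon as ANY time step/polarization
# or baseline within it is classified as RFI polluted.
channel_flags = labels.any(axis=(0, 2))  # shape: (256,)
```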

Negative predictive value and false omission rate

The negative predictive value (NPV) is 0.9999. The NPV states which fraction of the data points is correctly classified as non-RFI compared to all data points identified as non-RFI (red and blue data points in Fig. 2), see Eq. 5:

\[
\mathrm{NPV} = \frac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FN}} \qquad (5)
\]

Because most data points are in the TN category, compared to only 128 in the FN category, it becomes clear that the NPV converges to one, while the false omission rate (FOR), Eq. 6, which is the rate at which RFI signals are not identified as such, goes to zero:

\[
\mathrm{FOR} = \frac{\mathrm{FN}}{\mathrm{TN} + \mathrm{FN}} = 1 - \mathrm{NPV} \qquad (6)
\]

True positive and false negative rate

The true positive rate (TPR), Eq. 7, is 0.9986, owing to the TP count being two orders of magnitude larger than the FN count. It describes the network’s ability to successfully predict RFI:

\[
\mathrm{TPR} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} \qquad (7)
\]

The false negative rate (FNR), see Eq. 8, on the other hand, is 0.0014:

\[
\mathrm{FNR} = \frac{\mathrm{FN}}{\mathrm{TP} + \mathrm{FN}} = 1 - \mathrm{TPR} \qquad (8)
\]

True negative rate and false positive rate

The true negative rate (TNR), Eq. 9, is 0.9791. As in the previous paragraph, the TN count is nearly two orders of magnitude larger than the FP count:

\[
\mathrm{TNR} = \frac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FP}} \qquad (9)
\]

The false positive rate (FPR), see Eq. 10, is 0.0209:

\[
\mathrm{FPR} = \frac{\mathrm{FP}}{\mathrm{TN} + \mathrm{FP}} = 1 - \mathrm{TNR} \qquad (10)
\]

Matthews correlation coefficient and F1 score

The Matthews correlation coefficient (MCC), see Eq. 11, evaluates the network’s performance when dealing with classes whose sample sizes differ widely, see (Matthews, \APACyear1975) and (Boughorbel \BOthers., \APACyear2017). Here, these samples are TP, FN, FP, and TN:

\[
\mathrm{MCC} = \frac{\mathrm{TP} \cdot \mathrm{TN} - \mathrm{FP} \cdot \mathrm{FN}}{\sqrt{(\mathrm{TP} + \mathrm{FP})(\mathrm{TP} + \mathrm{FN})(\mathrm{TN} + \mathrm{FP})(\mathrm{TN} + \mathrm{FN})}} \qquad (11)
\]

A value of $-1$ would indicate that classification and data are totally anti-correlated, a value of $0$ would correspond to a completely random classification with respect to the data, and a value of $+1$ would indicate total correlation between the classification and the data. The MCC of the RNN is 0.4195. Comparing this to the accuracy makes clear that the MCC is a more robust way to evaluate the RNN’s performance than the accuracy by itself. In this context, the F1 score, see Eq. 12, can also be used to evaluate the accuracy of the RNN, the model being binary, see (Blair, \APACyear\bibnodate) and (Powers, \APACyear2011). A value of 0 indicates worst precision, while a value of 1 indicates perfect precision. The F1 score of this RNN’s capability to distinguish between RFI and non-RFI is 0.3051:

\[
\mathrm{F1} = \frac{2\,\mathrm{TP}}{2\,\mathrm{TP} + \mathrm{FP} + \mathrm{FN}} \qquad (12)
\]
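For reference, all quantities of Eqs. 2-12 follow directly from the four confusion-matrix counts. The following is a minimal sketch; the function and the example counts are illustrative placeholders, not the values of Eq. 1:

```python
import math

def confusion_metrics(tp, fn, fp, tn):
    """Evaluate Eqs. 2-12 from the four confusion-matrix counts."""
    total = tp + fn + fp + tn
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,              # Eq. 2
        "PPV":      tp / (tp + fp),                 # Eq. 3
        "FDR":      fp / (tp + fp),                 # Eq. 4
        "NPV":      tn / (tn + fn),                 # Eq. 5
        "FOR":      fn / (tn + fn),                 # Eq. 6
        "TPR":      tp / (tp + fn),                 # Eq. 7
        "FNR":      fn / (tp + fn),                 # Eq. 8
        "TNR":      tn / (tn + fp),                 # Eq. 9
        "FPR":      fp / (tn + fp),                 # Eq. 10
        "MCC":      (tp * tn - fp * fn) / mcc_den,  # Eq. 11
        "F1":       2 * tp / (2 * tp + fp + fn),    # Eq. 12
    }

# Illustrative placeholder counts (not the values of Eq. 1):
print(confusion_metrics(tp=900, fn=100, fp=4100, tn=190000))
```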
Figure 2: The RNN classification, rescaled between zero and one, plotted against the rescaled amplitude. The colors indicate the different categories of detection: magenta shows the TP detections, red depicts the TN detections, and the FN and FP detections are shown in blue and black, respectively. Note: the data density is thinned out in the TN region (red) by a factor of 100 and in the FP region (black) by a factor of 10 to reduce the size of the file while preserving the information for visual inspection.
Parameter                               Value
accuracy a)                             0.9792
positive predictive value b)            0.1800
false discovery rate c)                 0.8200
negative predictive value d)            0.9999
false omission rate e)                  0.0001
true positive rate f)                   0.9986
false negative rate g)                  0.0014
true negative rate h)                   0.9791
false positive rate i)                  0.0209
Matthews correlation coefficient j)     0.4195
F1 score k)                             0.3051

a) Eq. 2, b) Eq. 3, c) Eq. 4, d) Eq. 5, e) Eq. 6, f) Eq. 7, g) Eq. 8, h) Eq. 9, i) Eq. 10, j) Eq. 11, k) Eq. 12

Table 2: Results of the evaluation of the confusion matrix. The parameters are calculated following (Matthews, \APACyear1975), (Blair, \APACyear\bibnodate), (Boughorbel \BOthers., \APACyear2017), (Fawcett, \APACyear2006), and (Powers, \APACyear2011).

5 Conclusion and Outlook

In Sec. 4, we show that the RNN reaches an accuracy of 0.9792 after sufficient training. However, this seemingly high accuracy is due to the large number of data points in the TN category, see Eq. 1. When studying the PPV and the FDR, a weakness of the chosen method becomes apparent that lowers its overall efficiency: a large amount of data in the FP category, which are actually non-RFI, are classified as RFI, resulting in a PPV of only 0.18. This also becomes evident when taking the MCC into account, which is 0.42, meaning the classification is not random with respect to the data, but the correlation is not strong either. The F1 score puts the overall precision of the network at 0.31. An improvement of the efficiency of the method can be expected from the following refinements:

  • Data usage: In this study, we used only the amplitude information in the data. However, the amplitude differences with respect to the channels, baselines, and time steps should also be used, adding four more axes to the training data cube. In addition, the phase (spatial) information in the data could be further utilized.

  • Model complexity: The discrete amplitude-based model to distinguish RFI and non-RFI may be adjusted to cope with more complex signal shapes and strength patterns.

  • Network architecture: The network training could be extended to consider the time-step, polarization, and baseline sequences instead of the channel sequence only; thus, the amount of data in the FP category would be reduced. By also adding information on the image level, it is possible to combine the RNN with the advantages of a CNN, which would provide information on prominent features in an image, yielding a hierarchy of dominant features such as RFI. This would turn the architecture into a recurrent convolutional neural network (RCNN).

The results of this study mark an encouraging milestone on the path towards a highly dynamical RFI filter meeting the challenges of future radio antenna arrays.

Acknowledgments

This research was supported by the Bayerisch-Tschechische Hochschulagentur (BTHA) under grant number BTHA-AP-2018-18.

Author contributions

All authors contributed to the scientific content, the writing and editing of the manuscript. The original analysis was done by PRB, TM, and JR.

Financial disclosure

None reported.

Conflict of interest

The authors declare no potential conflict of interests.

References

  • Abadi, M., Agarwal, A., Barham, P., et al. (2015). TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems.
  • Akeret, J., Chang, C., Lucchi, A., & Refregier, A. (2017). Astronomy and Computing, 18, 35-39.
  • Ananthakrishnan, S. (1995). Journal of Astrophysics and Astronomy Supplement, 16, 427.
  • Blair, D.C. (n.d.). Journal of the American Society for Information Science, 30(6), 374. doi:10.1002/asi.4630300621
  • Boughorbel, S., Jarray, F., & El-Anbari, M. (2017). PLOS ONE, 12(6), 1-17.
  • Czech, D., Mishra, A., & Inggs, M. (2018). arXiv:1803.02684.
  • Fawcett, T. (2006). Pattern Recognition Letters, 27(8), 861. (ROC Analysis in Pattern Recognition)
  • Fridman, P.A., & Baan, W.A. (2001). A&A, 378, 327.
  • Hochreiter, S., & Schmidhuber, J. (1997). Neural Computation, 9(8), 1735-1780.
  • Kingma, D.P., & Ba, J. (2014). ArXiv e-prints.
  • Matthews, B. (1975). Biochimica et Biophysica Acta (BBA) - Protein Structure, 405(2), 442.
  • McMullin, J.P., Waters, B., Schiebel, D., Young, W., & Golap, K. (2007). CASA Architecture and Applications. In R.A. Shaw, F. Hill, & D.J. Bell (Eds.), Astronomical Data Analysis Software and Systems XVI (Vol. 376, p. 127).
  • Offringa, A.R., de Bruyn, A.G., Biehl, M., Zaroubi, S., Bernardi, G., & Pandey, V.N. (2010). MNRAS, 405, 155.
  • Offringa, A.R., de Bruyn, A.G., Zaroubi, S., & Biehl, M. (2010). arXiv:1007.2089.
  • Peck, L.W., & Fenech, D.M. (2013). Astronomy and Computing, 2, 54-66. doi:10.1016/j.ascom.2013.09.001
  • Powers, D.M.W. (2011). Journal of Machine Learning Technologies, 2(1), 37.