Recurrent Neural Networks for anomaly detection in the Post-Mortem time series of LHC superconducting magnets
This paper presents a model based on the LSTM and GRU Deep Learning algorithms that facilitates anomaly detection in the superconducting magnets of the Large Hadron Collider. We used high-resolution data available in the Post Mortem database to train a set of models and chose the best set of their hyper-parameters. The Deep Learning approach made it possible to examine a vast body of data and to extract the fragments which are regarded as anomalies and require further examination by experts. The presented method does not require tedious manual threshold setting and operator attention at the stage of system setup. Instead, an automatic approach is proposed which, according to our experiments, achieves an accuracy of . This is reached for the largest dataset of 302 MB and the following network architecture: a single-layer LSTM, 128 cells, 20 epochs of training, look_back=16, look_ahead=128, grid=100 and the Adam optimizer. All the experiments were run on an Nvidia Tesla K80 GPU.
Keywords: LHC, Deep Learning, LSTM, GRU
The Large Hadron Collider (LHC), located at the European Organization for Nuclear Research (CERN) on the border between Switzerland and France, is the largest experimental instrument ever built LHC_Nature (). It generates a tremendous amount of data which is later used in the analysis and validation of physics models regarding the history of the universe and the nature of matter. Besides the data used in physics experiments, data is also gathered from the multitude of devices installed inside the LHC, such as those responsible for controlling the particle beam trajectory and stabilizing the LHC operating parameters. To work efficiently, those devices need to be constantly monitored and maintained, and their operating parameters analyzed. As a result, each of those devices can be considered a separate system, with its own sensors and elements responsible for work control. This architecture leads to a great number of data streams describing the condition of the various systems.
Some of the most vulnerable LHC components are the superconducting magnets. They are unique elements, designed and manufactured specially for CERN, which is why controlling their operating parameters and preventing malfunctions and failures is so important. In the history of CERN, incidents such as cern_crash () have occurred and resulted in damage to those valuable components. As a consequence, dedicated teams responsible for magnet maintenance and fault prevention were formed. Members of those teams are experts in the fields of superconducting materials, cryogenics and many others, and they have created models that allow them to control magnet operation. Those models were hand-crafted from scratch, and their development and adaptation is a time-consuming task that requires the involvement of many people.
In this paper we attempt to automate the task of determining the parameters of safe superconducting magnet operation, or at least to reduce the necessary involvement of experts. It should be noted that specialists cannot be removed from the process of model creation; however, their work can be made easier by automating the modeling itself. Consequently, we try to use Recurrent Neural Networks (RNNs), such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), to model the behavior of the electromagnets.
The rest of the paper is organized as follows. Sections 2 and 3 provide background on the LHC and RNNs, respectively. The operation layer of the system is presented in Section 4, and the proposed method is described in Section 5. Section 6 provides the results of the experiments. Finally, the conclusions of our research are presented in Section 7.
2 Large Hadron Collider
The main objective of the physics experiments carried out at the LHC is to confirm the theory known as the Standard Model (SM). Although the SM is the best available description of physical reality, it is not a Theory Of Everything (TOE). Therefore, the LHC experiments are expected to reach beyond the SM and find new physics. This would help to discriminate between many existing theories and to answer several tough questions about the Universe.
In 2000, the Large Electron-Positron Collider (LEP) was disassembled to make it possible to start the construction of the LHC, the largest accelerator at CERN LHCDesRep (). It accelerates two proton beams traveling in opposite directions. Therefore, the LHC may be considered to be actually two accelerators in one assembly. Before the beams are injected into the LHC, proton bunches are prepared by the Proton Synchrotron (PS) and Super Proton Synchrotron (SPS) accelerators, which were constructed and used at CERN SPSdesignReport () before the LHC was built. In each beam there is a very large number of particles, which increases the probability of observing interesting collisions. A single bunch nominally contains protons. The operation of gradual delivery of proton bunches to the LHC is denoted as “filling the machine”. It takes bunches altogether to fill up the LHC. The time between bunches is .
Upon completion of a full acceleration cycle, the velocity of the protons deviates from the speed of light by only about one millionth. Since the velocity is therefore an impractical quantity to work with, the kinetic properties of a single proton are described by its total energy, which reaches just before collision. The particles circle the long beam pipe times per second. Particle tracks are formed by superconducting magnets working at the temperature of superfluid helium, at about . Each of the eight sectors of the LHC comprises about magnets. The magnets produce a magnetic field strong enough to bend the proton trajectory when they conduct an electrical current at the level of . In order to reach and maintain such extreme working parameters, the machine employs a dedicated helium cryogenic installation.
The LHC has four interaction points where bunches collide. The bunches collide every , so there are million collisions every second. When two bunches collide, a number of individual proton-proton collisions occur. Typically, there are about events in one bunch-bunch collision. Therefore, the detectors are capable of capturing million events every second. Around the interaction points, huge detection systems were built in order to record a complete image of all events during each collision. These systems are called “detectors” or “experiments”. They consist of coaxial cylinders placed around the beam pipe. Each of the cylinders has three major layers serving different purposes. The innermost cylinder is a tracking system called the Inner Detector. Its purpose is to record the trajectories of the collision products. The middle layer, called the Calorimeter, measures the total energy of the products. The outermost layer is the Muon Spectrometer, which allows muons to be identified and their momenta measured. Each of those three detector subsystems consists of several layers built with different technologies. The whole system is immersed in a strong magnetic field parallel to the beam axis. The magnetic field bends the trajectories of electrically charged particles produced in a collision. This magnetic field is generated by a system of huge superconducting magnets installed between the layers of sensors.
The LHC has four huge detectors: two large and versatile ones, A Toroidal LHC ApparatuS (ATLAS) and the Compact Muon Solenoid (CMS), and two smaller, more specialized ones, A Large Ion Collider Experiment (ALICE) and the Large Hadron Collider beauty experiment (LHCb). In order to give an impression of what a particle physics experiment is, we focus on ATLAS, especially the ATLAS Inner Detector ATLAS2008 ().
The ATLAS detector is long along the beam pipe, and its radius perpendicular to the beam pipe is . The weight of ATLAS is about . The Inner Detector of ATLAS consists of three subsystems: the Pixel Detector, the SemiConductor Tracker (SCT), and the Transition Radiation Tracker (TRT). The SCT is a silicon microstrip tracker which comprises double-sided modules. Each module consists of particle sensors. The sensor is a silicon die with strips. One strip is a p-n diode with length. The pitch between strips is . The sensor die is bonded to a front-end electronic chip made as an ASIC. In total there are = million electronic channels. The modules are installed on cylindrical barrel layers and planar endcap discs. In total there are about of silicon surface.
Those devices utilize high-end electronic solutions to be able to acquire as much as possible of each event occurring within the accelerator. Most of the detector components were custom-designed to meet very rigorous requirements, such as very low acquisition latency and very high immunity to radiation damage. The components of the detectors which capture the signals and decide which data to pass on for analysis in the data center nieke2015analysis () need to be very fast and are implemented in Application-Specific Integrated Circuits (ASICs) and Field-Programmable Gate Arrays (FPGAs) angelucci2014FPGA (). Consequently, the design process is very tedious, costly and challenging. The components of the detectors which are responsible for selecting the right data are denoted as triggers, and they serve a special role in distinguishing valuable data. The trigger system of the ATLAS detector is organized in three levels of fast introductory analysis. The first trigger level selects about thousand bunch crossings out of million collisions every second. The decision is made within . Already at this level, a bunch crossing is divided into several Regions-of-Interest (RoI), i.e. the geographical coordinates of those regions within the detector where the selection process has identified interesting features. The second trigger level reduces the rate to approximately thousand events per second, with an event processing time of about . The third trigger level is an event filter, which reduces the event rate to roughly per second. This final stage is implemented using offline analysis with an event processing time of the order of . The output data is passed on to data storage. The recorded data is investigated around the world by means of the LHC Computing Grid (LCG).
3 Recurrent Neural Networks
Virtually all real-world phenomena may be characterized by their spatial and temporal components. The spatial ones exist in space and are assumed to be stationary, i.e. they do not develop in time, whereas the temporal ones unfold in time and have no spatial component. This is an idealization, since there are neither purely spatial nor purely temporal phenomena; most of them may be described as a mixture of those two components.
There is a well-established practice in Deep Learning applications to use Feed-forward Neural Networks (FNNs) and Convolutional Neural Networks (CNNs) to address tasks dominated by a spatial component krizhevsky (). In contrast, data which contain more temporally distributed information are usually processed by models built around RNNs. Of course, it is also possible to treat time series signals as a vector of spatial values and use an FNN or a CNN to classify them or to perform regression LeCun_deep_learning_2015 ().
The voltage and current time series, which are used to train the models described in this paper and to make predictions, unfold in time, and their temporal component is dominant. Therefore, we have decided to use RNNs and employ their most efficient architectures, namely LSTM and GRU graves2012supervised (); morton2016analysis (); pouladi2015recurrent (); chen2016efficient ().
The internal structure of the Long Short-Term Memory (LSTM) is based on a set of connected cells. The structure of a single cell is presented in Fig. 0(a). It contains a feedback connection storing the temporal state of the cell, three gates and two nodes which serve as an interface for information propagation within the network zachary2015critical ().
There are three different gates in each LSTM cell:
input gate, which controls the input activations into the memory element,
output gate, which controls the outflow of the cell's activations into the rest of the network,
forget gate, which scales the internal state of the cell before summing it with the input through the self-recurrent connection of the cell. This enables gradual forgetting in the cell memory.
In addition, the LSTM cell comprises an input node and an internal state node .
The output of a set of LSTM cells is calculated according to the following set of vector equations:
While examining (1) – (6), it may be noted that instances for the current and previous time steps are used in the calculation of the output vector of the hidden layer as well as of the internal state vector . Consequently, denotes the value of an output vector at the current time step, whereas refers to the previous step. It is also worth noting that the equations use vector notation, which means that they address the whole set of LSTM cells. In order to address a single cell, a subscript is used, as presented in Fig. 0(a), where, for instance, refers to the scalar value of the output of this particular cell.
The LSTM network learns when to let an activation into the internal states of its cells and when to let an activation out of the outputs. This is a gating mechanism, and all the gates are considered as separate components of the LSTM cell with their own learning capability. This means that the cells adapt during the training process as separate units to preserve a proper information flow throughout the network. When the gates are closed, the internal cell state is not affected. In order to make this possible, a hard sigmoid function was used, which can output exactly 0 and 1, as given by (7). This means that the gates can be fully opened or fully closed.
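To make the gating concrete, the following NumPy sketch computes one time step for a layer of LSTM cells. It assumes the common formulation with an input node g, input, forget and output gates i, f and o, an internal state s and an output h, in the spirit of zachary2015critical (), with a hard sigmoid on the gates. The parameter layout (`W`, `U`, `b` dictionaries) and the hard sigmoid constants (slope 0.2, offset 0.5, as in Keras 2.x) are assumptions of this sketch; the exact equations (1) – (7) used in the experiments may differ in detail.

```python
import numpy as np


def hard_sigmoid(x):
    """Piecewise-linear sigmoid: 0.2 * x + 0.5, clipped to [0, 1].

    Assumed slope and offset (the Keras 2.x definition); other libraries use
    different constants. It saturates at exactly 0 and 1, so a gate can be
    fully closed or fully open.
    """
    return np.clip(0.2 * np.asarray(x, dtype=float) + 0.5, 0.0, 1.0)


def lstm_step(x, h_prev, s_prev, W, U, b):
    """One time step for a layer of n LSTM cells (hypothetical parameter layout).

    x: input vector (d,); h_prev, s_prev: previous output and internal state (n,).
    W[k] is (n, d), U[k] is (n, n) and b[k] is (n,) for k in "g" (input node),
    "i" (input gate), "f" (forget gate) and "o" (output gate).
    """
    pre = {k: W[k] @ x + U[k] @ h_prev + b[k] for k in "gifo"}
    g = np.tanh(pre["g"])        # input node
    i = hard_sigmoid(pre["i"])   # input gate
    f = hard_sigmoid(pre["f"])   # forget gate
    o = hard_sigmoid(pre["o"])   # output gate
    s = g * i + s_prev * f       # internal state via the self-recurrent connection
    h = np.tanh(s) * o           # output vector of the layer
    return h, s


# Usage: run a layer of 4 cells over a short scalar sequence.
rng = np.random.default_rng(0)
d, n = 1, 4
W = {k: rng.normal(size=(n, d)) for k in "gifo"}
U = {k: rng.normal(size=(n, n)) for k in "gifo"}
b = {k: np.zeros(n) for k in "gifo"}
h, s = np.zeros(n), np.zeros(n)
for x_t in [0.1, 0.4, 0.3]:
    h, s = lstm_step(np.array([x_t]), h, s, W, U, b)
```

Because the hard sigmoid saturates at exactly 0 and 1, a closed input gate combined with an open forget gate leaves the internal state unchanged, which is the behavior described above.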
Since its invention in 1997, the LSTM has been updated and modified greff2015lstm () to improve its modeling properties and reduce the large computational demands of the algorithm. It is worth noting that the LSTM, as opposed to a vanilla RNN wielgosz2016usingLSTM (), is much more complex in terms of the internal components constituting its cell. This results in a long training time of the algorithm. Therefore, many experiments were conducted with simpler architectures which preserve the beneficial properties of the LSTM. One such algorithm is the Gated Recurrent Unit (GRU) chung2015gated (), which is widely used in Deep Learning as an alternative to the LSTM. According to recent research results, it even surpasses the LSTM in many applications chung2014empirical ().
The GRU has gating components which modulate the flow of information within the unit, as presented in Fig. 0(b).
The activation of the model at a given time is a linear interpolation between the activation from the previous time step and the candidate activation . The activation is strongly modulated by as given by (9) and (10).
The update gate is given by (11); it modulates the degree to which a GRU unit updates its activation. Unlike the LSTM, the GRU has no mechanism to control the extent to which its state is exposed; instead, it exposes the whole state each time.
The response of the reset gate is computed according to the same principle as that of the update gate: the previous state is multiplied by a coefficient matrix, and so is the input data. It is computed by (12).
The candidate activation is computed according to (13). When the reset gate is close to 0, meaning that the gate is almost off, the stored state is forgotten and the input data is read instead.
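Analogously, one GRU time step can be sketched in NumPy. This assumes the formulation of chung2014empirical (), in which the new activation interpolates between the previous activation and the candidate activation, weighted by the update gate z; some libraries swap the roles of z and 1 - z. The parameter names are hypothetical and chosen only for this illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_step(x, h_prev, W, U, b):
    """One GRU time step (hypothetical parameter layout).

    W[k] is (n, d), U[k] is (n, n) and b[k] is (n,) for k in "z" (update gate),
    "r" (reset gate) and "h" (candidate activation).
    """
    z = sigmoid(W["z"] @ x + U["z"] @ h_prev + b["z"])  # update gate
    r = sigmoid(W["r"] @ x + U["r"] @ h_prev + b["r"])  # reset gate
    # With r close to 0 the stored state is dropped and the candidate
    # activation is computed from the input data alone.
    h_cand = np.tanh(W["h"] @ x + U["h"] @ (r * h_prev) + b["h"])
    # Linear interpolation between the previous and the candidate activation;
    # there is no output gate, so the whole state is exposed.
    return (1.0 - z) * h_prev + z * h_cand


# Usage: a layer of 3 units over a short scalar sequence.
rng = np.random.default_rng(1)
d, n = 1, 3
W = {k: rng.normal(size=(n, d)) for k in "zrh"}
U = {k: rng.normal(size=(n, n)) for k in "zrh"}
b = {k: np.zeros(n) for k in "zrh"}
h = np.zeros(n)
for x_t in [0.2, 0.5, 0.1]:
    h = gru_step(np.array([x_t]), h, W, U, b)
```

Compared with the LSTM sketch, there is no separate internal state and one gate fewer, which is where the GRU's lower computational cost comes from.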
4 Operation layer
This section briefly discusses the architecture of the system protecting the LHC against equipment failures, with special emphasis on the software system dedicated to the collection and analysis of data recorded at the time of a failure. A set of data extracted from the data acquired within the LHC protection system was used as the learning dataset for the experiments described in Section 6.1.
4.1 The LHC Machine Protection System
The LHC is an experimental device composed of hundreds of modules which constitute a large system. The tunnel and the accelerator are just a tiny, though very critical, fraction of the LHC infrastructure. The energy stored in the superconducting circuit of the main magnets of each sector of the LHC at amounts to about , sufficient to heat up and melt of copper. At , each proton beam accumulates an energy of , equivalent to the energy needed to warm up and melt of copper. This is a hundred times more than previously achieved in any accelerator. Therefore, the machine must be protected against the consequences of a malfunction of almost any of its elements. An energy corresponding to a fraction of some of the beam energy can quench a dipole magnet operating at full current. Critical safety levels are therefore required to operate the LHC. The system dedicated to fulfilling this important role is known as the Machine Protection System (MPS) MPS_Wenninger (); interlocks (); MPS_Schmidt (). In general, it consists of two interlock systems: the Power Interlock System (PIS) and the Beam Interlock System (BIS). The BIS is a superordinate system which collects signals from many sources. There are currently inputs from client systems. We can distinguish several sources:
the Beam Loss Monitor (BLM);
the Beam Position Monitor (BPM);
the Warm magnets Interlock Controller (WIC);
the Fast Magnet Current change Monitor (FMCM);
the collimation system;
the personnel access system;
the operator inhibit buttons;
the vacuum valves;
the interlock signals from the experiments.
However, the most important and most complex protection subsystem is the PIS, which ensures communication between the systems involved in powering the LHC superconducting magnets. This includes the Power Converters (PC), the Quench Protection System (QPS), the Uninterruptible Power Supplies (UPS), the emergency stop of electrical supplies (AUG) and the cryogenic system. When a magnet quench is detected by the QPS, the power converter is turned off immediately. In total, there are of the order of thousands of interlock signals. The signals are distributed mainly by three different arrangements:
point-to-point connections with one source and one receiver;
field-bus, used to create a software-based link in less critical cases, in particular to give permission for powering;
current loops which are used to connect many sources to several receivers.
A current loop is a current source with a large compliance which forces a constant current through a line connecting reed relays or solid-state switches (opto-couplers) installed in each module along a whole LHC sector. A request to terminate the operation of the whole machine is triggered by opening any one switch in the line. The interruption of the current generates a trigger signal in the interlock controller.
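The logic of a current loop can be illustrated with a toy Python model: current flows only while every switch in the series line is closed, so opening any single switch interrupts it and raises the trigger. The class and method names below are hypothetical, and the model deliberately ignores all electrical details such as the source compliance.

```python
from dataclasses import dataclass, field


@dataclass
class CurrentLoop:
    """Toy model of an interlock current loop (hypothetical, logic only)."""
    # One entry per reed relay / opto-coupler along the sector; True = closed.
    switches: dict = field(default_factory=dict)

    def current_flows(self) -> bool:
        # The switches are in series: any open switch breaks the loop.
        return all(self.switches.values())

    def open_switch(self, module: str) -> bool:
        """Open the switch of one module; return True if a trigger is generated."""
        was_flowing = self.current_flows()
        self.switches[module] = False
        # A trigger is raised on the transition from "flowing" to "interrupted".
        return was_flowing and not self.current_flows()


loop = CurrentLoop({f"module_{k}": True for k in range(5)})
print(loop.open_switch("module_3"))  # True: the interruption triggers the controller
print(loop.open_switch("module_1"))  # False: the loop is already interrupted
```

A single Boolean condition over all modules is what lets many sources share one line to the receivers.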
When a failure is detected that risks stopping the powering of the magnets, a beam dump request is sent to the BIS. It generates three signals. The first is sent to the LHC Beam Dumping System (LBDS) to request the extraction of the beams. The second signal is sent to the injection system to block injection into the LHC as well as extraction of the beam from the SPS. The third signal is a trigger for the timing system, which sends out a request to many LHC systems to provide the data recorded before the beam dump, in order to understand the reasons for it. A device in these kinds of systems comprises a circular buffer which at any time contains current information about the protected component. In the particular case of a quench detector, the buffer contains a voltage time series acquired with high time resolution by an ADC connected to a superconducting coil. At the trigger time, half of the buffer space is already filled with samples acquired before the event (quench) time. After the event time, voltage samples are still recorded to fill the rest of the buffer space. Therefore, the buffer contains the time series on both sides of the trigger time. This kind of data is called “post-mortem” because it is recorded after the component ceased its regular activity due to a malfunction.
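The freeze-on-trigger behavior of such a buffer can be sketched as a ring buffer that keeps recording for half of its capacity after the trigger and then stops, so that the frozen contents are centered on the trigger time. The class below is a hypothetical illustration built on `collections.deque`, not the firmware of any CERN device.

```python
from collections import deque


class PostMortemBuffer:
    """Hypothetical circular buffer that freezes half a buffer after a trigger."""

    def __init__(self, size: int):
        if size < 2:
            raise ValueError("buffer needs room for pre- and post-trigger samples")
        self.samples = deque(maxlen=size)   # oldest samples are overwritten
        self.post_trigger_left = None       # None until a trigger arrives
        self.frozen = False

    def record(self, value: float) -> None:
        if self.frozen:
            return                          # contents are kept for read-out
        self.samples.append(value)
        if self.post_trigger_left is not None:
            self.post_trigger_left -= 1
            if self.post_trigger_left == 0:
                self.frozen = True

    def trigger(self) -> None:
        # After the trigger, record enough samples to fill the other half.
        if self.post_trigger_left is None:
            self.post_trigger_left = self.samples.maxlen // 2


buf = PostMortemBuffer(size=8)
for t in range(20):
    if t == 12:
        buf.trigger()          # event (e.g. a quench) detected at sample 12
    buf.record(float(t))
print(list(buf.samples))       # [8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
```

The frozen buffer holds four samples from before the event and four from the event onwards, i.e. the time series on both sides of the trigger.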
The contents of the buffer are sent out by the network controller of the device over the field-bus to a gateway. Next, the data is transferred to a database over an Ethernet network. The transmission path of the data is shown in Fig. 2. There are two arrival points for the data. Both are huge software systems to store and process data about any LHC module. The first system is used during failures and requested checks; it is described in Section 4.2. Fig. 2 also includes a second system for the permanent acquisition of equipment data. The CERN Accelerator Logging Service (CALS) is used to store and retrieve billions of data records per day from across the complete CERN accelerator complex, related subsystems, and experiments. It is not a subject of this description.
4.2 The LHC Post Mortem System
The Post Mortem System (PM System) is a diagnostics tool with the role of organizing the collection and analysis of transient data recorded during a time interval around a failure or a request sent by any device in the MPS Ciapala:691828 (). The main purpose is to provide a fast and reliable tool for the equipment experts and the operation crews to help them decide whether accelerator operation can continue safely or whether an intervention is required. The most important parameters from LHC systems are stored in circular buffers inside the individual devices. The aim is to process the contents of the buffers after an event, i.e. post mortem. When a failure (a beam loss or a magnet quench) happens, a trigger is generated by the BIS. The buffers are then frozen and transmitted to the PM System for further storage and analysis Ciapala:691828 (); Lauckner:567214 (); Borland:1998 (). The transmission is undertaken by the controllers of the equipment, which send the data upon the arrival of a trigger. The hardware path of the signals stored in the PM System is presented in Fig. 2.
When implementing such a system, a number of challenges arise. The devices are distributed over the entire ring, and therefore correct synchronization and precise time-stamping at the equipment level are necessary to reconstruct the development of an event. Parameters such as buffer depth and sampling rate must be considered separately for each kind of device. The PM System was modified and developed during hardware commissioning, the first experimental run and the First Long Shutdown (LS1). The current architecture is presented in Fig. 3. It provides scalability in both the vertical and the horizontal direction. Vertical scalability means that resources can be added to the nodes with minimum downtime and impact on service availability. Horizontal scalability is provided by three features. The first feature is dynamic load distribution during data collection. Any device can dump the Post Mortem (PM) data to any Data Collector transparently and without any additional configuration effort. This way, the load can be distributed among the Data Collectors. The second feature is data storage redundancy. The Data Collector that processes the dump writes the data to the distributed storage, and the data are automatically replicated by the storage infrastructure. The third feature is a data consistency check. The storage infrastructure also provides integrity verification as well as error detection and correction.
The serialization method for PM data has to ensure:
data splittability, because a user usually runs an analysis on only a part of a data dump,
data compression, because the signals often contain zeros and an optimization of the occupied storage space is desired.
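A minimal sketch of a layout with both properties stores each signal of a dump as a separately compressed chunk, preceded by an index of byte offsets, so that one signal can be decompressed without touching the others, while zero-filled signals compress to almost nothing. The format, the function names and the signal names are hypothetical and use only `json`, `zlib` and NumPy; they do not describe the actual PM serialization.

```python
import json
import zlib

import numpy as np


def serialize_dump(signals: dict) -> bytes:
    """Pack {name: 1-D float64 array} into a header plus independently compressed chunks."""
    index, chunks, offset = {}, [], 0
    for name, values in signals.items():
        blob = zlib.compress(np.asarray(values, dtype=np.float64).tobytes())
        index[name] = (offset, len(blob))
        chunks.append(blob)
        offset += len(blob)
    header = json.dumps(index).encode()
    # 8-byte header length, then the JSON index, then the concatenated chunks.
    return len(header).to_bytes(8, "little") + header + b"".join(chunks)


def read_signal(dump: bytes, name: str) -> np.ndarray:
    """Decompress only the requested signal (splittability)."""
    header_len = int.from_bytes(dump[:8], "little")
    index = json.loads(dump[8:8 + header_len])
    offset, length = index[name]
    start = 8 + header_len + offset
    return np.frombuffer(zlib.decompress(dump[start:start + length]), dtype=np.float64)


dump = serialize_dump({"voltage": np.zeros(10_000), "current": np.linspace(0.0, 1.0, 10_000)})
current = read_signal(dump, "current")   # the "voltage" chunk is never decompressed
```

Compressing per signal rather than per dump trades a little compression ratio for the ability to read a single signal directly.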
The data transfer from the devices relies entirely on the CERN Controls Middleware (CMW) Remote Device Access (RDA) protocol CMW (); CMWnew (). The main goal of CMW is to unify the middle layer used to build every control system for the operation of accelerators at CERN. Currently, data collection uses RDA2, based on CORBA (old), and RDA3, based on ZeroMQ (new).
Users can access the PM data by means of a specially designed Application Programming Interface (API). This API was designed using the software architecture style called Representational State Transfer (REST). The aim is to serve multiple language technologies according to user preferences: Python, MATLAB, LabVIEW, C++ and Java. A user does not depend on the data format or the file system. A single signal can be extracted directly from a big dataset without reading the entire set. The API can handle very complex queries.
4.3 The LHC Post Mortem Analysis Framework
Fig. 4 shows the building blocks of the PM System, surrounded by the sequencer and the databases, with its two main parts: the server and the client.
In the case of Hardware Commissioning, the sequencer application controls the power converters, which execute a current cycle or a ramp. The Post Mortem Request Handler combines the PM data with the test performed by the sequencer, and the Post Mortem Event Analyser collects all such events for presentation to the equipment experts and subsequent analysis. The Analyser allows the experts to execute different analysis programs and data viewers. With them, the experts can verify the success of the test and use an electronic signature to pass or fail it. The final result is sent to the sequencer for upload into the Magnet Test Folder (MTF) database. There, a decision is made either to accept the test, to repeat it or to open a procedure for non-conformity. Different analysis programs and data viewers (services) have been developed on the PM System.
The PM service has been providing data collection, storage and analysis of LHC event data since 2008. Around different client systems are today sending data to the PM servers; in the case of beam dumps, as many as individual files (containing up to of data) arrive within a period of less than a few seconds Andreassen:1235888 ().
These transient data require efficient and accurate analysis of the thousands of PM data buffers arriving at the system in the case of beam dumps. The LHC Post Mortem Framework orchestrates the analysis flow and provides all the necessary infrastructure to the analysis modules, such as read/write APIs for PM data, database/reference access and analysis configurations. Fig. 5 presents the main parts of the Post Mortem Framework (PM Framework).
The key component of the Post Mortem Analysis Framework (PMA Framework) is the Event Builder. This application detects interesting sets of PM data, which subsequently become the subject of a detailed analysis by different Analysis Modules. The modules are prepared taking into account domain knowledge related to a specific class of equipment.
4.4 Anomaly detection and Post Mortem
In our research, we used the PM JSON API written in Python to gather targeted data for anomaly detection/prediction. Customized preprocessors were developed to access the framework in order to generate the learning dataset. A Deep Learning model was built with the Keras/Theano libraries chollet2015 (), where we use an LSTM/GRU model as described in Section 6.1. Fig. 6 presents a schema of the experiment with its key components.
The results of the model are intended to be presented in a web application for quench detection. For this purpose, the ELectrical Quality Assurance (ELQA) framework, developed at TE-MPE-EE, will be used articleELQA (); mertikdhalerup (). ELQA is a framework for the development of interactive web applications for data analysis. It supports the efficient integration of various generated machine learning models with graphical user interfaces within a browser. It is developed in Python with open-source libraries such as Scikit-learn for machine learning and Bokeh, an interactive visualization library that targets modern web browsers, for presentation Bokeh2016 ().
5 Proposed method for anomaly detection
In wielgosz2016usingLSTM (), the experiments with the Timber database (Timber is the user interface to the LHC Logging System) were conducted using the setup presented in Fig. 6(a), which employed the RMSE measure for anomaly detection. A major challenge in this approach is the lack of a clear reference threshold for an anomaly. In order to determine the error level, a group of experts must be consulted, and it is not always easy to set one. This is due to the fact that RMSE does not always indicate anomalous behavior well enough to quantify it correctly strecht2015comperative ().
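For comparison, the earlier regression-based detection can be sketched as computing the RMSE between predicted and measured values over a sliding window and flagging every window above a threshold. The `threshold` argument is exactly the hand-set value that has to be agreed with experts; the function name and the windowing scheme are assumptions of this sketch.

```python
import numpy as np


def rmse_alarms(y_true, y_pred, window: int, threshold: float) -> np.ndarray:
    """Return start indices of windows whose RMSE exceeds a hand-set threshold."""
    err2 = (np.asarray(y_true, float) - np.asarray(y_pred, float)) ** 2
    # Mean squared error of every length-`window` slice via a cumulative sum.
    csum = np.concatenate(([0.0], np.cumsum(err2)))
    mse = np.maximum((csum[window:] - csum[:-window]) / window, 0.0)
    return np.flatnonzero(np.sqrt(mse) > threshold)


y = np.zeros(10)
p = np.zeros(10)
p[6] = 3.0                                       # a single large prediction error
alarms = rmse_alarms(y, p, window=3, threshold=1.0)
```

The result depends directly on both `window` and `threshold`, and nothing in the data itself says where the threshold should lie.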
We decided to take advantage of the experience from wielgosz2016usingLSTM () and introduced a new experimental setup, which is shown in Fig. 6(b). This new approach allowed us to convert the regression task into a classification one, which in turn enables better anomaly quantification.
The main difference between the previously used approach and the proposed one is the introduction of grid quantization and classification steps (see the marked boxes in Fig. 6(a) and 6(b)). Consequently, in the new approach the train and test data are mapped to several categories depending on the grid size. This transformation may be perceived as a specific kind of quantization, since the floating-point data are converted to a fixed-point representation, denoted as categories in this particular setup. It is worth noting that an increase in the grid size leads to an increase in resolution, which makes the task more challenging for the classifier. Potentially, a high-resolution setup will demand a larger model.
The introduction of grid quantization guarantees a maximum error within each category. For instance, if the grid size is , the guaranteed maximum error is according to the accuracy quality measure. Once the grid size is increased to , the guaranteed maximum error is . In order to determine whether an anomaly occurs, it is enough to observe the error level over several time steps. If over this time period the error exceeds for the grid size of , an anomaly has occurred (Fig. 8). The data expert has a much easier task in this case, because the only decisions required concern the grid size and the anomaly detection window, both of which are easily quantifiable parameters.
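The grid quantization step can be sketched as follows, assuming signals already normalized to [0, 1] and uniform bins. With `grid` categories every bin then has width 1/grid, so a correctly predicted category pins the value down to within one bin. The function names and the uniform binning are assumptions of this sketch rather than the exact procedure used in the experiments.

```python
import numpy as np


def quantize(values, grid: int) -> np.ndarray:
    """Map values in [0, 1] to integer categories 0 .. grid-1 (uniform bins)."""
    v = np.clip(np.asarray(values, dtype=float), 0.0, 1.0)
    # floor(v * grid) would put 1.0 into an extra bin, so clip it into the last one.
    return np.minimum((v * grid).astype(int), grid - 1)


def dequantize(categories, grid: int) -> np.ndarray:
    """Return the center of each category's bin."""
    return (np.asarray(categories) + 0.5) / grid


grid = 100
x = np.linspace(0.0, 1.0, 1001)
cats = quantize(x, grid)
# Reconstructing from the bin center is never off by more than half a bin.
max_err = np.max(np.abs(dequantize(cats, grid) - x))
```

Doubling `grid` halves the bin width, which illustrates the trade-off stated above: a finer resolution gives a tighter error bound but more classes for the classifier to separate.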
It is worth emphasizing that the proposed approach is based on the assumption that a very well trained model is used. Its performance should be in the range between and in terms of accuracy. This is the foundation for choosing a reliable anomaly detection window.
The anomaly detection window is a parameter that determines how many consecutive predicted values in the signal need to differ from the true ones in order to detect an anomaly. Each predicted value that matches the true one resets the difference counter. A small anomaly detection window allows for a faster reaction time, while a bigger one decreases the probability of a false positive.
The anomaly detection window size is related to the look_ahead parameter of the model (the number of time steps into the future that the model predicts), i.e. the look_ahead value must be larger than the window size. This condition is necessary in order to avoid the influence that a possible anomaly could have on the values predicted within the window.
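The counting rule described above can be written as a short function: it walks through the true and predicted categories, counts consecutive mismatches, resets the counter on every match, and reports an anomaly once the counter reaches the window size. The function name is hypothetical; the check that `look_ahead` exceeds the window mirrors the condition stated above.

```python
def detect_anomalies(true_cats, pred_cats, window: int, look_ahead: int) -> list:
    """Return indices at which `window` consecutive mismatches have accumulated."""
    if look_ahead <= window:
        raise ValueError("look_ahead must be larger than the anomaly detection window")
    alarms, mismatches = [], 0
    for t, (y, y_hat) in enumerate(zip(true_cats, pred_cats)):
        mismatches = 0 if y == y_hat else mismatches + 1  # a match resets the counter
        if mismatches == window:
            alarms.append(t)  # report once per run of mismatches
    return alarms


true = [3, 3, 4, 4, 5, 5, 5, 6, 6, 6]
pred = [3, 3, 4, 9, 5, 9, 9, 9, 6, 6]
print(detect_anomalies(true, pred, window=3, look_ahead=128))  # [7]
```

The isolated mismatch at index 3 is absorbed by the reset, while the run of three mismatches ending at index 7 is reported, which is how a larger window suppresses false positives.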
6 Experiments and the discussion
The main goal of the conducted experiments was to validate the feasibility of applying the proposed method to detecting anomalies in the PM time series of the LHC superconducting magnets. It is worth noting that this approach may also be adopted in other applications with a similar profile.
All the data used for the experiments were collected from the CERN PM database. The database contains various kinds of data acquired during both regular and special operating periods of the LHC. Whenever something extraordinary, like a quench, happens, the data is acquired and collected in the database. Additionally, data is acquired twice a day during the ramp-up and ramp-down phases. We have collected signals of the magnet current for different time series: , , and .
The procedure of data extraction from the PM database is composed of several steps, as presented in Fig. 9. A dedicated application and a set of parameters, such as the signal name and the number of time steps, were used. The PM database API does not allow a signal longer than one day to be acquired at once. Therefore, scripts were designed to concatenate several separate days into a single data stream used for the experiments.
In total, of data was collected from the database. Only a fraction of it contained information valuable for our experiments, so we wrote a script to extract this information and store it in separate files. We then divided these files into three datasets: small, medium and large (Tab. 1). This division allowed us to tune the hyper-parameters of the model on the small dataset before using the two remaining ones. As final steps, the data in each dataset were normalized to the range and split into training and test sets.
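A minimal sketch of the final preprocessing steps is given below. Because the target normalization range and the train/test ratio are not specified in this section, the defaults `lo=0.0`, `hi=1.0` and `train_fraction=0.7` are assumptions; the split is chronological (no shuffling), which we assume is appropriate for time series.

```python
import numpy as np


def normalize(signal, lo=0.0, hi=1.0):
    """Min-max scale a 1-D signal into [lo, hi] (target range is an assumption)."""
    x = np.asarray(signal, dtype=float)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        # A constant signal carries no scale information; map it to `lo`.
        return np.full_like(x, lo)
    return lo + (x - x_min) * (hi - lo) / (x_max - x_min)


def train_test_split_series(x, train_fraction=0.7):
    """Chronological split: the first part trains, the rest tests (no shuffling)."""
    n_train = int(len(x) * train_fraction)
    return x[:n_train], x[n_train:]
```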
It is worth noting that most of the experiments presented in this section were done with the small dataset, because its computation time was more manageable. A few experiments were conducted with the large dataset to examine how much the model performance improves when more data are used.
6.2 Quality assessment measure
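Accuracy is used to evaluate the quality of the experimental results presented in this paper. It is calculated as follows:

$$\mathrm{ACC} = \frac{1}{N}\sum_{i=1}^{N} \mathbb{1}\left(\hat{y}_i = y_i\right)$$

where $y_i$ and $\hat{y}_i$ are the true category and the one predicted by the model for the $i$-th prediction, respectively, and $\mathbb{1}(\cdot)$ equals 1 when its argument holds and 0 otherwise. The mean accuracy is thus calculated across all the predictions within a dataset of cardinality $N$.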
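The accuracy measure described above reduces to a mean over element-wise comparisons. The sketch below assumes that categories are integer labels; if each prediction covers several look_ahead steps, every step is counted as a separate prediction, which is our assumption rather than a detail stated in the paper.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted category equals the true one."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    # Boolean array of matches; its mean is the share of correct predictions.
    return float(np.mean(y_true == y_pred))
```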
This section contains the results of the experiments conducted to validate the feasibility of the presented method. The learning process of the model consists of a series of steps, during which the parameters yielding the highest accuracy are selected. Figs. 9(a)–9(c) present three examples of the results for different values of two hyper-parameters: the number of cells and the number of epochs.
Fig. 9(a) shows the output of a virtually untrained LSTM neural network. The blue section highlights the performance of the network on the training set, red denotes the prediction results for the training set, and purple the prediction results for the test set. The RMSE is almost , which indicates a very large prediction error.
Fig. 9(b) presents the results of a network with cells. Increasing the number of cells yielded a much better RMSE (). It should be noted, however, that with only two training epochs the model did not reach its best performance and was not fully trained.
Increasing the number of epochs from two to six significantly improved the results of the model, as reflected in Fig. 9(c). The RMSE dropped to , which indicates much better model performance.
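The RMSE used in the discussion of Fig. 9 follows the standard definition, shown here as a small sketch; we assume it is computed between the true and predicted signals on the same (normalized) scale.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root-mean-square error between two equally shaped arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Square the residuals, average them, then take the square root.
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```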
The next series of experiments was conducted for different values of the grid size (g), look_ahead steps (la), look_back steps (lb) and the number of cells (c) in the LSTM model. The batch size was fixed at , and the number of epochs was set to . The results are presented in Tab. 2 and Figs. 11–13.
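To make the roles of look_back and look_ahead concrete, the sketch below builds supervised training pairs from a single series: each input holds `look_back` consecutive samples and each target holds the following `look_ahead` samples. The function name `make_windows` and the stride of one sample between consecutive windows are assumptions for illustration.

```python
import numpy as np


def make_windows(series, look_back, look_ahead):
    """Pair each run of `look_back` past samples with the next `look_ahead` samples.

    Returns X with shape (n, look_back) and Y with shape (n, look_ahead).
    """
    s = np.asarray(series)
    n = len(s) - look_back - look_ahead + 1
    if n <= 0:
        raise ValueError("series too short for the requested look_back/look_ahead")
    X = np.stack([s[i:i + look_back] for i in range(n)])
    Y = np.stack([s[i + look_back:i + look_back + look_ahead] for i in range(n)])
    return X, Y
```

With `look_back=16` and `look_ahead=128`, each training example would thus see 16 past samples and be asked to predict the next 128.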
Fig. 11 shows the accuracy for different grid sizes and various combinations of the other parameters. Increasing the grid size (i.e., reducing the size of a single quantum) degrades the model performance for the same parameters and the same dataset. This effect is expected: it results from the larger number of categories that the classifier has to distinguish while the network resources remain unchanged.
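The grid can be thought of as a uniform quantizer of the normalized signal: with `grid` categories a single quantum has width `1/grid`, so a larger grid gives finer quanta but more classes to predict. The sketch below assumes values normalized to [0, 1] and equal-width bins; the exact binning scheme used in the experiments is not given in this section.

```python
import numpy as np


def quantize(x, grid):
    """Map values in [0, 1] to integer categories 0..grid-1 using equal-width bins."""
    x = np.clip(np.asarray(x, dtype=float), 0.0, 1.0)
    # x == 1.0 would fall into bin `grid`, so fold it into the last category.
    return np.minimum((x * grid).astype(int), grid - 1)
```

For example, with `grid=100` the values 0.0, 0.004, 0.5 and 1.0 map to categories 0, 0, 50 and 99.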
Fig. 12 presents the results of the model depending on the value of the look_ahead parameter. As expected, the more steps ahead are predicted, the lower the accuracy, because predicting the correct categories becomes more challenging for the model. The effect becomes stronger as the grid resolution increases and the network size decreases: a smaller network does not have enough resources to classify correctly. Since the look_ahead parameter limits the anomaly detection window size, its value should be chosen carefully to achieve the best possible model accuracy while still permitting a sufficiently large window.
Fig. 13 presents the performance of the LSTM model for different numbers of cells. Without enough cells, and in particular with only one cell, the model is not able to capture all the dependencies in the training data that are needed for correct classification. On the other hand, it is not necessary to use many more cells: in this case, using more than nine cells yields only a marginal improvement in model performance. This observation leads to the conclusion that nine cells appear sufficient for this classification task.
It should be emphasized that the proposed method provides a clear way to determine whether a given set of model hyper-parameters is adequate for the task (i.e., achieves the required accuracy for predetermined grid and window sizes), while making it possible to simplify the architecture as much as possible. This is critical because the size of the network significantly affects the computational complexity of training and prediction. It is also of great importance for hardware implementations of LSTM and GRU networks, because the network size directly determines the amount of hardware resources required.
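The dependence of network size on the number of cells can be quantified by counting trainable weights. The sketch below uses the textbook LSTM and GRU formulations (four and three gated blocks, respectively, each with input weights, recurrent weights and one bias vector); `input_dim=1` assumes a single scalar input per time step, which may differ from the encoding used in the experiments.

```python
def lstm_params(input_dim, units):
    """Trainable weights of one LSTM layer (textbook formulation).

    Four blocks (input, forget, output gates and the cell candidate), each
    with an input matrix (units x input_dim), a recurrent matrix
    (units x units) and a bias vector (units).
    """
    return 4 * (units * (input_dim + units) + units)


def gru_params(input_dim, units):
    """Trainable weights of one GRU layer with a single bias per block.

    Three blocks: update gate, reset gate and the candidate state.
    """
    return 3 * (units * (input_dim + units) + units)
```

Under these assumptions, 128 LSTM cells need 66,560 parameters versus 396 for nine cells, and a GRU of the same width needs 49,920. Some frameworks report slightly different GRU counts; Keras's `GRU` with `reset_after=True`, for instance, keeps a second bias vector per block.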
We also conducted an experiment with the largest dataset of 302 MB and the following network architecture: a single-layer LSTM with 128 cells, 20 training epochs, look_back=16, look_ahead=128, grid=100 and the Adam optimizer. This resulted in a large performance leap compared to the results presented in Tab. 2, with the accuracy reaching almost .
| Dataset (see Tab. 1) | small | small |
We conducted most of our experiments with the LSTM algorithm for the sake of consistency with wielgosz2016usingLSTM (), of which this paper is in many respects a continuation. Nevertheless, we decided to show a comparison of GRU and LSTM performance on a sample dataset in Tab. 3.
7 Conclusions and future work
This work extends the experiments of wielgosz2016usingLSTM () by using higher-resolution data and more diverse models. As the LHC experiments enter the High Luminosity phase, collision energies will be higher and more data will be collected, which raises new challenges in equipment maintenance.
In the experiments presented in this paper, a single signal was used. In future experiments, we plan to use several signals at the same time and compare the performance with that achieved in this paper. Nevertheless, a very promising accuracy of was achieved for the largest dataset of 302 MB with the following network architecture: a single-layer LSTM with 128 cells, 20 training epochs, look_back=16, look_ahead=128 and grid=100.
Another aspect worth investigating is the feasibility of implementing the predictive model on FPGAs. Performing computations on a PC works well for validating the idea, but control systems such as QPS have hard real-time requirements that PC systems cannot meet.
- (1) O. Brüning, P. Collier, Building a behemoth, Nature 448 (2007) 285–289. doi:10.1038/nature06077.
- (2) Incident at the LHC [online] (2008) [cited 16-01-2017].
- (3) L. Evans, P. Bryant, LHC Machine, Journal of Instrumentation 3 (08) (2008) S08001. doi:10.1088/1748-0221/3/08/S08001.
- (4) Report on the design study of a 300 GeV proton synchrotron, Tech. rep., CERN (1964).
- (5) ATLAS collaboration, The ATLAS Experiment at the CERN Large Hadron Collider, JINST 3.
- (6) C. Nieke, M. Lassnig, L. Menichetti, E. Motesnitsalis, D. Duellmann, Analysis of CERN computing infrastructure and monitoring data, Journal of Physics: Conference Series 664 (5) (2015) 052029. doi:10.1088/1742-6596/664/5/052029.
- (7) B. Angelucci, R. Fantechi, G. Lamanna, E. Pedreschi, R. Piandani, J. Pinzino, M. Sozzi, F. Spinella, S. Venditti, The FPGA based Trigger and Data Acquisition system for the CERN NA62 experiment, Journal of Instrumentation 9 (01) (2014) C01055. doi:10.1088/1748-0221/9/01/C01055.
- (8) A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, in: F. Pereira, C. J. C. Burges, L. Bottou, K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 25, Curran Associates, Inc., 2012, pp. 1097–1105.
- (9) Y. LeCun, Deep Learning of Convolutional Networks, in: 2015 IEEE Hot Chips 27 Symposium (HCS), 2015, pp. 1–95. doi:10.1109/HOTCHIPS.2015.7477328.
- (10) A. Graves, Neural Networks, Springer Berlin Heidelberg, 2012. doi:10.1007/978-3-642-24797-2.
- (11) J. Morton, T. A. Wheeler, M. J. Kochenderfer, Analysis of Recurrent Neural Networks for Probabilistic Modelling of Driver Behaviour, IEEE Transactions on Intelligent Transportation Systems PP (99) (2016) 1–10. doi:10.1109/TITS.2016.2603007.
- (12) F. Pouladi, H. Salehinejad, A. M. Gilani, Recurrent Neural Networks for Sequential Phenotype Prediction in Genomics, in: 2015 International Conference on Developments of E-Systems Engineering (DeSE), 2015, pp. 225–230. doi:10.1109/DeSE.2015.52.
- (13) X. Chen, X. Liu, Y. Wang, M. J. F. Gales, P. C. Woodland, Efficient Training and Evaluation of Recurrent Neural Network Language Models for Automatic Speech Recognition, IEEE/ACM Transactions on Audio, Speech, and Language Processing 24 (11) (2016) 2146–2157. doi:10.1109/TASLP.2016.2598304.
- (14) Z. C. Lipton, J. Berkowitz, C. Elkan, A Critical Review of Recurrent Neural Networks for Sequence Learning (2015). arXiv:1506.00019.
- (15) K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, J. Schmidhuber, LSTM: A Search Space Odyssey (2015). arXiv:1503.04069.
- (16) S. Hochreiter, J. Schmidhuber, Long Short-Term Memory, Neural Comput. 9 (8) (1997) 1735–1780. doi:10.1162/neco.1997.9.8.1735.
- (17) M. Wielgosz, A. Skoczeń, M. Mertik, Using LSTM recurrent neural networks for detecting anomalous behavior of LHC superconducting magnets (2016). arXiv:1611.06241.
- (18) J. Chung, C. Gulcehre, K. Cho, Y. Bengio, Gated Feedback Recurrent Neural Networks (2015). arXiv:1502.02367.
- (19) J. Chung, C. Gulcehre, K. Cho, Y. Bengio, Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling (2014). arXiv:1412.3555.
- (20) J. Wenninger, Machine Protection and Operation for LHC, CERN Yellow Report CERN-2016-002. arXiv:1608.03113.
- (21) F. Bordry, R. Denz, K.-H. Mess, B. Puccio, F. Rodriguez-Mateos, R. Schmidt, Machine Protection for the LHC: Architecture of the Beam and Powering Interlock System (LHC Project Report 521).
- (22) R. Schmidt, Machine Protection and Interlock Systems for Circular Machines – Example for LHC, CERN Yellow Report.
- (23) E. Ciapala, F. Rodríguez-Mateos, R. Schmidt, J. Wenninger, The LHC Post-mortem System, Tech. Rep. LHC-PROJECT-NOTE-303, CERN, Geneva (Oct 2002).
- (24) R. J. Lauckner, What data is needed to understand failures during LHC operation.
- (25) M. Borland, A Brief Introduction to the SDDS Toolkit, Tech. rep., Argonne National Laboratory, USA (1998).
- (26) K. Kostro, J. Andersson, F. Di Maio, S. Jensen, N. Trofimov, The Controls Middleware (CMW) at CERN Status and Usage, Proceedings of ICALEPCS, Gyeongju, Korea.
- (27) A. Dworak, F. Ehm, P. Charrue, W. Sliwinski, The new CERN Controls Middleware, Journal of Physics Conference Series 396.
- (28) C. Aguilera-Padilla, S. Boychenko, M. Dragu, M. A. Galilee, J. C. Garnier, M. Koza, K. Krol, R. Orlandi, M. C. Poeschl, T. M. Ribeiro, M. Zerlauth, Smooth Migration of CERN POST MORTEM Service to a Horizontally Scalable Service, Proceedings of ICALEPCS2015, Melbourne, Australia.
- (29) O. O. Andreassen, V. Baggiolini, A. Castaneda, R. Gorbonosov, D. Khasbulatov, H. Reymond, A. Rijllart, I. Romera Ramirez, N. Trofimov, M. Zerlauth, The LHC Post Mortem Analysis Framework, Tech. Rep. CERN-ATS-2010-009, CERN, Geneva (Jan 2010).
- (30) F. Chollet, Keras [online] (2015).
- (31) L. Barnard, M. Mertik, Usability of visualization libraries for web browsers for use in scientific analysis, International Journal of Computer Applications 121 (1) (2015) 1–5. doi:10.5120/21501-4225.
- (32) M. Mertik, K. Dahlerup-Petersen, Data engineering for the electrical quality assurance of the LHC - a preliminary study, International Journal of Data Mining, Modelling and Management (in press).
- (33) Bokeh Development Team, Bokeh: Python library for interactive visualization [online] (2014) [cited 10.12.2016].
- (34) P. Strecht, L. Cruz, C. Soares, J. Mendes-Moreira, R. Abreu, A comparative study of regression and classification algorithms for modelling students’ academic performance, in: Proceedings of the 8th International Conference on Educational Data Mining, EDM 2015, Madrid, Spain, June 26-29, 2015, pp. 392–395.