Programa de Doctorado en Tecnologías de la Información y las Comunicaciones
NEW HYBRID NEUROEVOLUTIONARY ALGORITHMS FOR RENEWABLE ENERGY AND FACILITIES MANAGEMENT PROBLEMS
Tesis Doctoral presentada por
LAURA Mª CORNEJO BUENO
Director:
DR. SANCHO SALCEDO SANZ
Alcalá de Henares, 2018
No te rindas, por favor no cedas, aunque el frío queme, aunque el miedo muerda, aunque el sol se esconda y se calle el viento, aún hay fuego en tu alma, aún hay vida en tus sueños.
Mario Benedetti
Abstract
This Ph.D. thesis deals with the optimization of the development of several renewable energy resources, as well as the improvement of facilities management in oceanic engineering and airports, using hybrid computational methods belonging to Artificial Intelligence (AI) to this end. These problems are summarized hereafter, with the technical solutions proposed at the end of the section.
Energy is essential to our society in order to ensure a good quality of life. Nowadays, fossil fuels are the most important energy source on the planet. However, they contribute greatly to climate change, affecting ecosystems severely. For this reason, there is a trend to gradually adopt renewable resources which guarantee a sustainable development. In fact, a penetration of these resources above 50% is expected in fifty years. Obviously, this process will not be the same in all countries, because renewable energy resources are not uniformly distributed throughout the world. It is important to note that different regions rely on different renewable technologies, so they can contribute to regional development to a greater or lesser degree. The main drawback of renewable energies is the natural variability inherent to the resource. This means that predictions of the parameters on which renewable energies depend are necessary in order to know the amount of energy that will be obtained at any time.
The second topic tackled in this thesis is related to the basic parameters that influence different marine activities and airport operations, knowledge of which is necessary to develop proper facilities management in these environments. For instance, the Significant Wave Height (Hs) is a basic parameter in wave characterization, important for different problems in marine activities such as the design and management of vessels, marine structures, Wave Energy Converters (WECs), etc. On the other hand, low-visibility events at airports, normally caused by fog, are another fundamental issue in airport activities, which can cause flight delays, diversions and cancellations or, in the worst cases, accidents.
Within this work, a study of state-of-the-art Machine Learning (ML) techniques has been performed to solve the problems associated with the topics mentioned above, and several contributions have been proposed:

One of the pillars of this work is focused on the estimation of the most important parameters in the exploitation of renewable resources. For this purpose, Support Vector Regression (SVR), Neural Network (NN) (Multi-Layer Perceptron (MLP) and Extreme Learning Machine (ELM)) and Gaussian Process (GP) algorithms are applied to several practical problems. The performance of these algorithms is discussed in every experiment carried out, together with the specific settings of the algorithms and the internal characteristics of the models.

The second contribution of this thesis is related to feature selection problems, more specifically, the use of EAs such as the Grouping Genetic Algorithm (GGA) or the Coral Reef Optimization (CRO) algorithm, hybridized with other ML approaches acting as classifiers and regressors. In this scheme, the GGA or CRO searches for subsets of features relevant to the problem, and the regressor employed provides the prediction in terms of the features selected by the Genetic Algorithm (GA), reducing the computational cost while maintaining good accuracy.
The proposed methodologies are applied to multiple problems: the prediction of Hs, relevant for marine energy applications and marine activities; the estimation of Wind Power Ramp Events (WPREs), undesirable variations in the electric power produced by a wind farm; the prediction of global solar radiation in areas of Spain and Australia, really important in terms of solar energy; and the prediction of low-visibility events at airports. All of these practical cases are accompanied by a prior analysis of the corresponding data, normally in terms of meteorological variables.
Resumen en Castellano
Esta tesis tiene como objetivo la optimización de la explotación de recursos energéticos renovables, así como la mejora en la gestión de instalaciones en ingeniería oceánica y aeropuertos, usando métodos computacionales híbridos pertenecientes a una rama de la Inteligencia Artificial (IA), denominada aprendizaje máquina, para este fin. Estos problemas serán resumidos a continuación con las soluciones técnicas propuestas al final de la sección.
La energía es esencial en nuestra sociedad para asegurar una buena calidad de vida. Hoy en día, los combustibles fósiles constituyen la fuente energética más importante del planeta; sin embargo, estas formas de energía contribuyen en gran medida al Cambio Climático, afectando severamente a los ecosistemas. Por esta razón, se tiende gradualmente al uso de fuentes de energía renovables que garanticen un desarrollo sostenible. De hecho, se prevé, en 50 años, una penetración de estos recursos por encima del 50%. Obviamente, este proceso no será igual en todos los países, debido a que las fuentes de energía renovables no están uniformemente distribuidas a lo largo del mundo. El hecho más importante es que cada área cuenta con alguna de ellas, y pueden contribuir al desarrollo regional en mayor o menor medida, gracias a lo cual las fuentes de energía convencionales serán sustituidas progresivamente. Sin embargo, se observa un lento desarrollo en este sentido, y la cuestión que cabe preguntarse es cuándo las energías renovables tendrán mayor penetración en el sistema que los actuales combustibles fósiles. Para responder a esta pregunta, una buena manera es centrarse en el principal inconveniente de este tipo de energías: la variabilidad natural inherente al recurso. Esto significa que las predicciones sobre los parámetros más importantes de los que dependen las energías renovables son necesarias para conocer la cantidad de energía que será obtenida en un momento dado.
El otro tema abordado en esta tesis está relacionado con los parámetros que influyen en diferentes actividades marinas y aeroportuarias, cuyo comportamiento es necesario conocer para desarrollar una correcta gestión de las instalaciones en estos entornos. Por ejemplo, la altura significativa de las olas (Hs) es un parámetro básico en la caracterización del oleaje, muy importante para el desarrollo de actividades marinas como el diseño y mantenimiento de barcos, estructuras marinas, convertidores energéticos de ola, etc. Por otro lado, la escasa visibilidad en los aeropuertos, normalmente causada por la niebla, es otro aspecto fundamental para el correcto desarrollo de actividades aeroportuarias, que puede causar retrasos en los vuelos, desvíos y cancelaciones o, en el peor de los casos, accidentes.
En este trabajo se ha realizado un análisis del estado del arte de los modelos de aprendizaje máquina que se utilizan actualmente, con el objetivo de resolver los problemas asociados a los temas tratados con anterioridad. Diferentes contribuciones han sido propuestas:

Uno de los pilares esenciales de este trabajo está centrado en la estimación de los parámetros más importantes en la explotación de energías renovables. Con este propósito, los algoritmos de Vectores Soporte para Regresión (VSR), Redes Neuronales (RN) (Perceptrones Multicapa (MLP) y Máquinas de Aprendizaje Extremo (MAE)) y Procesos Gaussianos son utilizados en diferentes problemas prácticos. El rendimiento de estos algoritmos es analizado en cada uno de los experimentos realizados, tanto en lo referente a su precisión como a la configuración de sus características internas.

Otro de los aspectos tratados está relacionado con problemas de selección de características; concretamente, con el uso de algoritmos evolutivos como los Algoritmos de Agrupación Genética (AAG) o los algoritmos de Optimización de Arrecife de Coral (OAC), hibridados con otros métodos de aprendizaje máquina como clasificadores y regresores. En este sentido, el AAG o el OAC analizan diferentes conjuntos de características para obtener aquel que resuelva el problema con la mayor precisión, y el regresor empleado proporciona la predicción en función de las características obtenidas por el Algoritmo Genético (AG), reduciendo el coste computacional con gran fiabilidad en los resultados.
La metodología mencionada es aplicada a múltiples problemas: predicción de Hs, relevante en aplicaciones energéticas y actividades marinas; estimación de eventos puntuales como son las rampas de viento (ERV), variaciones indeseables en la potencia eléctrica producida por un parque eólico; predicción de la radiación solar global en áreas de España y Australia, realmente importante en términos de energía solar; y estimación de eventos de baja visibilidad en aeropuertos. Los casos prácticos citados se desarrollan con el consiguiente análisis previo de la base de datos empleada, normalmente en términos de variables meteorológicas.
Agradecimientos
Seguramente necesitaría otro libro para expresar el enorme agradecimiento que siento hacia todos vosotros. El esfuerzo y espíritu de sacrificio siempre dan su fruto, pero una cosa es segura y es la importancia de poder contar con el apoyo incondicional de las personas que tienes alrededor. Porque de una forma u otra todos aportáis vuestro granito de arena y contribuís a que hoy pueda seguir creciendo como estudiante y como persona.
En primer lugar quiero agradecer a mi Director de Tesis, Sancho Salcedo la confianza depositada en mí. Por enseñarme a ganar seguridad en uno mismo desde la mejor de las humildades, y por supuesto por enseñarme tanto y tan constante, porque sin él está claro que todo este trabajo no habría visto la luz. Gracias por todo el apoyo y por hacer de esta etapa una de las mejores vividas hasta el momento.
También quiero agradecer a Enrique Alexandre, Silvia Jiménez, José Antonio Portilla, Lucas Cuadra, José Carlos Nieto y Raquel Criado el haberme acogido como una más, y permitirme aprender tanto de ellos. Porque además de poder trabajar en lo que te entusiasma, es un gusto poder hacerlo en un ambiente tan agradable como el que conseguís crear en el laboratorio.
Y a Carlos Casanova, con el que he tenido el placer de poder trabajar codo con codo en uno de los artículos de esta tesis, y cuya colaboración ha sido crucial para su publicación. Aprovecho también para agradecer a todas las personas con las que me he cruzado en estos años y de las que he podido tanto aprender como colaborar en numerosos trabajos.
Por supuesto a mis “mindundis” Carlos Camacho, Freddy Pinto y Adrián Aybar, mis compis de fatiga. Gracias por las innumerables comidas, cafés, charlas y quedadas; por ese intercambio de conocimiento y sobre todo por las risas difíciles de olvidar. ¡Chicos, ya se va viendo la luz al final del túnel! Y a mi compañera de la planta de arriba, Inma Mohino, cuya sonrisa te alegra el día.
El doctorado además me ha permitido vivir una de las mejores experiencias de mi vida: mi estancia de 3 meses en Australia. Allí conté con el apoyo del Profesor Ravinesh C. Deo, quien me recibió con los brazos abiertos y contribuyó a mi formación. Además aprendí que se puede conocer el verdadero significado de la amistad aunque 2 personas estén separadas por más de 17.000 km. Kavina Dayal, gracias por convertir esta estancia en algo inolvidable; nos vemos en alguna parte del mundo.
Quería agradecer también a una persona muy especial, a un amigo que me conoce desde mucho antes de estar aquí y que me ha apoyado tanto desde dentro. Enrique García, Kike, gracias por tus visitas, por la alegría que consigues despertarme aún en los momentos que parecían no tenerla. Ha sido muy importante poder contar tan de cerca con alguien de mi familia, alguien como tú.
Y ahora sí, las personas que me han visto crecer, y que tanto han creído en mí, incluso cuando ni yo misma creía.
Quiero empezar por la persona que me lo ha dado todo, mi madre, Carmen Bueno. No se puede explicar con palabras todo lo que te debo. Mil gracias por estar ahí al pie del cañón y sacar fuerzas de donde no las hay para mostrar siempre una sonrisa. En especial quiero destacar tu enorme valor y la fuerza que has demostrado siempre, sobre todo frente a la adversidad de este último año. Eres toda una inspiración y verte me hace sentir que puedo ser capaz de cualquier cosa. Eres mi luz.
A mi padre, Juan Carlos Cornejo, que no ha dejado de trabajar ni un solo día para que hoy haya podido llegar hasta aquí. Gracias por todo tu esfuerzo y voluntad, y por formar parte de lo que somos mi hermana y yo.
Mi pequeña hermanita, Sara Cornejo, que es muy grande. No creo que haya alguien que pueda conocerme mejor. Siempre estás pendiente de lo que necesito en cada momento, cuando la hermana mayor soy yo. Siempre sabes qué decir, y tus consejos nunca pueden ser más acertados. No podría imaginarme una vida sin ti, porque no habría una sin una de las partes. Gracias por tu condición humana y por hacer que no me sienta sola por muy lejos que estemos la una de la otra. Tu fuerza también hace que hoy pueda decir: ¡he llegado!, ¡estoy aquí!
A David Doñoro, mi pilar, mi compañero de viaje en esta aventura. Son 8 años los que llevo a tu lado y consigues hacerme sentir como si estuviéramos empezando cada día. Es reconfortante poder llegar a “mi sitio” y sentirme en casa. Gracias por creer en mí, por ser fuerte cuando lo necesito, y por no dejarme caer. Juntos podemos con lo que nos echen.
Por supuesto a mi yayi, Teresa Montes, una luchadora innata, un ejemplo de vida. Una persona que es capaz de transmitir AMOR en el más profundo sentido de la palabra. Te debemos todo, y no creo que podamos estar más orgullosos de tener una madre, esposa, abuela y bisabuela como TÚ. Gracias por no decaer y seguir a nuestro lado con tanto tesón.
Y a mi otra abuela, Magdalena Macías, cuya pasión por los estudios nos animó siempre a seguir luchando por nuestro futuro. Gracias por ser igualmente una luchadora de esta vida, y mantenerte entera pese a todo lo vivido. Eres un ejemplo de constancia.
A mis tíos Francisco José López, Teresa Bueno e Isabel Bueno, gracias por hacer que pueda contar con vosotros y estar a mi lado en este camino. Destacar las comiditas de la tita Beli, que tanto ayudan cuando no hay tiempo ni de cocinar, los consejos de la tita Mari, y las provechosas conversaciones del tito Francisco.
No puedo olvidarme de mis primos, José Gabriel del Prado, Eduardo González, Israel González y Jesús del Prado, que no son primos sino hermanos. Gracias por toda esta vida de cariño y diversión, sois parte esencial de este camino. Y como no Mariví, Ana y Andrea que se han convertido en las mejores primas inesperadas que se puede tener.
Y siguiendo la línea sucesoria es turno de mis sobrinitos. Isabel, Fátima, Gabriel y Alejandro, las personitas más pequeñas y que más pueden llenar de luz un día gris.
Como hay una que sí sabe leer, quería dedicarle unas palabras, pues creo que no puede imaginarse cómo me cambió la vida. Isabel, eres mi motor, el empuje mañanero que me anima cada día. Ni loca me perdería esas conversaciones de niña de 7 años a adulto en las que a veces dudo de quién es el adulto. Porque aunque digas “eres la mejor tita del mundo”, y reconozco que me derrito cada vez que te escucho, eres tú la única capaz de hacer olvidar todo lo malo de alrededor, y encima lo haces sin darte cuenta. No he podido tener más suerte contigo, y solo quisiera poder transmitirte la mitad de lo que tú me das. No te rindas nunca y lucha, lucha porque yo siempre estaré a tu lado apoyándote como tú (siendo tan pequeña) has hecho conmigo. Recuerda, nada es imposible. Te quiero.
Y no puedo terminar esta parte sin agradecer enormemente a los que por desgracia no han podido verme acabar. Mi abuelo, José Luis Cayuela, y Manuel García. Me quedo con todo lo que me habéis enseñado, que es mucho, echándoos de menos cada día, pero agradeciendo enormemente el haber coincidido en esta vida, y que hayáis formado parte de mi familia. Padri, nadie podrá llamarme “chata” de la forma en que tú lo hacías, y no es comparable a nada la forma burlona de llamar a “la Lauri” que tú, Manuel, tenías. Gracias por vuestro AMOR. Os quiero.
A mi otra familia, Cati, Pablo, Estefanía, Julián, Toñi, Jesús Ángel, Juan Carlos, Abuelos, Manolo y desde el más profundo cariño Grego. Por hacerme sentir parte de vuestras vidas y contribuir con vuestro cariño y valores desde que comencé esta etapa tan importante. Gracias por estar a mi lado y acogerme como lo hicisteis.
Y no podía olvidarme de vosotros, mis amigos y compañeros desde que empezamos la carrera. Casi 10 años ya y tan unidos como al principio. Casillas, Pascu, Dan, Jenny, Gallo, Guille, Sara, Samu, Paloma, Jesús, Manu, Mar, Pastor, Alvarito, Víctor, Jesica y Susana. ¡Gracias! Porque sabéis lo importante que sois, y habéis demostrado estar en todo momento. ¡Qué aburridos habrían sido los días sin vosotros! Espero que mantengamos esta bonita amistad por muchos años más.
Por último quiero dar las gracias a “las niñas del cole” Marta, Rosana y Lidia, con las que he compartido mi niñez y adolescencia y con las que sigo creciendo y afrontando etapas. Después de casi 20 años es increíble poder contar con amigas como vosotras. Y a Rocío, siempre la vecinita. Por todas las tardes de estudio que nos ha amenizado con su alegría, y por ser todo un apoyo por muchos días que pasen sin que nos veamos.
Siento si me dejo a alguien, pero esto es gracias a TODOS, a los que aparecéis y a los que no he puesto. Porque a lo largo de los años se conocen muchas personas que dejan huella y forman parte de lo que ahora somos. Quién sabe cuándo volveré a escribir un libro, al menos en éste puedo reflejar el esfuerzo de muchos años y el fruto obtenido, que también es vuestro.
A todos, OS QUIERO.
Contents
 I Motivation and state-of-the-art
 II Proposed contributions with numerical results in renewable energy problems
 III Proposed contributions with numerical results in facilities management
 IV Final remarks and future research activities
 V Appendix
 VI Bibliography
List of Figures
 LIST OF ACRONYMS
 1.1 Structure of SC, including NC, EC and FC.
 1.2 Artificial neural network.
 1.3 Illustration of the SVR model. Samples in the original input space are first mapped to a Reproducing Kernel Hilbert Space, where a linear regression is performed. All samples outside a fixed tube of size ε are penalized, and become support vectors (double-circled symbols).
 1.4 Outline of the grouping crossover implemented in the proposed example of GGA.
 2.1 An example of BO on a toy 1D noiseless problem.
 2.2 Western USA buoys considered in this study. In red, the buoy where the prediction is carried out from data at the blue ones.
 2.3 Average results obtained for the optimization after evaluating the performance of 50 different parameters for the BO technique and a random exploration of the parameter space. The performance of a configuration specified by a human expert is also shown for comparison.
 2.4 Wave Height optimization average results of the performance of the 50 different parameter values selected by the BO technique and a random exploration of the parameter space. The plot also shows the performance of the parameter values selected by a human expert.
 2.5 prediction after the FS process with the GGA-ELM approach; (a) ELM; (b) SVR; (c) ELM with BO; (d) SVR with BO.
 2.6 Scatter plots in the problem of prediction tackled by the ELM and SVR with FS by the GGA-ELM; (a) ELM; (b) SVR; (c) ELM with BO; (d) SVR with BO.
 2.7 prediction after the FS process with the GGA-ELM approach; (a) ELM; (b) SVR; (c) ELM with BO; (d) SVR with BO.
 2.8 Scatter plots in the problem of prediction tackled by the ELM and SVR with FS by the GGA-ELM; (a) ELM; (b) SVR; (c) ELM with BO; (d) SVR with BO.
 3.1 Representation of the geographical location of the wind farms (labeled “A”, “B” and “C”) considered in the experimental work carried out in this thesis. The four closest nodes from the ERA-Interim reanalysis (predictive variables) have also been represented for illustrative purposes. These wind farms have been selected because they cover different parts of Spain (north, center and south), characterized by different wind regimes.
 3.2 (a) Estimation of the ramp function (Equation (3.1)) obtained by using the proposed approach in the particular case in which the ML is an ELM regressor. This figure corresponds to Wind Farm A, whose location has been represented in Figure 3.1. (b,c) represent two shorter excerpts in which the predicted WPREs that exceed the thresholds ( or ) are shown to be correctly detected. A WPRE is detected if . The predicted series exhibits RMSE MW, MAE MW, (+ramp) , (ramp) and (no ramp) .
 3.3 Estimation of the ramp function (Equation (3.1)) obtained by this proposed hybrid approach when using the GP as the ML regressor in Wind Farm B. The predicted series exhibits RMSE MW, MAE MW, (+ramp) , (ramp) and (no ramp) (see Table 3.3).
 3.4 Prediction of the ramp function (Equation (3.1)) when using the GP in Wind Farm C. The predicted ramps series exhibits RMSE MW, MAE MW, (+ramp) , (ramp) , and (no ramp) (see Table 3.3).
 3.5 Estimation of the ramp function (Equation (3.2)) obtained by the proposed approach using the GP regressor, in Wind Farm A. The ramp predicted values resemble the ramp measured ones with RMSE MW and MAE MW, (ramp) and (no ramp) (see Table 3.4).
 3.6 Estimation of the ramp function (Expression (3.2)) obtained by the proposed method when using the ELM regressor, in Wind Farm B. The predicted series follows the measured series with RMSE MW and MAE MW, (ramp) and (no ramp) (see Table 3.4).
 3.7 Estimation of the ramp function (Expression (3.2)) obtained by the proposed method when using the ELM regressor, in Wind Farm C. The predicted series follows the measured series with RMSE MW and MAE MW, (ramp) and (no ramp) (see Table 3.4).
 4.1 Estimation of the image spectrum of a radar image time series. The plot corresponds to a transect in the spectral domain along the peak wave direction, , where denotes the peak wave number vector.
 4.2 Illustration of the SVR training and testing process.
 4.3 Scatter plots colored by of the measured by the buoy and estimated by the SM and the SVR approach for the test data set at FINO 1; (a) SM; (b) SVR approach. The solid line indicates the best fit.
 4.4 Scatter plots colored by of the measured by the buoy and estimated by the SM and the SVR approach for the test data set at Ekofisk; (a) SM; (b) SVR approach. The solid line indicates the best fit.
 4.5 Scatter plots colored by of the measured by the buoy and estimated by the SM and the SVR approach for the test data set at Glas Dowr; (a) SM; (b) SVR approach. The solid line indicates the best fit.
 4.6 Temporal evolution of the estimation obtained with the SVR in the different platforms considered; (a) Fino 1; (b) Ekofisk; (c) Glas Dowr.
 4.7 Difference between measured and predicted with the SVR and SM in the locations considered; (a) Fino 1; (b) Ekofisk; (c) Glas Dowr.
 4.8 Bivariate histograms of significant wave steepness vs. relative error in the estimation for each measuring station: Fino 1 (top), Ekofisk (middle), and Glas Dowr (bottom). The results derived from the SM are plotted on the left, and the corresponding results from the SVR (predicted) on the right part of the image. The color bar indicates the percentage of total data within the histogram for each case.
 5.1 Location of Valladolid airport (Villanubla), Spain, where the experiments to validate the proposed methodology for the prediction of low-visibility events have been carried out.
 5.2 Example of the proposed prediction system with wavelet preprocessing for the test dataset and training step; (a) example of the prediction-system structure; (b) training phase and evaluation of the regressors.
 5.3 Decomposition of a signal of interest into approximation and detail parts using a wavelet transform. In this case, the signals of interest are the predictive variables for the low-visibility prediction.
 5.4 Skill score of the different regressors considered at different time horizons for the prediction of low visibility; (a) with the CIBA tower; (b) without the CIBA tower.
 5.5 Prediction of low-visibility events at Valladolid airport by the MLP approach with wavelet preprocessing for the test dataset; (a) direct prediction of the runway visual range (temporal); (b) normalized scatter plot.
 5.6 Prediction of low-visibility events at Valladolid airport by the GP approach with wavelet preprocessing for the test dataset; (a) direct prediction of the runway visual range (temporal); (b) normalized scatter plot.
List of Tables
 2.1 Geographic coordinates and description of the buoys.
 2.2 Predictive variables used in the experiments.
 2.3 Comparative results of the estimation by the ELM and SVR approaches after the FS by the GGA-ELM in 2010.
 2.4 Comparative results of the estimation by the ELM and SVR approaches after the FS by the GGA-ELM in 2010.
 3.1 Predictive variables considered at each node from the ERA-Interim reanalysis.
 3.2 Configuration and design parameters of the regression ML models explored in the proposed approach for all the wind farms considered.
 3.3 Results (in terms of RMSE, MAE and sensitivity) corresponding to the estimation of the ramp function (Equation (3.1)) obtained when using the proposed approach, as a function of the ML regressors explored (SVR, ELM, GP and MLP), in the three study cases: the wind farms “A”, “B” and “C”, whose locations have been represented in Figure 3.1.
 3.4 Results (in terms of RMSE, MAE and sensitivity) corresponding to the estimation of the ramp function (Equation (3.2)) obtained by the proposed approach as a function of the ML regressors explored (SVR, ELM, GP, and MLP), for Wind Farms “A”, “B” and “C”, respectively.
 4.1 Division of the databases into different sets for the experiments.
 4.2 Comparative results of the estimation by the SM and the SVR approaches.
 4.3 SVR performance with different train/test partitions at Fino 1 measuring station.
 4.4 Averaged significant wave steepness derived from the buoy data at the dates when the measurements were obtained in the different platforms considered.
 5.1 Data used in the study.
 5.2 Comparison of the best results (10 runs of the algorithms in different sets) for the estimation of low-visibility events (in terms of the runway visual range at the airport) by the ELM, SVR, MLP and GP, with and without wavelet preprocessing. The variance is given in parentheses.
 5.3 Comparison of the best results for the estimation of low-visibility events (in terms of the runway visual range at the airport) by the ELM, SVR, MLP and GP, for the wavelet preprocessing case, and without CIBA features.
 5.4 Comparison of the best results for the estimation of low-visibility events (in terms of the runway visual range at the airport) by the ELM, SVR, MLP and GP, for the wavelet preprocessing case, with nighttime and daytime samples.
 5.5 Confusion matrix of the MLP and GP models with the wavelet method for 1000 m, 550 m and 300 m classification thresholds (Th).
List of Acronyms
 AI
 Artificial Intelligence
 ARMA
 Autoregressive Moving Average
 ANN
 Artificial Neural Network
 BO
 Bayesian Optimization
 CI
 Computational Intelligence
 CNN
 Convolutional Neural Network
 CRO
 Coral Reef Optimization
 DFT
 Discrete Fourier Transform
 EA
 Evolutionary Algorithm
 EC
 Evolutionary Computation
 ELM
 Extreme Learning Machine
 EI
 Expected Improvement
 EV
 Electric Vehicles
 FC
 Fuzzy Computation
 FFT
 Fast Fourier Transform
 FL
 Fuzzy Logic
 FS
 Feature Selection
 GA
 Genetic Algorithm
 GGA
 Grouping Genetic Algorithm
 GP
 Gaussian Process
 GS
 Grid Search
 MAE
 Mean Absolute Error
 ML
 Machine Learning
 MLP
 Multi-Layer Perceptron
 MSE
 Mean Squared Error
 NC
 Neural Computation
 NN
 Neural Network
 RMSE
 Root Mean Squared Error
 SAR
 Synthetic Aperture Radar
 SC
 Soft Computing
 SM
 Standard Method
 SVM
 Support Vector Machine
 SVR
 Support Vector Regression
 Hs
 Significant Wave Height
 V2G
 Vehicle-to-Grid
 Wave Energy Flux
 WECs
 Wave Energy Converters
 WPF
 Wind Power Forecasting
 WPREs
 Wind Power Ramp Events
Part I Motivation and state-of-the-art
Chapter 1 Introduction
1.1 Motivation
The challenges of renewable energies in the near future, as well as those of the associated facilities, will require new computational tools for the optimization and exploitation of the available resources. In a world where climate change is a recognized fact, a re-evaluation of energy use is necessary. For this reason, this work is focused on renewable energy sources, with almost zero emissions of both air pollutants and greenhouse gases. Currently, renewable energy sources supply 19% of the total world energy demand [REN212017], but this share is expected to increase in the near future. Consequently, a sustainable development must be guaranteed, defined by the World Commission on Environment and Development as “development that meets the needs of the present without compromising the ability of future generations to meet their own needs”. The main goal, then, is to reconcile energy production that satisfies social welfare with environmental protection, while achieving economic growth.
Technology is the best way to meet the proposed objectives. There are many renewable energy technologies, but most of them are still at an early stage of development and not technically mature. The aim of this work is to contribute to progress in this field by means of AI.
AI is a term that denotes, in its broadest sense, the ability of a machine to emulate human learning. Specifically, it is the part of computer science concerned with the design of intelligent computer systems. This kind of intelligence is associated with human capabilities such as understanding, language and learning [Kalogirou2006], and its techniques can be applied in diverse applications in forecasting, pattern recognition, optimization and many more. This is possible because AI covers different areas such as Neural Computation (NC), Evolutionary Computation (EC) and Fuzzy Computation (FC), among others, which can be used individually or hybridized to solve several problems in our society.
Some of these algorithms are used in this work to estimate key parameters in the renewable energy area, taking into account not only the production of energy but also how these parameters affect specific facilities-management tasks. In this regard, facilities management is related to the necessary infrastructures in renewable energy environments, as well as to other fields similarly affected by meteorological variables, such as the study developed at airports to estimate runway visibility, which will be explained in depth in Part III.
Two fields compose the core of this Ph.D. Thesis: ML regression algorithms and evolutionary optimization. Pattern recognition is a branch of AI focused on systems that are able to associate multidimensional data with labels. Using these methods, it is possible to develop systems based on the available data to obtain predictions and classifications in many fields of Engineering, Science, Economics, etc. The second pillar of this thesis is the use of EC approaches to solve feature selection problems and thus improve the accuracy of the subsequent regressions.
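The wrapper idea behind this second pillar can be sketched in a few lines. The following is an illustrative toy example, not the algorithms used in this thesis (which employ the GGA and CRO): a plain genetic algorithm evolves binary feature masks, and a least-squares regressor scores each mask. The synthetic data, population size, feature penalty and mutation rate are all assumptions made for the sketch.

```python
# Toy evolutionary feature selection: a GA evolves binary masks, and a
# least-squares regressor evaluates each mask (wrapper scheme).
import numpy as np

rng = np.random.default_rng(1)
n, d = 300, 10
X = rng.normal(size=(n, d))
# Synthetic target: only features 0 and 3 are informative.
y = 2 * X[:, 0] - 3 * X[:, 3] + 0.1 * rng.normal(size=n)

def fitness(mask):
    # Fitness = negative training MSE of a linear fit on the selected
    # features, minus a small penalty per selected feature.
    if not mask.any():
        return -np.inf
    Xs = X[:, mask]
    coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    mse = ((Xs @ coef - y) ** 2).mean()
    return -mse - 0.01 * mask.sum()

pop = rng.random((20, d)) < 0.5          # population of binary feature masks
for gen in range(40):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[::-1][:10]]   # truncation selection
    children = []
    for _ in range(10):
        # One-point crossover between two random parents
        a, b = parents[rng.integers(10)], parents[rng.integers(10)]
        cut = rng.integers(1, d)
        child = np.concatenate([a[:cut], b[cut:]])
        # Bit-flip mutation
        child = child ^ (rng.random(d) < 0.05)
        children.append(child)
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(m) for m in pop])]
print("selected features:", np.flatnonzero(best))
```

With this setup the GA typically recovers the informative features (0 and 3) and discards the rest, which is the behaviour exploited by the GGA/CRO wrappers in the thesis, where the regressor is an ELM or SVR instead of a linear fit.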
The next sections provide a more detailed description of the ML techniques applied in this work, as well as of the Evolutionary Algorithm (EA) approaches used, together with a review of the state-of-the-art in Soft Computing (SC) techniques.
1.2 State of the art
This section presents a description of the state of the art in the technological fields addressed in this thesis. Figure 1.1 shows a scheme of the different areas of SC. SC is an essential part of AI, and many of its methods can be classified in the field of knowledge called “Natural Computing”. The algorithms in this category are inspired by the way Nature solves complex problems: EC is inspired by the theory of evolution, while ANNs mimic the behaviour of the human brain. Because of the variety of techniques used, the structure of this section has been chosen to properly cover the areas of AI that come in handy in this work.
1.2.1 Neural Computationbased Approaches
NC is the part of SC that includes algorithms inspired by how the human brain learns. They have mainly been used in classification and regression problems. In the next points, four of the most used NC approaches will be described: feed-forward NNs (MLPs and ELMs), GPs for regression and SVR algorithms.
Multilayer perceptrons
An MLP is a particular kind of Artificial Neural Network (ANN) that is massively parallel. It is considered a distributed information-processing system, which has been successfully applied in modelling a large variety of non-linear problems [Haykin1998, Bishop1995]. The MLP consists of an input layer, a number of hidden layers, and an output layer, all of which are basically composed of a number of special processing units called neurons, as Figure 1.2 shows. Just as important as the processing units themselves is their connectivity, whereby the neurons within a given layer are connected to those of other layers by means of weighted links, whose values are related to the ability of the MLP to learn and generalize from a sufficiently large number of examples. Such a learning process demands a proper database containing a variety of input examples or patterns with the corresponding known outputs. The adequate values of the weights minimize the error between the output generated by the MLP (when fed with input patterns in the database) and the corresponding expected output in the database. The number of neurons in the hidden layer is a parameter to be optimized when using this type of neural network [Haykin1998, Bishop1995].
The input data for the MLP consist of a number of samples arranged as input vectors, \(\mathbf{x} = (x_1, x_2, \ldots, x_n)\). Once an MLP has been properly trained, validated and tested, it is able to generate a proper output \(y\) when fed with an input vector different from those contained in the database. The relationship between the output and the input signals of a neuron is
(1.1) \( y = f\left( \sum_{i=1}^{n} w_i x_i - \theta \right) \)
where \(y\) is the output signal, \(x_i\) for \(i = 1, \ldots, n\) are the input signals, \(w_i\) is the weight associated with the \(i\)-th input, and \(\theta\) is a threshold [Haykin1998, Bishop1995]. The transfer function \(f(\cdot)\) is usually taken to be the logistic function,
(1.2) \( f(x) = \frac{1}{1 + e^{-x}} \)
Obtaining an accurate output is related to the training procedure, as mentioned before. During training, the error between the estimated output and its corresponding real value in the database determines to what degree the weights in the network should be adjusted; since all neurons in the network are interconnected (feed-forward NN), the MLP propagates these corrections easily. Hence, the objective of training is to find the combination of weights which results in the smallest error. There are many algorithms that can be used to train an MLP. One possible technique is the back-propagation training algorithm, which uses gradient descent to try to locate the global minimum of the error [Gardner1998]. Another approach is the well-known Levenberg-Marquardt algorithm, which is often applied to train the MLP [Hagan1994].
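As an illustration of this training procedure, the sketch below fits a single-hidden-layer MLP with a logistic activation on synthetic data using scikit-learn; the dataset, the hidden-layer size and the L-BFGS solver are illustrative assumptions, not the configuration used in this thesis.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 3))          # input patterns
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2       # known outputs

# hidden_layer_sizes is the free parameter discussed in the text; 'lbfgs'
# is a quasi-Newton alternative to plain gradient-descent back-propagation.
mlp = MLPRegressor(hidden_layer_sizes=(10,), activation="logistic",
                   solver="lbfgs", max_iter=2000, random_state=0)
mlp.fit(X, y)
r2 = mlp.score(X, y)                           # coefficient of determination
```

On this smooth toy target, a small hidden layer already fits the training data closely; in practice the hidden-layer size must be validated, as noted above.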
Extreme Learning Machine
An ELM [Huang2015, Huang2006] is a novel and fast learning method based on the structure of MLPs, similar to the one shown in Figure 1.2. In addition, the ELM approach presents a novel way of training feed-forward NNs. The most significant characteristic of the ELM training is that it is carried out just by randomly setting the network weights, and then obtaining a pseudo-inverse of the hidden-layer output matrix. The advantages of this technique are its simplicity, which makes the training algorithm extremely fast, and its outstanding performance when compared with state-of-the-art learning methods, often better than other established approaches such as classical MLPs or SVRs.
Moreover, the universal approximation capability of the ELM network, as well as its classification capability, have been already proven [Huang2012].
The ELM algorithm can be summarized as follows: given a training set \(\{(\mathbf{x}_k, t_k) \mid \mathbf{x}_k \in \mathbb{R}^n,\ t_k \in \mathbb{R},\ k = 1, \ldots, N\}\), an activation function \(g(x)\) (a sigmoidal function is usually used) and a number of hidden nodes \(\tilde{N}\):

1. Randomly assign input weights \(\mathbf{w}_i\) and biases \(b_i\), \(i = 1, \ldots, \tilde{N}\).

2. Calculate the hidden-layer output matrix \(\mathbf{H}\), defined as

(1.3) \( \mathbf{H} = \begin{bmatrix} g(\mathbf{w}_1 \cdot \mathbf{x}_1 + b_1) & \cdots & g(\mathbf{w}_{\tilde{N}} \cdot \mathbf{x}_1 + b_{\tilde{N}}) \\ \vdots & \ddots & \vdots \\ g(\mathbf{w}_1 \cdot \mathbf{x}_N + b_1) & \cdots & g(\mathbf{w}_{\tilde{N}} \cdot \mathbf{x}_N + b_{\tilde{N}}) \end{bmatrix}_{N \times \tilde{N}} \)

3. Calculate the output weight vector \(\boldsymbol{\beta}\) as

(1.4) \( \boldsymbol{\beta} = \mathbf{H}^{\dagger} \mathbf{T} \)

where \(\mathbf{H}^{\dagger}\) stands for the Moore-Penrose pseudo-inverse of matrix \(\mathbf{H}\) [Huang2006], and \(\mathbf{T} = [t_1, \ldots, t_N]^T\) is the training output vector.

Note that the number of hidden nodes \(\tilde{N}\) is a free parameter of the ELM training, and must be estimated for obtaining good results. Usually, scanning a range of values is the best solution.
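The steps above can be sketched in a few lines of NumPy; the synthetic data and the choice of 50 hidden nodes are illustrative assumptions.

```python
import numpy as np

def elm_train(X, t, n_hidden, rng):
    """Steps 1-3 above: random hidden layer, then a pseudo-inverse solve."""
    W = rng.normal(size=(X.shape[1], n_hidden))    # random input weights w_i
    b = rng.normal(size=n_hidden)                  # random biases b_i
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))         # hidden-layer matrix H
    beta = np.linalg.pinv(H) @ t                   # beta = H^dagger T
    return W, b, beta

def elm_predict(X, W, b, beta):
    return (1.0 / (1.0 + np.exp(-(X @ W + b)))) @ beta

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(300, 2))
t = X[:, 0] ** 2 - X[:, 1]                         # toy regression target
W, b, beta = elm_train(X, t, n_hidden=50, rng=rng)
rmse = float(np.sqrt(np.mean((elm_predict(X, W, b, beta) - t) ** 2)))
```

Note that the only training cost is the pseudo-inverse, which is why ELM training is so fast compared with iterative back-propagation.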
Gaussian Processes for Regression
GP regression is a generic supervised-learning method primarily designed for solving regression problems, whose advantages include the predictive interpolation of the observations. Here, the prediction is probabilistic (Gaussian), so that one can compute empirical confidence intervals and exceedance probabilities, to be used in refitting the prediction in some region of interest. Moreover, different linear-regression and correlation models may be specified. A short description of the most important characteristics of the GP approach is given here; the interested reader is referred to the more exhaustive reviews of [Lázaro2012] and [Rasmussen2006] for further information.
Given a set of \(n\) \(D\)-dimensional inputs \(\mathbf{x}_j \in \mathbb{R}^D\) and their corresponding scalar outputs \(y_j \in \mathbb{R}\), forming the dataset \(\mathcal{D} = \{\mathbf{x}_j, y_j\}_{j=1}^{n}\), the regression task is to obtain the predictive distribution for the corresponding observation \(y_*\) based on \(\mathcal{D}\), given a new input \(\mathbf{x}_*\).
The GP model assumes that the observations can be modelled as some noiseless latent function of the inputs plus an independent noise term, \(y = f(\mathbf{x}) + \varepsilon\), and then specifies a zero-mean GP prior for both the latent function, \(f(\mathbf{x}) \sim \mathcal{GP}(0, k(\mathbf{x}, \mathbf{x}'))\), and the noise, \(\varepsilon \sim \mathcal{N}(0, \sigma^2)\), where \(k(\mathbf{x}, \mathbf{x}')\) is a covariance function and \(\sigma^2\) is a hyperparameter that specifies the noise power.
The covariance function \(k(\mathbf{x}, \mathbf{x}')\) specifies the degree of coupling between \(y(\mathbf{x})\) and \(y(\mathbf{x}')\), and encodes the properties of the GP, such as the power level and smoothness. One of the best-known covariance functions is the anisotropic squared exponential, which has the form of an unnormalized Gaussian function, \(k(\mathbf{x}, \mathbf{x}') = \sigma_0^2 \exp\left(-\frac{1}{2}(\mathbf{x} - \mathbf{x}')^T \boldsymbol{\Lambda}^{-1} (\mathbf{x} - \mathbf{x}')\right)\), and depends on the signal power \(\sigma_0^2\) and the length scales \(\boldsymbol{\Lambda}\), where \(\boldsymbol{\Lambda}\) is a diagonal matrix containing one length scale per input dimension. Each length scale controls the degree to which the correlation between outputs decays as the separation along the corresponding input dimension grows. All kernel parameters are collectively referred to as \(\boldsymbol{\theta}\).
The joint distribution of the available observations (collected in \(\mathbf{y}\)) and some unknown output \(y_*\) forms a multivariate Gaussian distribution, with parameters specified by the covariance function:

(1.5) \( \begin{bmatrix} \mathbf{y} \\ y_* \end{bmatrix} \sim \mathcal{N}\left( \mathbf{0},\ \begin{bmatrix} \mathbf{K} + \sigma^2 \mathbf{I}_n & \mathbf{k}_* \\ \mathbf{k}_*^T & k_{**} + \sigma^2 \end{bmatrix} \right) \)

where \([\mathbf{K}]_{ij} = k(\mathbf{x}_i, \mathbf{x}_j)\), \([\mathbf{k}_*]_j = k(\mathbf{x}_j, \mathbf{x}_*)\) and \(k_{**} = k(\mathbf{x}_*, \mathbf{x}_*)\). Here, \(\mathbf{I}_n\) is used to denote the identity matrix of size \(n\). The notation \([\mathbf{A}]_{ij}\) refers to the entry at row \(i\), column \(j\) of \(\mathbf{A}\). Likewise, \([\mathbf{a}]_i\) is used to reference the \(i\)-th element of vector \(\mathbf{a}\).
From (1.5) and conditioning on the observed training outputs, the predictive distribution is obtained as

(1.6) \( p(y_* \mid \mathbf{y}) = \mathcal{N}\left( \mathbf{k}_*^T (\mathbf{K} + \sigma^2 \mathbf{I}_n)^{-1} \mathbf{y},\ \ k_{**} + \sigma^2 - \mathbf{k}_*^T (\mathbf{K} + \sigma^2 \mathbf{I}_n)^{-1} \mathbf{k}_* \right) \)

which is computed in \(\mathcal{O}(n^3)\) time, due to the inversion of the \(n \times n\) matrix \(\mathbf{K} + \sigma^2 \mathbf{I}_n\).
The hyperparameters \(\{\boldsymbol{\theta}, \sigma^2\}\) are typically selected by maximizing the marginal likelihood (also called “evidence”) of the observations, which is

(1.7) \( \log p(\mathbf{y} \mid \boldsymbol{\theta}, \sigma^2) = -\frac{1}{2} \mathbf{y}^T (\mathbf{K} + \sigma^2 \mathbf{I}_n)^{-1} \mathbf{y} - \frac{1}{2} \log \left| \mathbf{K} + \sigma^2 \mathbf{I}_n \right| - \frac{n}{2} \log(2\pi) \)

If analytical derivatives of (1.7) are available, the optimization is carried out using gradient methods, with each gradient computed in \(\mathcal{O}(n^3)\) time. GP regression algorithms can typically handle a few thousand data points on a desktop computer.
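A minimal sketch of the predictive equations above, assuming a squared-exponential kernel with fixed (not evidence-optimized) hyperparameters on synthetic 1-D data:

```python
import numpy as np

def sq_exp(A, B, sigma0=1.0, ell=0.5):
    """Squared-exponential covariance between two sets of inputs."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sigma0 ** 2 * np.exp(-0.5 * d2 / ell ** 2)

rng = np.random.default_rng(1)
X = rng.uniform(0, 5, size=(40, 1))            # training inputs
y = np.sin(X[:, 0]) + 0.05 * rng.normal(size=40)
noise = 0.05 ** 2                              # assumed noise power sigma^2

K = sq_exp(X, X) + noise * np.eye(len(X))      # K + sigma^2 I
Xs = np.array([[2.5]])                         # test input x_*
ks = sq_exp(X, Xs)                             # cross-covariances k_*
mean = ks[:, 0] @ np.linalg.solve(K, y)        # predictive mean, as in (1.6)
var = (sq_exp(Xs, Xs) - ks.T @ np.linalg.solve(K, ks))[0, 0] + noise
```

The `solve` calls make the \(\mathcal{O}(n^3)\) cost mentioned above explicit; in practice the Cholesky factorization of \(\mathbf{K} + \sigma^2 \mathbf{I}_n\) is reused across test points.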
Support Vector Regression
SVR [Smola2004] is one of the state-of-the-art algorithms for regression and function approximation. The SVR approach takes into account the error approximation to the data as well as the generalization of the model, i.e. its capability to improve the prediction of the model when a new dataset is evaluated by it. Although there are several versions of the SVR, the classical \(\varepsilon\)-SVR model, described in detail in [Smola2004] and used in a large number of applications in Science and Engineering [Salcedo2014b], is considered in this work.
The SVR method for regression consists of, given a set of training vectors \(\{(\mathbf{x}_i, y_i),\ i = 1, \ldots, l\}\), where \(\mathbf{x}_i\) stands for a vector of predictive variables and \(y_i\) is the target associated to the input, training a model of the form

(1.8) \( y(\mathbf{x}) = f(\mathbf{x}) + b = \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}) + b \)
where \(y(\mathbf{x})\) stands for an estimation of the target, in such a way that a risk function is minimized. This risk function can be written as:

(1.9) \( R[f] = \frac{1}{2} \|\mathbf{w}\|^2 + C \sum_{i=1}^{l} L\left(y_i, f(\mathbf{x}_i)\right) \)

where the norm of \(\mathbf{w}\) controls the smoothness of the model, \(\boldsymbol{\phi}(\mathbf{x})\) is a function of projection of the input space to the feature space, \(b\) is a bias parameter and \(L(y, f(\mathbf{x}))\) is the loss function selected. In this thesis, the L1-SVR (L1 support vector regression) is used, characterized by an \(\varepsilon\)-insensitive loss function [Smola2004]:

(1.10) \( L(y, f(\mathbf{x})) = \begin{cases} 0 & \text{if } |y - f(\mathbf{x})| \le \varepsilon \\ |y - f(\mathbf{x})| - \varepsilon & \text{otherwise} \end{cases} \)
In order to train this model, it is necessary to solve the following optimization problem [Smola2004]:

(1.11) \( \min_{\mathbf{w}, \xi, \xi^*} \ \frac{1}{2} \|\mathbf{w}\|^2 + C \sum_{i=1}^{l} (\xi_i + \xi_i^*) \)

subject to

(1.12) \( y_i - \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_i) - b \le \varepsilon + \xi_i, \quad i = 1, \ldots, l \)

(1.13) \( \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_i) + b - y_i \le \varepsilon + \xi_i^*, \quad i = 1, \ldots, l \)

(1.14) \( \xi_i, \xi_i^* \ge 0, \quad i = 1, \ldots, l \)
Figure 1.3 shows an example of the final solution for a given set of input variables. The dual form of this optimization problem is usually obtained through the minimization of the Lagrange function, constructed from the objective function and the problem constraints. In this case, the dual form of the optimization problem is the following:

(1.15) \( \max_{\alpha, \alpha^*} \ -\frac{1}{2} \sum_{i,j=1}^{l} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) K(\mathbf{x}_i, \mathbf{x}_j) - \varepsilon \sum_{i=1}^{l} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{l} y_i (\alpha_i - \alpha_i^*) \)

subject to

(1.16) \( \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) = 0 \)

(1.17) \( \alpha_i, \alpha_i^* \in [0, C], \quad i = 1, \ldots, l \)
In addition to these constraints, the Karush-Kuhn-Tucker conditions must be fulfilled, and the bias variable, \(b\), must also be obtained. The interested reader can consult [Smola2004] for reference. In the dual formulation of the problem, the function \(K(\mathbf{x}_i, \mathbf{x}_j)\) is the kernel matrix, formed by the evaluation of a kernel function equivalent to the dot product \(\boldsymbol{\phi}(\mathbf{x}_i)^T \boldsymbol{\phi}(\mathbf{x}_j)\). A usual election for this kernel function is a Gaussian function, as follows:

(1.18) \( K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left( -\gamma \, \|\mathbf{x}_i - \mathbf{x}_j\|^2 \right) \)

The final form of function \(f(\mathbf{x})\) depends on the Lagrange multipliers \(\alpha_i, \alpha_i^*\), as follows:

(1.19) \( f(\mathbf{x}) = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) K(\mathbf{x}_i, \mathbf{x}) \)

So finally, the estimation of the target under study will be carried out using the following expression:

(1.20) \( y(\mathbf{x}) = f(\mathbf{x}) + b = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) K(\mathbf{x}_i, \mathbf{x}) + b \)
In this way it is possible to obtain an SVR model by means of training a quadratic problem for given hyperparameters \(C\), \(\varepsilon\) and \(\gamma\). The estimation of these SVR hyperparameters is usually carried out before the training of the algorithm. There are different methods to obtain \(C\), \(\varepsilon\) and \(\gamma\), but the most common approach consists of a Grid Search (GS) procedure. GS exhaustively considers all parameter combinations from a grid of possible predefined values. The quality of the SVR with these hyperparameter values is tested on a reduced validation set of data from the problem at hand. More information about GS and alternative techniques for SVR hyperparameter estimation can be found in [Smola2004]. A variant of the GS approach that includes lower and upper bounds to limit the tested values of the hyperparameters can also be used. This method was proposed in [Ortiz2009], and it is able to considerably reduce the time for hyperparameter estimation with GS without affecting the quality of the SVR.
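The GS procedure can be sketched with scikit-learn's SVR and GridSearchCV; the grid values and synthetic data below are illustrative assumptions, not the settings used in the thesis.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(150, 2))
y = X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=150)

grid = {"C": [1, 10, 100],           # regularization strength
        "epsilon": [0.01, 0.1],      # width of the insensitive tube
        "gamma": [0.1, 1.0]}         # Gaussian kernel parameter
search = GridSearchCV(SVR(kernel="rbf"), grid, cv=3)   # exhaustive grid search
search.fit(X, y)
best_params = search.best_params_
```

The cost grows multiplicatively with the grid sizes, which is exactly why the bounded variant of [Ortiz2009] or the BO approach discussed later in this thesis can be attractive alternatives.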
1.2.2 Evolutionary Computationbased Algorithms
EC algorithms are used for solving optimization problems, working in both continuous and discrete search spaces. They are also used in feature selection, to improve the performance of the predictions in regression problems. All genetic and evolutionary algorithms are based on the evolution of a population of candidate solutions by applying a series of evolutionary operators. Part of this Ph.D. Thesis is based on the application of this kind of techniques, hence different approaches are explained in the following points.
The Grouping Genetic Algorithm
There are many potential benefits of applying Feature Selection (FS) in prediction problems: facilitating data visualization and data understanding, reducing the measurement and storage requirements, reducing training and utilization times or defying the curse of dimensionality to improve prediction performance, to mention some of them. Previous literature on this issue mainly focuses on constructing and selecting subsets of features that are useful to build a good predictor. This process has been tackled before with EC [Salcedo2002, Salcedo2014a]. The GGA is a class of evolutionary algorithm especially modified to tackle grouping problems, i.e., problems in which a number of items must be assigned to a set of predefined groups (subsets of features, in the case of this work). It was first proposed by Falkenauer [Falkenauer1992, Falkenauer1998], who realized that traditional GAs had difficulties when applied to grouping problems (mainly, the standard binary encoding increases the search space size in this kind of problem). The GGA has shown very good performance on different applications and problems [Agustín2008, DeLit2000]. In the GGA, the encoding, crossover and mutation operators of traditional GAs are modified to obtain a compact algorithm with very good performance in grouping problems.
Problem encoding
The GGA initially proposed by Falkenauer is a variable-length GA. The encoding is carried out by separating each individual in the algorithm into two parts: the first one is an assignment part that associates each item to a given group. The second one is a group part, that defines which groups must be taken into account for the individual. In problems where the number of groups is not previously defined, it is easy to see why this is a variable-length algorithm: the group part varies from one individual to another. In the implementation of the GGA for FS, an individual is thus formed by an assignment part followed by a group part. An example of an individual in the proposed GGA for an FS problem, with 20 features and 4 groups, is the following:
1 1 2 3 1 4 1 4 3 4 4 1 2 4 4 2 3 1 3 2 | 1 2 3 4
where group 1 includes features {1, 2, 5, 7, 12, 18}, group 2 features {3, 13, 16, 20}, group 3 features {4, 9, 17, 19} and, finally, group 4 includes features {6, 8, 10, 11, 14, 15}.
Genetic operators
A tournament-based selection mechanism is usually used, similar to the one described in [Yao1999]. It has been shown to be one of the most effective selection operators, avoiding super-individuals and performing an excellent exploration of the search space. Regarding the crossover operator, it is implemented in the GGA as a modified version of the one initially proposed by Falkenauer [Falkenauer1992, Falkenauer1998]. The application of this operator follows the process outlined in Figure 1.4:

1. Choose two parents from the current population, at random.

2. Randomly select two points for the crossover from the “Groups” part of parent 1; then, all the groups between the two cross-points are selected. In the example of Figure 1.4, the items of parent 1 belonging to the selected groups are 1, 2, 4, 5 and 6.

3. Insert the selected section of the “Groups” part into the second parent. After the insertion in the example of Figure 1.4, the assignment of the nodes 1, 2, 4, 5 and 6 of the offspring individual will be those of parent 1, while the rest of the nodes' assignments are those of parent 2. The “Groups” part of the offspring individual is that of parent 2 plus the selected section of parent 1 (8 groups in total, in this case).

4. Renumber the “Groups” part of the offspring individual. In the example, 1 2 3 4 5 6 1 2 is modified into 1 2 3 4 5 6 7 8. Modify also the assignment part accordingly.

5. Remove any empty groups in the offspring individual. In the example considered, groups 1, 2, 3 and 6 are empty, so these groups' identification numbers are eliminated and the rest are rearranged. The final offspring is then obtained.
Regarding the mutation operator, note that standard mutation usually calls for an alteration of a small percentage of randomly selected parts of the individuals. This type of mutation may be too disruptive in the case of a grouping problem. Here, a swapping mutation in which two items are interchanged (swapping this way the assignment of features to different groups) is considered instead. This procedure is carried out with a very low probability, to avoid turning the process into a random search.
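A toy sketch of this encoding and of the swap mutation, using the 20-feature example above (the helper names are hypothetical):

```python
import numpy as np

# Assignment part of the example individual (20 features) and its group part.
assignment = [1, 1, 2, 3, 1, 4, 1, 4, 3, 4, 4, 1, 2, 4, 4, 2, 3, 1, 3, 2]
groups = [1, 2, 3, 4]

def decode(assignment, groups):
    """Map each group label to the (1-based) features assigned to it."""
    return {g: [i + 1 for i, a in enumerate(assignment) if a == g] for g in groups}

def swap_mutation(assignment, rng):
    """Interchange the group assignment of two randomly chosen features."""
    child = list(assignment)
    i, j = rng.choice(len(child), size=2, replace=False)
    child[i], child[j] = child[j], child[i]
    return child

print(decode(assignment, groups)[3])   # -> [4, 9, 17, 19]
```

The swap preserves the group sizes, which is what keeps this mutation non-disruptive compared with randomly relabelling items.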
The Coral Reef Optimization
The CRO is a novel meta-heuristic approach for optimization, recently proposed in [Salcedo2014c], which is based on simulating the corals' reproduction and coral reef formation processes. Basically, the CRO is based on the artificial modeling of a coral reef \(\mathcal{R}\), consisting of a grid of squares. It is assumed that each square \((i,j)\) of \(\mathcal{R}\) is able to allocate a coral. Note that each of such corals represents a solution to a given optimization problem, for which it is encoded as a string of numbers spanning a given alphabet. The CRO algorithm is first initialized at random by assigning some squares in \(\mathcal{R}\) to be occupied by corals (i.e. solutions to the problem) and some other squares in the grid to be empty, i.e. holes in the reef where new corals can freely settle and grow in the future. The ratio between free and occupied squares in \(\mathcal{R}\) at the beginning of the algorithm is denoted as \(\rho_0\) and referred to as the initial occupation factor. Each coral is labeled with an associated health function that corresponds to the problem's objective function. The CRO is based on the fact that the reef will evolve and develop as long as healthier or stronger corals (which represent better solutions to the problem at hand) survive, while less healthy corals perish.
After the reef initialization described above, the phase of reef formation is artificially simulated. This phase consists of K iterations: at each iteration, the corals' reproduction in the reef is emulated by applying the operators and processes described in Algorithm 1, which model the corals' sexual reproduction (broadcast spawning and brooding).
After the reproduction stage, the set of formed larvae (namely, newly produced solutions to the problem) attempt to find a place in the reef to develop and further reproduce. This settlement may occur in a free space of the reef (a hole), or in an occupied location, by fighting against the coral currently settled in that place. If a larva is not successful in locating a place to settle after a number of attempts, it is considered preyed upon by animals in the reef. The reef thus builds a new layer at every iteration.
The specific definition of the different operators that form the classical CRO algorithm is detailed here:

- Sexual reproduction: the CRO model implements two different kinds of sexual reproduction, external and internal.

  - External sexual reproduction or broadcast spawning: the corals eject their gametes into the water, where male-female couples meet and combine to produce a new larva by sexual crossover. In Nature, some species are able to combine their gametes to generate mixed polyps even though they are different from each other. In the CRO algorithm, external sexual reproduction is applied to a usually high fraction of the corals. The couple selection can be done uniformly at random or by resorting to any fitness-proportionate selection approach (e.g. roulette wheel). In the original version of the CRO, standard crossover (one-point or two-point) is applied in the broadcast spawning process.

  - Internal sexual reproduction or brooding: the CRO applies this method to the remaining fraction of the corals in the reef. The brooding process consists of the formation of a coral larva by means of a random mutation of the brooding-reproductive coral (self-fertilization, considering hermaphrodite corals). The produced larva is then released into the water in a similar fashion to the larvae generated through broadcast spawning.

- Larvae settlement: once all the larvae are formed at a given iteration through reproduction, they try to settle down and grow in the reef. Each larva will randomly attempt to settle in a square of the reef. If the location is empty (free space in the reef), the coral grows therein no matter the value of its health function. By contrast, if another coral is already occupying the square at hand, the new larva will settle only if its health function is better than the fitness of the existing coral. A maximum number of attempts for a larva to settle in the reef is defined: after that many unsuccessful tries, it will not survive to the following iteration.

- Depredation: corals may die during the reef formation phase. At the end of each iteration, a small number of corals can be preyed upon, thus liberating space in the reef for the next iteration. The depredation operator is applied with a very small probability, and exclusively to a fraction of the corals with the worst health.
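The reef-formation loop above can be condensed into a toy minimizer; the reef size, operator fractions and number of iterations are illustrative assumptions, and the depredation operator is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
health = lambda c: -np.sum(c ** 2)            # higher health = better solution
N, M, dim = 5, 5, 3                           # reef grid and solution size
reef = np.full(N * M, None, dtype=object)
for i in np.where(rng.random(N * M) < 0.6)[0]:   # initial occupation factor
    reef[i] = rng.uniform(-5, 5, dim)
init_best = max(health(c) for c in reef if c is not None)

for _ in range(100):                          # K iterations of reef formation
    corals = [c for c in reef if c is not None]
    larvae = []
    for c in corals:
        if rng.random() < 0.9:                # broadcast spawning (crossover)
            mate = corals[rng.integers(len(corals))]
            cut = rng.integers(1, dim)
            larvae.append(np.concatenate([c[:cut], mate[cut:]]))
        else:                                 # brooding (random mutation)
            larvae.append(c + rng.normal(scale=0.3, size=dim))
    for larva in larvae:                      # settlement: up to 3 attempts
        for _ in range(3):
            k = rng.integers(N * M)
            if reef[k] is None or health(larva) > health(reef[k]):
                reef[k] = larva
                break

best = max((c for c in reef if c is not None), key=health)
```

Because an occupied square is only overwritten by a healthier larva, the best coral in the reef can never get worse across iterations.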
1.3 Structure of the thesis
The rest of this thesis is organized in two technical parts:

- The first part, proposed contributions with numerical results in renewable energy problems, is structured in two chapters: ocean wave features prediction, and WPREs prediction.

- The second part, proposed contributions with numerical results in facilities management, is divided into two other chapters: accurate estimation of \(H_s\) with SVR and marine radar images, and efficient prediction of low-visibility events at airports.

To conclude, some final remarks and future research lines are summarized in the last part of the document, with the list of publications shown in a final Appendix section.
Part II Proposed contributions with numerical results in renewable energy problems
Chapter 2 Ocean wave features prediction
2.1 Introduction
The exploitation of marine energy resources is currently a hot topic in renewable energy research, since they have shown a clear potential for sustainable growth [Defne2009, García2014, Lenee2011, López2013, Rusu2009, Rusu2012, Gonçalves2014]: marine energy resources do not generate CO\(_2\), are potentially able to convert part of the huge energy of the oceans into electricity [Arinaga2012, Esteban2012], and reduce oil imports, a crucial geo-economical issue. However, in spite of this potential, the use of marine energy sources is nowadays still minor at the global level. Even so, wave energy plays a key role for sustainable development in several offshore islands, because it provides technical and economical benefits (to satisfy the demand for clean electricity) without significant environmental impact, a key concern in offshore islands committed to the protection of ecological systems [Fadaeenejad2014]. Some interesting reviews of the most important issues involved in the generation of electricity from the oceans (including converters, their related economical aspects, and the potential of a number of ocean regions to be exploited worldwide) can be found in [Bahaj2011, Chong2013, Kim2012, Hammar2012, Cuadra2016].
There are different technologies that can be considered within marine energy resources, including ocean wave, tidal and ocean thermal energy. This work is focused on wave energy, which uses WECs to convert ocean energy into electricity [Falcão2010, Cuadra2016]. WECs transform the kinetic energy of wind-generated waves into electricity by means of either the vertical oscillation or the linear motion of waves, and exhibit some important advantages when compared with alternatives based on tidal converters. However, waves are more difficult to characterize than tides because of their stochastic nature. As a consequence of this complexity, the design, deployment and control of WECs [Hong2014, Richter2013] become key topics that require a proper characterization and prediction of waves [Larsén2015, Reikard2015, Wimmer2006]. Arguably, the two most important parameters to characterize wave energy in this regard are the Significant Wave Height (\(H_s\)) and the Wave Energy Flux (\(P\)), on whose prediction this chapter focuses.
As mentioned, the stochastic nature of waves makes the prediction of the wave energy resource very difficult, so research work on this topic has been intense in the last few years. Focusing on ML approaches, one of the first algorithms proposed to predict \(H_s\) is due to Deo et al. [Deo1998], who used ANNs to obtain an accurate prediction of \(H_s\). Improvements on this prediction system were presented in a more recent work [Agrawal2004]. NNs have also been applied to other problems of \(H_s\) and \(P\) prediction, such as [Tsai2002], where wave parameters are predicted from observed wave records using time-series NNs, [Castro2014], where a neural network is applied to estimate the wave energy resource on the northern coast of Spain, or [Zanaganeh2009], where a hybrid GA-adaptive network-based fuzzy inference system model was developed to forecast \(H_s\) and the peak spectral period at Lake Michigan. Alternative proposals based on different approaches have also been presented recently, as in [Mahjoobi2008], where different SC techniques are tested for \(H_s\) prediction, [Mahjoobi2009], where an SVR methodology is considered, [Fernández2015], where different classifiers have been applied to analyze and predict \(H_s\) and \(P\) ranges in buoys for marine energy applications, [Nitsure2012], which proposes the use of genetic programming for reconstruction problems, or [Özger2011], where Fuzzy Logic (FL)-based approaches were introduced for prediction problems.
In spite of this large body of work dealing with ML algorithms in \(H_s\) and \(P\) prediction, there are no previous studies focused on analyzing which predictive variables are best to obtain an accurate prediction of these important parameters from neighbour buoys' data. This problem is usually known in the ML community as FS [Weston2000], and it is an important task in supervised classification and regression problems. The reason for this is that irrelevant features, used as part of a training procedure in a classification or regression machine, can unnecessarily increase the cost and running time of a prediction system, as well as degrade its generalization performance [Blum1997, Salcedo2002]. In this thesis, a novel hybrid GGA-ELM for accurate prediction of \(H_s\) and \(P\) values is proposed. The GGA is a recently proposed algorithm especially modified to tackle grouping problems [Falkenauer1992, Falkenauer1998]. In this case, it is focused on obtaining the best set of features (predictive variables) for the regression machine (ELM). It will be shown how the GGA is able to produce different sets of good predictive variables for this problem, and how the ELM is able to obtain excellent \(H_s\) or \(P\) predictions from them. An experimental analysis of the proposed hybrid GGA-ELM approach in a real case of \(H_s\) and \(P\) prediction at buoys on the Western coast of the USA will be carried out. The application of alternative regression techniques, such as SVR, has also been analyzed in these experiments. Moreover, because this hybrid prediction system has a number of parameters that may affect its final performance and that need to be specified beforehand by the practitioner, an automatic fine-tuning of the prediction system's parameters is added to the study. In this case, the parameters of the GGA-ELM approach include the probability of mutation in the GGA or the number of neurons in the ELM hidden layer, among others.
It is then proposed to use a Bayesian Optimization (BO) approach to automatically optimize the parameters of the whole prediction system (GGA-ELM), with the aim of improving its performance in wave energy prediction problems. BO has been shown to obtain good results in the task of finding good parameter values for prediction systems [Snoek2012].
The rest of the chapter is structured as follows: Section 2.2 describes the calculation of \(H_s\) and \(P\), Section 2.3 presents the prediction system considered, Section 2.4 addresses the explanation of the BO method, Section 2.5 summarizes the experiments and results obtained, and Section 2.6 completes the study with some final remarks.
2.2 Wave energy resource: calculation of \(H_s\) and \(P\)
In the evaluation of wave energy deployment systems such as WECs or WEC arrays, it is essential to previously characterize, as accurately as possible, the amount of wave energy available at a particular location, given by parameters such as \(H_s\) and \(P\). In order to obtain these parameters, note that the wave energy resource in a region is caused by both local and far winds blowing over the ocean surface, which transport the wave energy. Focusing thus the attention on the water surface, and within the framework of the linear wave theory, the vertical wave elevation \(\eta(t)\) at a point on the sea surface at time \(t\) can be assumed to be a superposition of different monochromatic wave components [Nieto2013, Goda2010]. This model is appropriate when the free wave components do not vary appreciably in space and time (that is, statistical temporal stationarity and spatial homogeneity can be assumed [Goda2010]).
In this model, the concept of “sea state” refers to the sea area and the time interval in which the statistical and spectral characteristics of the wave do not change considerably (statistical temporal stationarity and spatial homogeneity). The total energy of a sea state is the combined contribution of all energies from different sources. The “wind sea” occurs when the waves are caused by the energy transferred between the local wind and the free surface of the sea. The “swell” is the situation in which the waves have been generated by winds blowing on another far area (for instance, by storms), and propagate towards the region of observation. Usually, sea states are the composition of these two pure states, forming multimodal or mixed seas.
In a given sea state, the wave elevation with respect to the mean ocean level can be assumed to be a zero-mean Gaussian stochastic process, with statistical symmetry between wave maxima and minima. A buoy deployed at a point can take samples of this process, \(\eta[n]\), generating thus a time series of empirical vertical wave elevations. The Discrete Fourier Transform (DFT) of this sequence, computed using the Fast Fourier Transform (FFT) algorithm, allows for estimating the spectral density \(S(f)\). Its spectral moments of order \(n\) can be computed as follows:

(2.1) \( m_n = \int_0^{\infty} f^n S(f) \, df \)
The \(H_s\) is a first indicator of the amount of wave energy available in a given area. \(P\), or power density per meter of wave crest [Cahill2013], can be computed as

(2.2) \( P = \frac{\rho g^2}{64 \pi} H_s^2 T_e \)

where \(\rho\) is the sea water density (1025 kg/m\(^3\)), \(g\) the acceleration due to gravity, \(H_s = 4\sqrt{m_0}\) is the spectral estimation of the significant wave height, and \(T_e = m_{-1}/m_0\) is an estimation of the mean wave period, normally known as the period of energy, which is used in the design of turbines for wave energy conversion. Expression (2.2) (with \(H_s\) in meters and \(T_e\) in seconds) leads to \(P \approx 0.49\, H_s^2\, T_e\) kW/m, and helps engineers estimate the amount of wave energy available when planning the deployment of WECs at a given location.
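The computation of (2.1)-(2.2) from a discretized spectrum can be sketched as follows, using a synthetic Gaussian-shaped spectral density (purely illustrative) and a trapezoidal rule for the moments:

```python
import numpy as np

f = np.linspace(0.03, 0.5, 500)                       # frequency axis [Hz]
S = 2.0 * np.exp(-0.5 * ((f - 0.1) / 0.02) ** 2)      # toy spectrum [m^2/Hz]

def moment(n):
    """Spectral moment m_n of (2.1), via the trapezoidal rule."""
    y = f ** n * S
    return float(np.sum((y[1:] + y[:-1]) / 2.0 * np.diff(f)))

Hs = 4.0 * np.sqrt(moment(0))     # significant wave height [m]
Te = moment(-1) / moment(0)       # period of energy [s]
P = 0.49 * Hs ** 2 * Te           # wave energy flux, as in (2.2) [kW/m]
```

With a spectrum peaked around 0.1 Hz, this yields a sea state of roughly 1.3 m and 10 s, i.e. a flux on the order of 8 kW per meter of wave crest.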
2.3 The hybrid prediction system considered
The prediction system is a hybrid wrapper approach, formed by the GGA (explained in depth in Section 1.2.2) for FS, and the ELM to carry out the final prediction of \(H_s\) or \(P\) from a set of input data. The regressor chosen must be as accurate as possible, and also very fast in its training process, in order to avoid a high computational burden for the complete algorithm. This is the main reason why the ELM, described in detail in Section 1.2.1, is also selected for the fitness function. Since the ELM is hybridized with the GGA, there are different ways of calculating the final fitness associated with each individual. In this case, the following fitness function is considered, which uses the Root Mean Squared Error (RMSE) of the prediction for the best group of features in the GGA:
(2.3) \( \text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( t_i - \hat{t}_i \right)^2} \)

where \(t_i\) stands for the \(H_s\) or \(P\) measured for sample \(i\), and \(\hat{t}_i\) stands for the \(H_s\) or \(P\) estimated by the ELM in the group of the individual with the least error (best group of features), for sample \(i\). Note that \(N\) stands for the number of training samples.
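A sketch of this wrapper fitness evaluation, where a ridge regressor stands in for the ELM to keep the example short (the helper name and data are hypothetical):

```python
import numpy as np
from sklearn.linear_model import Ridge

def group_fitness(X, t, feature_idx):
    """RMSE of a fast regressor trained on one GGA group of features."""
    model = Ridge().fit(X[:, feature_idx], t)
    pred = model.predict(X[:, feature_idx])
    return float(np.sqrt(np.mean((t - pred) ** 2)))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))                 # 6 candidate predictive variables
t = 2 * X[:, 0] - X[:, 3]                     # only features 0 and 3 matter
groups = {1: [0, 3], 2: [1, 2], 3: [4, 5]}    # one GGA individual's groups
fitness = min(group_fitness(X, t, idx) for idx in groups.values())
```

The individual's fitness is the RMSE of its best group, so the evolutionary search is driven toward individuals containing at least one strong subset of features.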
2.4 Bayesian optimization of the prediction system
Every ML algorithm or prediction system has its own set of parameters that must be adjusted to obtain an optimal performance. An example is a deep neural network in which one has to specify parameters such as the learning rate, the number of layers, the number of neurons in each layer, etc. [LeCun2015]. Another example is stochastic gradient boosting in which one has to choose the number of terminal nodes in the ensemble trees, the number of trees, the regularization parameter, etc. [Friedman2002]. In the particular setting in this study, in an ELM the number of units in the hidden layer has to be specified before training; and in the GA described in Section 1.2.2, the probability of mutation and the number of epochs must be known initially.
Changing the parameter values of a prediction system may have a strong impact on its performance. Parameter tuning is hence defined as the problem of finding the optimal parameter values of a prediction system on the problem considered. This task has traditionally been addressed by human experts, who often use prior knowledge to specify parameter values that are expected to perform well. However, such an approach can suffer from human bias. An alternative solution is to consider a grid or uniform search in the space of parameters to look for values that result in a good performance on a validation set. These methods, however, suffer when the dimensionality of the parameter space is very high [Bergstra2012], requiring a large number of parameter evaluations.
BO has emerged as a practical tool for parameter selection in prediction systems. These methods provide an efficient alternative to a grid or uniform search of the parameter space [Snoek2012]. Assume that the surface defined by the error of a prediction system that depends on some parameters is smooth. In that case, the parameter space can be searched according to a criterion that exploits this smoothness property and avoids exhaustive exploration. More precisely, BO methods are very useful for optimizing black-box objective functions that lack an analytical expression (which means no gradient information), are very expensive to evaluate, and whose evaluations are potentially noisy [Mockus1978, Brochu2010, Shahriari2016]. The performance of a prediction system on a randomly chosen validation set, when seen as a function of the chosen parameters, has all these characteristics.
Consider a black-box objective f(x) with noisy evaluations of the form y = f(x) + ε, with ε some noise term. BO methods are very successful at reducing the number of evaluations of the objective function needed to solve the optimization problem. At each iteration of the optimization process, these methods fit a probabilistic model, typically a GP, to the observations of the objective function collected so far. The uncertainty about the objective function provided by the GP is then used to generate an acquisition function a(x), whose value at each input location indicates the expected utility of evaluating f there. The next point at which to evaluate the objective is the one that maximizes a(x). Importantly, a(x) only depends on the probabilistic model and can hence be evaluated with very little cost. Thus, this function can be maximized very quickly using standard optimization techniques. This process is repeated until enough data about the objective has been collected. When this is the case, the GP predictive mean for f(x) can be optimized to find the solution of the optimization problem. Algorithm 2 shows the details of such a process.
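A minimal sketch of this loop (in the spirit of Algorithm 2) on a 1-D toy problem, using scikit-learn's GP with a Matérn kernel and the EI acquisition maximized over random candidates; the objective, ranges and iteration counts are all illustrative, not the ones used in the thesis:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(x):  # toy stand-in for the (expensive) validation RMSE
    return np.sin(3.0 * x) + 0.1 * x ** 2

rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(3, 1))  # small initial design
y = objective(X).ravel()

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
for _ in range(15):
    gp.fit(X, y)                                   # probabilistic model of f
    cand = rng.uniform(-2.0, 2.0, size=(500, 1))   # cheap acquisition search
    mu, sd = gp.predict(cand, return_std=True)
    gamma = (y.min() - mu) / np.maximum(sd, 1e-9)
    ei = sd * (gamma * norm.cdf(gamma) + norm.pdf(gamma))  # EI (minimization)
    x_next = cand[np.argmax(ei)].reshape(1, -1)    # maximizer of a(x)
    X = np.vstack([X, x_next])                     # evaluate f there, append
    y = np.append(y, objective(x_next).ravel())

x_best = X[np.argmin(y)]  # final recommendation
```

Spearmint automates exactly this kind of loop; only the model fitting and acquisition maximization are more sophisticated.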
The key to BO success is that evaluating the acquisition function is very cheap compared to the evaluation of the actual objective f, which in this case requires retraining the prediction system. This is so because the acquisition function only depends on the GP predictive distribution for f(x) at a candidate point x. Let the observed data up to step t of the algorithm be D_t = {(x_i, y_i) : i = 1, …, t}. The GP predictive distribution for f(x) is a Gaussian distribution characterized by a mean μ(x) and a variance σ²(x). These values are:
μ(x) = k*ᵀ (K + σ_n² I)⁻¹ y    (2.4)
σ²(x) = k(x, x) − k*ᵀ (K + σ_n² I)⁻¹ k*    (2.5)
where y = (y_1, …, y_t)ᵀ is a vector with the objective values observed so far; k* is a vector with the prior covariances between f(x) and each y_i; K is a matrix with the prior covariances among the y_i, for i = 1, …, t; σ_n² is the variance of the additive noise; and k(x, x) is the prior variance at the candidate location x. All these quantities are obtained from a covariance function k(·, ·) which is prespecified and receives as input two points, x and x′, at which the covariance between f(x) and f(x′) has to be evaluated. A typical covariance function employed for BO is the Matérn function [Snoek2012]. For further details about GPs the reader is referred to [Rasmussen2006].
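The two predictive equations can be transcribed directly, so that each term (k*, K + σ_n²I, k(x, x)) is visible; the from-scratch Matérn-5/2 kernel and the toy data below are illustrative:

```python
import numpy as np

def matern52(a, b, ell=1.0):
    """Matern-5/2 covariance between two 1-D sets of points."""
    r = np.abs(a[:, None] - b[None, :]) / ell
    s = np.sqrt(5.0) * r
    return (1.0 + s + s ** 2 / 3.0) * np.exp(-s)

def gp_predict(x_star, X, y, noise=1e-4, ell=1.0):
    """GP predictive mean (Eq. 2.4) and variance (Eq. 2.5) at x_star."""
    K = matern52(X, X, ell) + noise * np.eye(len(X))      # K + sigma_n^2 I
    k_star = matern52(np.atleast_1d(x_star), X, ell)[0]   # cov(f(x*), each y_i)
    mu = k_star @ np.linalg.solve(K, y)                   # Eq. (2.4)
    var = 1.0 - k_star @ np.linalg.solve(K, k_star)       # Eq. (2.5); k(x*, x*) = 1
    return mu, var
```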
Thus, BO methods choose very carefully the next point at which to evaluate the objective function, with the aim of finding its optimum in the smallest number of evaluations. This is a very useful strategy when the objective function is very expensive to evaluate, and it can save a lot of computational time. Three steps of the BO optimization process are illustrated graphically in Figure 2.1 for a toy minimization problem.
Figure 2.1 shows a GP estimation of the objective over three iterations. The acquisition function is shown in the lower part of the plot. The acquisition is high where the GP predicts a low objective and where the uncertainty is high. Those regions in which it is unlikely to find the global minimum of the objective have low acquisition values and will not be explored.
Unlike BO methods, grid or uniform search strategies are based on a pure exploration of the search space. Under the assumption that the objective function is smooth, concentrating a few evaluations in regions of the input space that look more promising (exploitation) is expected to give better results. In BO methods the acquisition function balances between exploration and exploitation in an automatic way. An example of an acquisition function is Expected Improvement (EI) [Jones1998]. EI is obtained as the expected value, under the GP predictive distribution for f(x), of the utility function max(0, y_best − f(x)), where y_best is the best value observed so far. That is, EI measures how much, on average, evaluating the objective at each candidate point is expected to improve on the current best solution. An advantage of EI is that the corresponding acquisition function can be computed analytically: a(x) = σ(x)(γ(x)Φ(γ(x)) + φ(γ(x))), with γ(x) = (y_best − μ(x))/σ(x), where Φ and φ are respectively the c.d.f. and p.d.f. of a standard Gaussian. EI is the acquisition function displayed in Figure 2.1.
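The closed form can be checked numerically against its definition as an expectation (the values of μ, σ and y_best below are illustrative):

```python
import numpy as np
from scipy.stats import norm

def ei_analytic(mu, sigma, y_best):
    """EI in closed form: sigma * (gamma * Phi(gamma) + phi(gamma))."""
    gamma = (y_best - mu) / sigma
    return sigma * (gamma * norm.cdf(gamma) + norm.pdf(gamma))

# Monte Carlo estimate of E[max(0, y_best - f)] with f ~ N(mu, sigma^2)
mu, sigma, y_best = 0.3, 0.5, 0.2
f = np.random.default_rng(0).normal(mu, sigma, 200_000)
ei_mc = np.maximum(0.0, y_best - f).mean()  # should agree with ei_analytic
```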
BO has recently been applied with success to find good parameter values in different prediction systems. For example, it has been used to find the parameters of topic models based on latent Dirichlet allocation, Support Vector Machines (SVMs), or deep convolutional NNs [Snoek2012]. Furthermore, BO methods have also been used to optimize a logistic regression model for labelling Amazon product reviews [Dewancker2016], or to optimize the weights of a neural network to balance vertical poles of different lengths on a moving cart [Frean2008]. Other applications of BO are found in the field of environmental monitoring, in the task of adjusting the parameters of a control system for robotics, in the optimization of recommender systems, and in combinatorial optimization [Brochu2010, Shahriari2016]. Finally, BO methods have been implemented in different software packages. A Python implementation, called Spearmint, is available at [Github]; it is the BO implementation used in this work.
2.5 Experiments and results
This section describes some experiments with the aim of showing the improvements obtained in the performance of the prediction system when its parameters are optimized with the Bayesian techniques introduced before. A real problem of wave energy flux prediction (P, in kW/m [Goda2010]) from marine buoys is considered. Figure 2.2 shows the three buoys considered in this study at the Western coast of the USA, whose databases are obtained from [NOAA2016], and their main characteristics are shown in Table 2.1. The objective of the problem is to carry out the reconstruction of buoy 46069 from a number of predictive variables from the other two buoys. Thus, 10 predictive variables measured at each neighbor buoy are considered (a total of 20 predictive variables to carry out the reconstruction). Table 2.2 shows details of the predictive variables for this problem. Data for two complete years (1st January 2009 to 31st December 2010) are used, since complete data (without missing values in predictive and objective variables) are available for that period in the three buoys. These data are divided into a training set (year 2009) and a test set (year 2010) to evaluate the performance of the proposed algorithm.
Characteristics  Station 46025  Station 46042  Station 46069 

Location  33°44'58" N 119°3'10" W  36°47'29" N 122°27'6" W  33°40'28" N 120°12'42" W
Site elevation  sea level  sea level  sea level 
Air temp height  4 m above site elevation  4 m above site elevation  4 m above site elevation 
Anemometer height  5 m above site elevation  5 m above site elevation  5 m above site elevation 
Barometer elevation  sea level  sea level  sea level 
Sea temp depth  0.6 m below water line  0.6 m below water line  0.6 m below water line 
Water depth  905.3 m  2098 m  1020.2 m 
Watch circle radius  1327 yards  2108 yards  1799 yards 
Acronym  Predictive variable  Units
WDIR  Wind direction  [degrees] 
WSPD  Wind speed  [m/s] 
GST  Gust speed  [m/s] 
WVHT  Significant wave height  [m] 
DPD  Dominant wave period  [sec] 
APD  Average period  [sec] 
MWD  Wave direction at the dominant period (DPD)  [degrees]
PRES  Atmospheric pressure  [hPa] 
ATMP  Air temperature  [Celsius] 
WTMP  Water temperature  [Celsius]
This experimental section is divided into two different subsections. First, the performance of the proposed BO techniques in the optimization of the specific GGA-ELM prediction algorithm is shown. Second, it will be presented how the prediction performance is improved when the system is run with the parameters obtained by the BO techniques, i.e., by comparing the performance of the system before and after tuning the parameters with BO.
2.5.1 Methodology
The utility of the BO techniques described in Section 2.4 for finding good parameters for the prediction system described in Section 2.3 will be evaluated. More precisely, the goal is to find the parameters that minimize the RMSE of the best individual found by the GGA on a validation set extracted from the total data available. The parameters of the GGA that are adjusted are the probability of mutation, the percentage of confrontation in the tournament, and the number of epochs. On the other hand, the parameters of the ELM that is used to evaluate the fitness in the GGA are also adjusted. These parameters are the number of hidden units and the logarithm of the regularization constant of a ridge regression estimator, which is used to find the weights of the output layer. Note that a ridge regression estimator for the output layer weights allows for a more flexible model than the standard ELM, as the standard ELM is retrieved when the logarithm of the regularization constant is negative and large [Albert1972].
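A sketch of the ridge-regularized output layer described above, with the regularization constant parameterized on a log scale as in the tuning experiments (sizes and data are illustrative):

```python
import numpy as np

def elm_ridge(X, y, n_hidden, log_alpha, seed=0):
    """ELM whose output weights solve (H^T H + alpha I) beta = H^T y."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # fixed random hidden layer
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)
    alpha = 10.0 ** log_alpha                        # tuned on a log scale
    beta = np.linalg.solve(H.T @ H + alpha * np.eye(n_hidden), H.T @ y)
    return W, b, beta
```

A large, negative log_alpha makes alpha negligible, recovering the plain least-squares ELM; larger values shrink the output weights.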
The BO method is compared with two techniques. The first technique is a random exploration of the space of parameters. The second technique is a configuration specified by a human expert, with reasonable values that are expected to perform well in the specific application tackled. The same computational budget of parameter evaluations is set for both the BO and the random exploration strategy. After each evaluation, the performance of the best solution found is reported. The experiments are repeated for different random seeds and average results are reported. All BO experiments are carried out using the EI acquisition function and the Spearmint BO software.
2.5.2 Results I: Bayesian optimization of the wave energy prediction system parameters
Figures 2.3 and 2.4 show the average results obtained and the corresponding error bars for the task of predicting P and the task of predicting H_s, respectively. Each figure shows the average RMSE of each method (BO and random exploration) on the validation set as a function of the number of configurations evaluated. The performance of the configuration specified by a human expert is also shown. It can be observed that the BO strategy performs best in each setting. In particular, after a few evaluations the BO method is able to outperform the results of the human expert, and it provides results that are similar to or better than the ones obtained by the random exploration strategy with a smaller number of evaluations.
2.5.3 Results II: Estimation of the generalization performance
In a second round of experiments, the performance of the proposed prediction system after its optimization with the BO methodology is shown. Note that, after the FS process with the GGA-ELM approach, an ELM and an SVR [Smola2004, Salcedo2014b] are used to obtain the final prediction of P and H_s.
Table 2.3 shows the results obtained for the experiments carried out. It compares the ELM and SVR approaches in different scenarios: the prediction obtained with all the features, the prediction obtained with the hybrid GGA-ELM algorithm (without the BO methodology), and finally the prediction obtained after applying the BO process to the GGA-ELM approach. As Table 2.3 summarizes, the hybrid GGA-ELM algorithm improves the results obtained by the ELM and SVR approaches without FS. In fact, the SVR algorithm reaches values of the Pearson's Correlation Coefficient (R) around 75% with the FS method, against a poor 31% when all features are used. Moreover, these results are further improved by means of the BO methodology, using the ELM and SVR approaches after the GGA-ELM. In the case of the ELM, values of R around 77% are obtained, against the 71% achieved with the GGA-ELM algorithm without the BO improvement. The same behavior is obtained for the SVR algorithm: values around 78% with the application of the BO methodology, against the 75% obtained for the GGA-ELM approach when the parameters are fixed by a human expert. In addition, the reader can compare the results with another accuracy metric, the Mean Absolute Error (MAE).
Experiments  RMSE  MAE  R
All features-ELM  3.4183 kW/m  2.4265 kW/m  0.6243
All features-SVR  4.4419 kW/m  2.8993 kW/m  0.3129
GGA-ELM-ELM  2.8739 kW/m  1.8715 kW/m  0.7101
GGA-ELM-SVR  2.6626 kW/m  1.6941 kW/m  0.7548
BO-GGA-ELM-ELM  2.5672 kW/m  1.7596 kW/m  0.7722
BO-GGA-ELM-SVR  2.4892 kW/m  1.6589 kW/m  0.7823
The results of the previous tables can be better visualized in the following figures. Figure 2.5 shows the time series of predictions carried out by the ELM and SVR approaches. It can be seen how cases (c) and (d) improve the approximation to the real values with respect to cases (a) and (b), where the BO methodology is not applied. The same situation can be seen in Figure 2.6, where the scatter plots are presented for the results obtained with and without the BO methodology.
The same procedure is carried out in the case of H_s. Table 2.4 compares the results obtained in the different experiments. As can be seen, the results are improved with the use of the BO methodology, with values of R around 74% for the ELM and SVR predictions, against the 66% and 39% achieved for the ELM and SVR, respectively, with all features. The GGA-ELM algorithm improves these last results, but not as much as when the BO methodology is used. Figure 2.7 shows the time series of predictions for the GGA-ELM-ELM, GGA-ELM-SVR, BO-GGA-ELM-ELM and BO-GGA-ELM-SVR approaches, and Figure 2.8 presents the corresponding scatter plots.
Experiments  RMSE  MAE  R
All features-ELM  0.4653 m  0.3582 m  0.6624
All features-SVR  0.6519 m  0.4986 m  0.3949
GGA-ELM-ELM  0.3650 m  0.2858 m  0.7049
GGA-ELM-SVR  0.3599 m  0.2727 m  0.7056
BO-GGA-ELM-ELM  0.3324 m  0.2519 m  0.7429
BO-GGA-ELM-SVR  0.3331 m  0.2461 m  0.7396
In both prediction problems (P and H_s) the BO methodology improves the results, which highlights the generality of the proposed method.
2.6 Conclusions
In this chapter it has been shown how a hybrid prediction system for wave energy prediction can be improved by means of the BO methodology. The prediction system is formed by a grouping GA for FS, and an ELM for effective prediction of the target variable, P and H_s in this case. After this FS process, the final prediction of the target is obtained by means of an ELM or an SVR approach. The chapter describes in detail the BO methodology and its specific application to the optimization of the GGA-ELM for a real problem of P and H_s prediction from buoy data in Western California, USA. The results show that the BO methodology is able to improve the performance of the system, i.e., the prediction of the optimized system is significantly better than that of the system without the BO methodology applied. This improvement is related to the optimal selection of parameters carried out by the BO strategy. On the other hand, the main limitation of the proposed methodology is the increase in computation time. Nevertheless, this increase only affects the training phase and not the operation phase, in which predictions are made after training. Therefore, this limitation is not very important. Finally, note that this methodology can be extended to alternative prediction systems and other problems, especially to hybrid approaches involving ML algorithms with a high number of parameters to be tuned.
Chapter 3 Wind power ramps events prediction
3.1 Introduction
Wind Power Ramp Events (WPREs) are large fluctuations of wind power in a short time interval, which lead to strong, undesirable variations in the electric power produced by a wind farm. Their accurate prediction is important in the effort of efficiently integrating wind energy in the electric system, without considerably affecting its stability, robustness and resilience. In this study, the problem of predicting WPREs by applying ML regression techniques is tackled. The proposed approach consists of using variables from atmospheric reanalysis data as predictive inputs for the learning machine, which opens the possibility of hybridizing numerical-physical weather models with ML techniques for WPRE prediction in real systems. Specifically, the feasibility of a number of state-of-the-art ML regression techniques, such as SVR, ANNs (MLPs and ELMs) and GPs, is explored to solve the problem. Furthermore, the ERA-Interim reanalysis from the European Centre for Medium-Range Weather Forecasts (ECMWF) is the one used in this work because of its accuracy and high resolution (in both spatial and temporal domains). Aiming at validating the feasibility of this predicting approach, an extensive experimental work using real data from three wind farms in Spain is carried out, discussing the performance of the different ML regressors tested in this WPRE prediction problem.
3.1.1 Motivation
Wind power is currently one of the most important renewable energies in the world [Kumar2016] in terms of penetration in the electric power system [Brenna2017, Mohagheghi2017], economic impact and annual growth rate [Ali2017], both onshore [Dai2016] and offshore [Colmenar2016]. Electric power generation is usually carried out in large wind farms [Giebel2016, Herbert2014] far from urban centers [Lunney2017, Jangid2016], though, in the last few years, urban wind power generation has also been gaining momentum [Simões2016], including its use in smart grids [Köktürk2017].
The counterpart of the benefits associated with the flourishing of wind energy throughout the world (mainly the reduction of CO2 emissions, one of the causes of global warming [Peters2013] and climate change [Bauer2015]) is a set of problems related not only to the maintenance and management of wind farm facilities, but also to those of power grids. Regarding this, one of the most important problems yet to be solved is the efficient integration [Jones2017] of an increasing number of wind energy generators in both the distribution and transmission power grids, which are becoming increasingly complex [Cuadra2017, Cuadra2015]. Such an intrinsically complex nature of power grids is further increased because of the inherent stochastic nature of wind energy [Yan2015] that, depending on the weather conditions, can lead to intermittent generation [Yan2015]. This can affect the stability, robustness and resilience [Cuadra2017, Cuadra2015] of electric power grids. A useful discussion of the technical differences between these interrelated, but distinct concepts can be found in [Cuadra2015].
Aiming at preserving grid stability in a scenario with a high percentage of intermittent renewable sources (not only wind energy [Colmenar2016], but also photovoltaic [Cabrera2016] and wave [Cuadra2016] energies), power grids need to be made more flexible [Kroposki2017]. In this effort, the emerging technologies associated with smart grids [Köktürk2017] and microgrids [Yoldaş2017] can be used to mitigate wind power intermittency. An illustrative, very recent proposal in this respect consists of increasing the penetration of Vehicle-to-Grid (V2G) technologies [Gough2017] to use the batteries of idle Electric Vehicles (EVs) as power storage units [Zhao2017], absorbing peaks of intermittent overproduction.
Wind power intermittency and its influence on power grids’ stability and performance are the main reasons why Wind Power Forecasting (WPF) [Renani2016, Tascikaraoglu2014] is a key factor to improve its integration without unbalancing the rest of the grid components. Among the different issues in wind power prediction, one of the most significant is the existence of Wind Power Ramp Events. WPREs consist of large fluctuations of wind power in a short period of time, leading to a significant increasing or decreasing of the electric power generated in a wind farm [Zhang2017, Gallego2015a].
The field of scientific research on WPRE prediction (or forecasting) [Ouyang2013] is a relatively recent topic, driven by the need to improve the management of quick and large variations in wind power output, particularly in the aforementioned context of power grids with high renewable penetration [Alizadeh2016]. A useful review of different WPRE definitions (on which there does not seem to be a clear consensus) and their types (increasing or decreasing, depending on the WPRE definition) can be found in [Ferreira2011]. Among them, WPRE severity is one of the important issues. Up and down WPREs can exhibit different fluctuating levels of severity, although down WPREs are usually more critical than up WPREs because of the availability of reserves [Zhang2017]. WPREs are usually caused by specific meteorological processes (basically, crossing fronts [Gallego2015b] and fast changes in the local wind direction) and they occur at several scales (synoptic [Ohba2016], mesoscale [Salcedo2009] and microscale). Surprisingly, it has been found recently that very large offshore farms, clustered together, can also generate large WPREs on time scales of less than 6 h [Drew2017]. This gives an idea of the complexity of the WPRE phenomenon.
WPREs’ prediction is not only important for power grid operators, but also for wind farm owners. In fact, the occurrence of WPREs in wind farms is critical not only because of the aforementioned undesired variations of power, but also due to their potential harmful effects in wind turbines, which leads to an increase of management costs associated with these facilities [Cui2015]. Regarding this, the accurate prediction of WPREs has been reported as an effective method to mitigate the economic impact of these events in wind generation power plants [Gallego2015a, Cui2015].
According to [Cui2015, Foley2012], the prediction of WPREs and their influence on electricity generation and grid stability have recently been tackled with two major families of techniques: (1) "physical-based" models (numerical approaches that tackle the complexity of the physical equations which rule the atmosphere to obtain a prediction); and (2) statistical approaches (usually data-driven models to obtain predictions). The first group of techniques, the physical-based approaches, involves a set of equations that rules the atmospheric processes and their evolution over time and, because of their complexity and nonlinearity, are tackled by means of numerical methods. The second group of WPRE predicting techniques, the statistical approaches, are data-driven methods that are based on wind time series and include a variety of techniques ranging from conventional approaches, for instance Autoregressive Moving Average (ARMA) models, to Computational Intelligence (CI) approaches [Salcedo2016]. The latter are metaheuristics [Salcedo2016] able to find approximate solutions to complex problems that otherwise could not be solved or would require very long computational times. They include, among others, three groups of bio-inspired techniques: EC [DeJong2006], NC [Ata2015] and FC [Suganthi2015]. An introduction to the main concepts of bio-inspired CI techniques in energy applications can be found in [Cuadra2016, Salcedo2015b].
3.1.2 Purpose and Contributions
The purpose of this work is to explore the feasibility of a novel hybrid WPRE prediction framework, which merges parts of numerical-physical models with state-of-the-art statistical approaches. When the term "hybrid algorithm" is used in this work, it means that the proposal combines data from numerical-physical methods (reanalysis, in this case) with ML approaches (specifically, regressors). Regarding what the hybrid approach means in this study, there are two points to note. The first one is that it would be possible to adapt the proposed regression techniques to operate with alternative data (not coming from numerical methods such as reanalysis). The second one, which is the main novelty of this work, is that the use of data from numerical-physical methods could help achieve valuable predictions of WPREs in wind farms.
The contributions of this work are:

The use of regression techniques in this kind of problem since, up until now, the majority of WPRE prediction frameworks have been based on classification approaches.

The use of reanalysis data as predictive variables of the ML regression techniques. As will be shown, the direct application of regression algorithms makes unnecessary the use of some preprocessing algorithms that are necessary in other approaches [Dorado2017a, Cornejo2017]. Note that the classification problems associated with WPREs are usually highly unbalanced, which makes it difficult to put into practice high-performance classification techniques without having to use oversampling or similar techniques [Dorado2017a, Cornejo2017].

The performance of the proposed system has been tested using real data from three different wind farms in Spain.
The rest of this chapter is organized as follows: Section 3.2 states the problem definition tackled in this work, in which WPRE prediction is formulated as a regression task. Section 3.3 presents the data and predictive variables involved in the proposal. In turn, Section 3.4 shows the experimental work carried out, with the results obtained by the different tested algorithms in three WPRE prediction problems located at three distinct wind farms in Spain. Sections 3.4.2 and 3.5 complete the study by giving some final concluding remarks on the work carried out.
3.2 Problem Definition
Following previous works in the literature [Gallego2015a, Dorado2017a, Cornejo2017, Dorado2017b], a WPRE can be characterized by a number of parameters:

Magnitude (ΔP_r): defined as the variation in power produced in the wind farm or wind turbine during the ramp event (the subscript "r" stands for ramp).

Duration (Δt_r): time period during which the ramp event is produced.
In addition to the magnitude and duration of a wind ramp, the derived quantity called the ramp rate (ΔP_r/Δt_r) is used to define the intensity of the ramp.
Taking these parameters into account, in the majority of previous works in the literature the WPRE detection problem has been defined as a classification problem [Bossavy2015]. Within this framework, let S(t) be the so-called ramp function, i.e., a criterion function that is usually evaluated to decide whether or not there is a WPRE. There are several definitions of S(t), all of them involving power production (P) criteria at the wind farm (or wind turbine), but the two more common ones are the following [Gallego2015a]:
S(t) = P(t + Δt_r) − P(t)    (3.1)
S(t) = max{P(τ)} − min{P(τ)}, with τ ∈ [t, t + Δt_r]    (3.2)
Note that, in the ramp function stated by Equation (3.1), the power variation is referred to a given time interval Δt_r. In the experimental work carried out throughout this work, such a time interval has been assumed to be Δt_r = 6 h (the "reference time interval") because of the reanalysis resolution.
Using any of these definitions of the ramp function S(t), the classification problem can be stated by defining a threshold value S_0, in the way:
I(t) = 1 if |S(t)| > S_0, and I(t) = 0 otherwise    (3.3)
where I(t) is an "indicator function" used to label the data in the binary classification formulation of the problem.
As will be shown later on, in this approach the threshold value S_0 is set first, and then a WPRE is detected if the ramp function is larger than 50% of S_0. It is worth mentioning that, if there were an interest in establishing a larger number of classes (for example, five classes of WPRE), at least two thresholds would be needed.
The WPRE detection problem also involves a vector of predictive variables x. Different types of inputs have been used as predictive variables in the literature. The key point here is that the meteorological processes must always be considered, since they are physical precursors of WPREs. Different numerical weather prediction system outputs have been used to obtain these predictive variables, including reanalysis data [Gallego2015b]. This provides a long historical record of meteorological variables to be used as predictive variables for WPRE prediction. Following these previous works, in this work the following version of the WPRE prediction problem is tackled:
Let {(x_i, S_i)} (with i = 1, …, N) be time series of predictive variables x_i and values of the ramp function S_i (objective variables). The problem consists of training a regression model R on a subset of the data (training set), in such a way that, when R is applied to a given test set, an error measure is minimized.
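The two formulations can be sketched together: the ramp function of Eq. (3.1) as the regression target S_i, and the thresholded indicator of Eq. (3.3) as the classification labels (toy power series and threshold, illustrative only):

```python
import numpy as np

def ramp_function(power, steps=1):
    """S(t) = P(t) - P(t - steps): power variation over the reference interval."""
    s = np.full(len(power), np.nan)
    s[steps:] = power[steps:] - power[:-steps]
    return s

def ramp_labels(s, s0):
    """Binary WPRE indicator: 1 where |S(t)| exceeds the threshold S0."""
    return (np.abs(s) > s0).astype(int)

power = np.array([10.0, 12.0, 30.0, 29.0, 5.0, 6.0])  # toy production series
s = ramp_function(power)          # regression target used in this chapter
labels = ramp_labels(s, s0=10.0)  # classification formulation of Eq. (3.3)
```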
3.3 Data and Predictive Variables
A reanalysis project is a methodology carried out by some weather forecasting centers, which consists of combining past observations with a modern meteorological forecast model, in order to produce regular gridded datasets of many atmospheric and oceanic variables, with a temporal resolution of a few hours. Reanalysis projects usually extend over several decades and cover the entire planet, being a very useful tool for obtaining a comprehensive picture of the state of the Earth system, which can be used for meteorological and climatological studies. There are several reanalysis projects currently in operation, but one of the most important is the ERA-Interim reanalysis project, the latest global atmospheric reanalysis produced by the ECMWF [Dee2011]. ERA-Interim is a global atmospheric reanalysis from 1979 onwards, continuously updated in real time. The data assimilation system used to produce ERA-Interim is based on a 2006 release that includes a four-Dimensional Variational analysis (4D-Var) with a 12-h analysis window. The spatial resolution of the dataset is approximately 80 km, on 60 vertical levels from the surface up to 0.1 hPa. ERA-Interim provides six-hourly atmospheric fields on model levels, pressure levels, potential temperature and potential vorticity, and three-hourly surface fields.
To tackle the WPRE prediction problem in this study, wind- and temperature-related predictive variables from ERA-Interim are considered at some specific points in the neighborhood of the area under study. The variables considered as predictors (Table 3.1) are taken at different pressure levels (surface, 850 hPa and 500 hPa), in such a way that different atmospheric processes can be taken into account. A total of 12 predictive variables per ERA-Interim node and four nodes surrounding the area under study (wind farm) are considered at time t, i.e., in this problem, x is formed by 48 predictive variables. The ERA-Interim time resolution for the predictive variables (6 h) sets in this case the ramp duration taken into account (Δt_r = 6 h).
Thus, each regression model analyzed in this work (R) must be trained with the data (x_i, S_i), where the values S_i are computed using Equation (3.1) or Equation (3.2), respectively.
Variable name  ERA-Interim variable
skt  surface temperature 
sp  surface pression 
zonal wind component () at 10 m  
meridional wind component () at 10 m  
temp1  temperature at 500 hPa 
up1  zonal wind component () at 500 hPa 
vp1  meridional wind component () 500 hPa 
wp1  vertical wind component () at 500 hPa 
temp2  temperature at 850 hPa 
up2  zonal wind component () at 850 hPa 
vp2  meridional wind component () at 850 hPa 
wp2  vertical wind component () at 850 hPa 
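The assembly of the predictor vectors described above can be sketched as follows. This is a minimal illustration, not the thesis code: it assumes the 12 variables of Table 3.1 are already available as one array per reanalysis node, and simply stacks the four surrounding nodes into 48-dimensional predictor vectors.

```python
import numpy as np

def build_predictors(node_fields):
    """Stack the reanalysis variables of the four ERA-Interim nodes
    surrounding the wind farm into one predictor vector per time step.

    node_fields: list of 4 arrays, each of shape (n_times, 12), holding
    the 12 variables of Table 3.1 for one node.
    Returns an array of shape (n_times, 48).
    """
    return np.concatenate(node_fields, axis=1)

# Example with synthetic data: 100 six-hourly time steps.
rng = np.random.default_rng(0)
nodes = [rng.normal(size=(100, 12)) for _ in range(4)]
X = build_predictors(nodes)
print(X.shape)  # (100, 48)
```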
3.4 Experimental Work
This section presents the experimental evaluation of the proposed approach in a real problem of WPRE prediction, by exploring the different ML regressors used in this work (SVR, ELM, GP and MLP). Prior to describing the experiments carried out, it is worth emphasizing the practical importance of using reanalysis data to test the accuracy and feasibility of the proposed hybrid approach with ML regressors. Non-hybrid approaches (the use of regression techniques on alternative data, from measuring stations, for example) are also possible. However, note that, from the viewpoint of the repeatability of the experiments, reanalysis data are very convenient, since they are freely available on the Internet, so that the experimental part of this work can be easily reproduced by other researchers.
Starting with the detailed description of the experimental work carried out, three wind farms in Spain are considered, whose locations are represented in Figure 3.1. The three wind farms chosen (labeled “A”, “B” and “C” in Figure 3.1) are medium-sized facilities, with 32, 28 and 30 turbines installed, respectively. Note that the wind farms selected cover different parts of Spain (north, center and south), characterized by different wind regimes. Different amounts of data were available for each wind farm: in wind farm “A”, the data range from 11/01/2002 to 29/10/2012, while in wind farm “B” they range from 23/11/2000 to 17/02/2013. In wind farm “C”, the data used are between 02/03/2002 and 30/06/2013.
A preprocessing step to remove missing and corrupted data was carried out. Note that only the data taken every 6 h (00 h, 06 h, 12 h and 18 h) are kept, to match the predictive variables from ERA-Interim to the objective variables.
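The six-hourly subsampling step can be sketched as below. This is an illustrative fragment under the assumption that the wind farm measurements are available as timestamped records; the helper name and data layout are hypothetical, not taken from the thesis.

```python
from datetime import datetime, timedelta

def keep_six_hourly(records):
    """Keep only the records whose timestamp falls at 00, 06, 12 or 18 h,
    so that the power measurements match the ERA-Interim predictors.

    records: list of (datetime, power) tuples.
    """
    return [(t, p) for (t, p) in records if t.hour in (0, 6, 12, 18)]

# Hourly records over one day; only four of them should survive.
start = datetime(2002, 1, 11)
data = [(start + timedelta(hours=h), float(h)) for h in range(24)]
print(len(keep_six_hourly(data)))  # 4
```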
The performance of the four ML regressors described in Section 1.2 in the WPRE prediction problem at each wind farm is shown in terms of different error measurements, such as the RMSE, the MAE or the “sensitivity” (SE), also called the true positive rate. This last measure is defined as:

SE = 100 · (TP / P)  (3.4)

where: (1) TP stands for the number of true positive predictions, i.e., the correct predictions of ascending, descending and no-ramp events (with the first ramp definition), or of ramp and no-ramp events (with the second definition) in the experiments; (2) P stands for the number of positive values in the test set, i.e., the total number of real positive ramps, negative ramps, ramps or no ramps in the database. Note that, this way, the experiments are performed with the two different definitions of the ramp function (Equations (3.1) and (3.2)) given in Section 3.2.
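A per-class computation of the sensitivity in Equation (3.4) can be sketched as follows; the event labels are illustrative placeholders, not the thesis notation.

```python
def sensitivity(pred_events, real_events, label):
    """Percentage of real events of a given class that are also predicted
    as that class (Equation (3.4)): SE = 100 * TP / P."""
    P = sum(1 for r in real_events if r == label)
    TP = sum(1 for p, r in zip(pred_events, real_events)
             if r == label and p == label)
    return 100.0 * TP / P if P > 0 else float("nan")

real = ["+ramp", "no ramp", "-ramp", "+ramp", "no ramp"]
pred = ["+ramp", "no ramp", "no ramp", "-ramp", "no ramp"]
print(sensitivity(pred, real, "+ramp"))   # 50.0
print(sensitivity(pred, real, "no ramp")) # 100.0
```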
The following step to obtain the prediction of the WPREs is to train the considered ML regressors. A partition of the data into training (80%) and test (20%) sets is carried out. In the case of the SVR and MLP, a validation set (5% of the training set) is also considered. This validation set is used to obtain the best SVR hyper-parameters (C, ε and γ) by means of a grid search (GS) [Smola2004]. The validation set is also used in the training of the MLP approach, in order to prevent the NN from overtraining. Both the training and test sets have been randomly constructed from the data available after the cleaning preprocessing. The concrete configurations and the parameter values used for the considered ML regression models are listed in Table 3.2.
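The split-then-grid-search procedure above can be sketched as follows. This is a minimal illustration using scikit-learn's SVR on synthetic data; the candidate hyper-parameter values are arbitrary assumptions for the example, not the values used in the thesis.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X[:, 0] + 0.1 * rng.normal(size=200)

# 80/20 train/test split; 5% of the training set is kept as validation.
n_train = int(0.8 * len(X))
X_tr, y_tr = X[:n_train], y[:n_train]
n_val = int(0.05 * n_train)
X_val, y_val = X_tr[:n_val], y_tr[:n_val]
X_fit, y_fit = X_tr[n_val:], y_tr[n_val:]

# Grid search over illustrative C, epsilon and gamma candidates:
# the triple with the lowest validation error is retained.
best = None
for C in (1.0, 10.0):
    for eps in (0.01, 0.1):
        for gamma in (0.1, 1.0):
            model = SVR(kernel="rbf", C=C, epsilon=eps, gamma=gamma)
            model.fit(X_fit, y_fit)
            err = mean_squared_error(y_val, model.predict(X_val))
            if best is None or err < best[0]:
                best = (err, C, eps, gamma)
print(best[1:])
```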
With all these previous considerations in mind, Sections 3.4.1 and 3.4.2 focus on showing the results obtained and on discussing them, respectively.
Model  Model Configuration
SVR  SVR with Gaussian kernel
ELM
GP  RBF kernel
MLP  Levenberg–Marquardt training
3.4.1 Results
As mentioned in the description of the problem at hand, among the several definitions of ramp functions, the two most common ones are considered [Gallego2015a], stated, respectively, by Equations (3.1) and (3.2), because both include power production criteria at the wind farm. The variation of power caused by a wind ramp has been studied in the experiments below in the three wind farms (Figure 3.1) within a time interval of 6 h, which is determined by the resolution of the reanalysis data.
In addition, in order to properly understand the analysis of the results obtained, it is convenient to point out that, by using the indicator function stated by Equation (3.3), the proposed methodology is able to successfully detect those WPREs that surpass the thresholds (positive or negative), when using the first ramp function definition, or the single threshold, when using the second definition. As will be shown later on, this is due to the fact that, with the first ramp definition (Equation (3.1)), three types of events can be detected: ascending ramps (those whose power variation exceeds the positive threshold), descending ramps (those surpassing the negative threshold) and “no ramps” (when the power variation lies in between the two thresholds). Conversely, in the case of the second ramp function definition, it is only necessary to determine whether or not there is a ramp, so that only one threshold is necessary.
Taking these considerations into account, and aiming at better explaining the results, the discussion is organized according to the ramp function definition used, either Equation (3.1) or Equation (3.2), leading to the two following subsections, respectively.
Results Using Equation (3.1) as the Ramp Function Definition
Table 3.3 shows the results obtained in this problem of WPRE prediction when considering Equation (3.1) as the objective function, in the three aforementioned wind farms in Spain (labeled “A”, “B” and “C” in Figure 3.1). For each wind farm, the performance of each of the ML regressors explored (SVR, ELM, GP and MLP) has been measured using the metrics RMSE, MAE and sensitivity (SE (+ramp), SE (−ramp), SE (no ramp)).
Wind Farm A  
ML regressor  RMSE  MAE  SE (+ramp)  SE (−ramp)  SE (no ramp)
(MW)  (MW)  (%)  (%)  (%)  
SVR  7.0085  5.2673  26.93  24.20  96.59 
ELM  5.6779  4.2499  40.54  42.59  95.51 
GP  5.3066  3.9519  54.93  51.95  93.96 
MLP  5.4538  4.0021  12.13  5.72  99.41 
Wind Farm B  
ML regressor  RMSE  MAE  SE (+ramp)  SE (−ramp)  SE (no ramp)
(MW)  (MW)  (%)  (%)  (%)  
SVR  8.0025  5.9773  35.53  34.10  86.66 
ELM  7.4539  5.9768  32.93  33.14  92.13 
GP  5.9856  4.4298  52.10  58.25  91.71 
MLP  5.9009  4.3429  15.11  13.14  97.25 
Wind Farm C  
ML regressor  RMSE  MAE  SE (+ramp)  SE (−ramp)  SE (no ramp)
(MW)  (MW)  (%)  (%)  (%)  
SVR  7.1370  5.3406  45.38  44.20  91.33 
ELM  5.8367  4.4462  50.32  47.64  94.01 
GP  4.7515  3.4771  57.14  61.05  93.99 
MLP  5.0727  3.6827  14.21  10.26  98.52 
Regarding the reasons why the mentioned metrics are used in preference to others, it is convenient to stress some aspects related to what are, in fact, two conceptually distinct groups of measures: metrics that measure errors (RMSE and MAE), on the one hand, and metrics that quantify success prediction rates (sensitivity), on the other. The aspects to be highlighted are:

With respect to the “conventional” metrics that measure errors, there are two reasons that have compelled us to include the RMSE and MAE metrics. The first one is that they are the most commonly used in the literature. Examples of relevant papers in which these metrics are used for WPRE forecasting are [Gallego2015a, Cutler2007, Gallego2011, Gallego2013]. Please see [Gallego2015a] for a useful discussion on this issue. The second reason is that, as will be shown, the utility of these error measures can be complemented by using the sensitivity metric, the other class of metric chosen.

The second couple of points that are important to emphasize here are those related to the aforementioned sensitivity in Equation (3.4), one with respect to its meaning and the other regarding its application. On the one hand, the physical meaning of sensitivity is simply the percentage of correct ramp predictions with respect to the actual measured data. Despite its apparent simplicity, this is an excellent measure of the extent to which the regressor algorithm under test is efficient in detecting wind ramps. On the other hand, regarding its application step in the proposed methodology, the key point is that sensitivity is only used after the ramp function has been predicted with a regression technique and a threshold has been defined. After applying the threshold, the number of real WPREs is obtained and compared to the predicted number. This way, the fact that the problem is highly unbalanced is no longer an issue; or, in other words, the regression techniques are applied to the ramp function, and then a threshold to classify events is established, from which the percentage of correct WPRE identifications is obtained. Note that, since the work’s objective is to deal with a regression problem, it is enough to show the good percentage of correct classifications after setting the threshold on the predicted ramp function.
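The regress-then-threshold pipeline just described can be sketched end to end; the toy ramp-function values and the threshold are illustrative placeholders, not the thesis data.

```python
def events(values, threshold):
    """Turn a (real or predicted) ramp-function series into event labels.
    This thresholding step is applied only after the regression stage."""
    return ["ramp" if abs(v) > threshold else "no ramp" for v in values]

# Toy real and predicted ramp-function series (MW).
real_S = [12.0, -3.0, -15.0, 2.0, 11.0]
pred_S = [11.0, -1.0, -14.0, 4.0, 6.0]
real_e = events(real_S, 10.0)
pred_e = events(pred_S, 10.0)

# Sensitivity of the ramp class: correctly predicted ramps over real ramps.
hits = sum(p == r == "ramp" for p, r in zip(pred_e, real_e))
total = real_e.count("ramp")
print(100.0 * hits / total)  # SE (ramp) in percent
```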
The analysis of Table 3.3 allows for elucidating some interesting conclusions:

The performance of the ML regressors is, in general, good in terms of RMSE, MAE and sensitivity, although, as shown, some ML regressors work better than others.

Regarding the performance of one regressor with respect to another, the results of Table 3.3 clearly indicate that the GP model reaches the best results of all the regressors tested, with an excellent reconstruction of the ramp function from the ERA-Interim variables. Note in Table 3.3 that the values of the metrics obtained by the GP regressor are marked in bold. Its RMSE and MAE values are much lower (better) than those of the other ML regressors explored. In terms of sensitivity, its performance is even better. Specifically, its sensitivity (or percentage of correct predictions with respect to the real, measured data, stated by Equation (3.4)) is much higher (better) than those of the other regressors, both for ascending ramps (SE (+ramp)) and for descending ramps (SE (−ramp)). This confirms the validity of the results measured with the error metrics and proves the feasibility of the proposed methodology for predicting wind ramps, both ascending and descending.

The worst result corresponds to the MLP, with a poorer detection of positive WPREs, when compared to the other ML regressors.

The performance of the SVR and ELM lies in between that of the GP and the MLP, with acceptable detection rates for positive WPREs.
With this analysis in mind, Figures 3.2–3.4 show the estimation of the ramp function obtained by the GP and ELM algorithms (the two best approaches tested in the experiments), when using Equation (3.1) as the objective function, for wind farms A, B and C, respectively. Some aspects to correctly interpret these figures are:

Aiming at clearly showing the algorithms’ performance, only the first 300 samples of the test set have been represented in these figures.

Furthermore, a threshold value (and the corresponding negative threshold) has been marked in these figures, so it can be used to decide whether or not the event is a power ramp event (see Equation (3.3)). When a ramp occurs, it is possible to decide whether the ramp event is ascending or descending.
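A figure of this kind (measured versus predicted ramp function with the two threshold lines) can be reproduced with a sketch like the following; the series here are synthetic stand-ins, not the wind farm data of Figures 3.2–3.4.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
real = rng.normal(scale=8.0, size=300)          # stand-in measured series
pred = real + rng.normal(scale=2.0, size=300)   # stand-in predicted series
threshold = 10.0

fig, ax = plt.subplots(figsize=(10, 3))
ax.plot(real, "r", label="measured ramp function")
ax.plot(pred, "b", label="predicted ramp function")
ax.axhline(threshold, ls="--", c="k")    # positive threshold
ax.axhline(-threshold, ls="--", c="k")   # negative threshold
ax.set_xlabel("test sample")
ax.set_ylabel("power variation (MW)")
ax.legend(loc="upper right")
fig.savefig("ramp_prediction.png", dpi=100)
print("saved")
```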
The results illustrated in Figure 3.2a show two data series: the series of real, measured WPRE values (in red) and the series of predicted WPRE values (in blue) computed by using the proposed hybrid methodology. In an effort to better explain the results and the applicability of this proposal, Figure 3.2 is drawn in a more detailed way than the others, zooming into two shorter time excerpts, (b) and (c). The insets (b) and (c) show how some WPREs surpass either of the two thresholds. Specifically, and as mentioned before, a WPRE is detected in this approach if the ramp function exceeds the 50% threshold. Note that Figure 3.2b,c shows how the predicted WPREs (in blue) exceeding either threshold are correctly predicted when compared to the real, measured WPREs (in red).
Regarding such a threshold value, it is worth mentioning that it is not used until the very end of the experiments, once the ramp function has been predicted with the ML regression algorithms. In this respect, it is also convenient to remark that the proposed approach does not seek to optimize the threshold; it is only displayed as an indication (example) that the ML regression model applied can be turned into a classifier for WPREs. Note, however, that the purpose of this study is to deal with a regression problem.
The good performance observed in Figure 3.2 for the ELM is also found (and even improved upon) in Figures 3.3 and 3.4.
The joint analysis of Figures 3.2–3.4 and Table 3.3 reveals the good performance of the ML regression techniques (mainly the GP model), which, hybridized with the ERA-Interim predictive variables, assist in obtaining a robust decision system in terms of the existence or not of a power ramp, depending, of course, on the definition of the threshold.
Results Using Equation (3.2) as the Ramp Function Definition
On the other hand, Table 3.4 and Figures 3.5–3.7 assist in explaining the results when Equation (3.2) is the ramp function to be predicted.
Wind Farm A  
ML regressor  RMSE  MAE  SE (ramp)  SE (no ramp)
(MW)  (MW)  (%)  (%)  
SVR  6.8847  5.1876  31.33  96.27 
ELM  5.7037  4.2925  41.99  95.01 
GP  5.2048  3.7897  49.66  96.36 
MLP  5.4351  3.9861  8.71  99.44 
Wind Farm B  
ML regressor  RMSE  MAE  SE (ramp)  SE (no ramp)
(MW)  (MW)  (%)  (%)  
SVR  7.9439  5.8853  44.16  85.55 
ELM  7.3148  5.8675  34.67  93.23 
GP  5.9223  4.4037  65.32  84.12 
MLP  5.9051  4.3475  14.76  97.17 
Wind Farm C  
ML regressor  RMSE  MAE  SE (ramp)  SE (no ramp)
(MW)  (MW)  (%)  (%)  
SVR  7.1525  5.4677  37.88  93.83 
ELM  5.8624  4.4368  58.16  92.10 
GP  5.1030  3.6991  57.26  94.42 
MLP  5.0605  3.6670  11.22  98.56 
Table 3.4 presents the results (in terms of RMSE, MAE and sensitivity) corresponding to the estimation of the ramp function (Expression (3.2)) achieved by using the proposed approach as a function of the ML regressors explored (SVR, ELM, GP and MLP).
A first aspect that stands out in Table 3.4 is that it has fewer columns related to sensitivity than Table 3.3. This is an interesting point that arises from the different definitions of the ramp function. Note that, for the second definition (Equation (3.2)), the sensitivity is the percentage of correctly predicted results (either ramp or no ramp) with respect to the actual measured data. This is the reason why Table 3.4 has only two sensitivity columns, SE (ramp) and SE (no ramp), whereas Table 3.3 exhibits three, since with the first ramp definition there are three events to be detected: ascending ramps, descending ramps and no ramps.
In the same way as Table 3.3, Table 3.4 also reveals that, for Equation (3.2), the GP approach exhibits the best results, clearly outperforming the rest of the ML regressors tested, except the MLP, which reaches similar values only in the error metrics (RMSE and MAE), but not in its SE (ramp) value, which is considerably worse than that of the GP. This is clear, for instance, in Wind Farm A, in which the GP obtains RMSE = 5.2048 MW, less than that of the other regressors, while SE (ramp) = 49.66% for the GP versus 8.71% for the MLP. In Wind Farm B, the performance of the GP (RMSE = 5.9223 MW) is similar to that of the MLP (RMSE = 5.9051 MW) and much better than that of the SVR (RMSE = 7.9439 MW) and the ELM (RMSE = 7.3148 MW). Note again that, although the GP model is similar to the MLP in error metrics, the GP exhibits much better sensitivity, SE (ramp) = 65.32% versus 14.76% for the MLP. This is true not only for the MLP (which has similar errors), but also for the rest of the ML regressors, which are clearly surpassed by the GP model in detecting wind ramps. For clarity, this is marked in bold in Table 3.4. This means that the GP is more efficient in predicting wind ramps (the very core of this approach) than the others, and this is the reason why the sensitivity helps supplement the information provided by the error metrics.