Abstract

Programa de Doctorado en Tecnologías de la Información y las Comunicaciones

NEW HYBRID NEURO-EVOLUTIONARY ALGORITHMS FOR RENEWABLE ENERGY AND FACILITIES MANAGEMENT PROBLEMS

Tesis Doctoral presentada por

LAURA Mª CORNEJO BUENO

Director:

DR. SANCHO SALCEDO SANZ

Alcalá de Henares, 2018

No te rindas, por favor no cedas, aunque el frío queme, aunque el miedo muerda, aunque el sol se esconda y se calle el viento, aún hay fuego en tu alma, aún hay vida en tus sueños.

Mario Benedetti

Abstract

This Ph.D. thesis deals with the optimization of the exploitation of several renewable energy resources, as well as with the improvement of facilities management in oceanic engineering and at airports, using hybrid computational methods belonging to Artificial Intelligence (AI) to this end. These problems are summarized hereafter, with the technical solutions proposed at the end of the section.

Energy is essential to our society in order to ensure a good quality of life. Nowadays, fossil fuels are the most important energy source on the planet. However, they contribute greatly to Climate Change, severely affecting ecosystems. For this reason, there is a trend towards the gradual use of renewable resources, which guarantee a sustainable development. In fact, a penetration of these resources above 50% is expected within fifty years. Obviously, that process will not be the same in all countries, because renewable energy resources are not uniformly distributed throughout the World. It is important to note that different regions rely on different renewable technologies, so they can contribute to regional development to a greater or lesser degree. The main drawback of renewable energies is the natural variability inherent to the resource. This means that predictions of the variables on which renewable energies depend are necessary, in order to know the amount of energy that will be obtained at any time.

The second topic tackled in this thesis is related to the basic parameters that influence different marine activities and airport operations, whose behaviour must be known in order to develop a proper facilities management in these environments. For instance, the Significant Wave Height (Hs) is a basic parameter in wave characterization, important for different problems in marine activities such as the design and management of vessels, marine structures, Wave Energy Converters (WECs), etc. On the other hand, low-visibility events at airports, normally caused by fog, are another fundamental issue in airport activities, since they can cause flight delays, diversions and cancellations or, in the worst cases, accidents.

Within this work, a study of the state of the art in Machine Learning (ML) has been performed to solve the problems associated with the above-mentioned topics, and several contributions have been proposed:

  • One of the pillars of this work is focused on the estimation of the most important parameters in the exploitation of renewable resources. For this purpose, Support-Vector Regression (SVR), Neural Network (NN) (Multi-Layer Perceptron (MLP) and Extreme-Learning Machine (ELM)) and Gaussian Process (GP) algorithms are applied to several practical problems. The performance of these algorithms is discussed for every experiment carried out, along with the specific settings of the algorithms and the internal characteristics of the models.

  • The second contribution of this thesis is related to feature selection problems, more specifically, the use of EAs such as the Grouping Genetic Algorithm (GGA) or the Coral Reef Optimization (CRO) algorithm, hybridized with other ML approaches acting as classifiers and regressors. In this scheme, the GGA or CRO searches for subsets of features relevant to solving the problem, and the regressor employed provides the prediction in terms of the features selected by the evolutionary algorithm, reducing the computational cost while keeping a good accuracy.

The proposed methodologies are applied to multiple problems: the prediction of the Significant Wave Height, relevant for marine energy applications and marine activities; the estimation of Wind Power Ramp Events (WPREs), undesirable variations in the electric power produced by a wind farm; the prediction of global solar radiation in areas of Spain and Australia, really important in terms of solar energy; and the prediction of low-visibility events at airports. All these practical problems are tackled with the corresponding previous data analysis, normally in terms of meteorological variables.

Resumen en Castellano

Esta tesis tiene como objetivo la optimización de la explotación de recursos energéticos renovables, así como la mejora en la gestión de instalaciones en ingeniería oceánica y aeropuertos, usando para este fin métodos computacionales híbridos pertenecientes a una rama de la Inteligencia Artificial (IA) denominada aprendizaje máquina. Estos problemas se resumen a continuación, con las soluciones técnicas propuestas al final de la sección.

La energía es esencial en nuestra sociedad para asegurar una buena calidad de vida. Hoy en día, los combustibles fósiles constituyen la fuente energética más importante del planeta; sin embargo, estas formas de energía contribuyen en gran medida al Cambio Climático, afectando severamente a los ecosistemas. Por esta razón, se tiende gradualmente al uso de fuentes de energía renovables que garanticen un desarrollo sostenible. De hecho, se prevé en 50 años una penetración de estos recursos por encima del 50%. Obviamente, este proceso no será igual en todos los países, debido a que las fuentes de energía renovables no están uniformemente distribuidas a lo largo del mundo. El hecho más importante es que cada área cuenta con alguna de ellas, y pueden contribuir al desarrollo regional en mayor o menor medida, gracias a lo cual las fuentes de energía convencionales serán sustituidas progresivamente. Sin embargo, se observa un lento desarrollo en este sentido, y la única cuestión que cabe preguntarse es cuándo las energías renovables tendrán mayor penetración en el sistema que los actuales combustibles fósiles. Para responder a esta pregunta, una buena manera es centrarse en el principal inconveniente de este tipo de energías: la variabilidad natural inherente al recurso. Esto significa que las predicciones sobre los parámetros más importantes de los que dependen las energías renovables son necesarias para conocer la cantidad de energía que será obtenida en un momento dado.

El otro tema abordado en esta tesis está relacionado con los parámetros que influyen en diferentes actividades marinas y aeroportuarias, cuyo comportamiento es necesario conocer para desarrollar una correcta gestión de las instalaciones en estos entornos. Por ejemplo, la altura significativa de las olas (Hs) es un parámetro básico en la caracterización de las olas, muy importante para el desarrollo de actividades marinas como el diseño y mantenimiento de barcos, estructuras marinas, convertidores energéticos de ola, etc. Por otro lado, la escasa visibilidad en los aeropuertos, normalmente causada por la niebla, es otro aspecto fundamental para el correcto desarrollo de actividades aeroportuarias, que puede causar retrasos en los vuelos, desvíos y cancelaciones, o accidentes en el peor de los casos.

En este trabajo se ha realizado un análisis del estado del arte de los modelos de aprendizaje máquina que se utilizan actualmente, con el objetivo de resolver los problemas asociados a los temas tratados con anterioridad. Diferentes contribuciones han sido propuestas:

  • Uno de los pilares esenciales de este trabajo está centrado en la estimación de los parámetros más importantes en la explotación de energías renovables. Con este propósito, los algoritmos Vectores Soporte para Regresión (VSR), Redes Neuronales (RN) (Perceptrones Multicapa (MLP) y Máquinas de Aprendizaje Extremo (MAE)) y Procesos Gaussianos son utilizados en diferentes problemas prácticos. El rendimiento de estos algoritmos es analizado en cada uno de los experimentos realizados, tanto la precisión de los mismos como la especificación de las características internas.

  • Otro de los aspectos tratados está relacionado con problemas de selección de características. Concretamente, con el uso de algoritmos evolutivos como los Algoritmos de Agrupación Genética (AAG) o los algoritmos de Optimización de Arrecife de Coral (OAC), hibridados con otros métodos de aprendizaje máquina como clasificadores y regresores. En este sentido, el AAG o el OAC analizan diferentes conjuntos de características para obtener aquel que resuelva el problema con la mayor precisión, y el regresor empleado proporciona la predicción en función de las características obtenidas por el algoritmo evolutivo, reduciendo el coste computacional con gran fiabilidad en los resultados.

La metodología mencionada es aplicada a múltiples problemas: la predicción de la altura significativa de las olas, relevante en aplicaciones energéticas y actividades marinas; la estimación de eventos puntuales como son las rampas de viento (ERV), variaciones indeseables en la potencia eléctrica producida por un parque eólico; la predicción de la radiación solar global en áreas de España y Australia, realmente importante en términos de energía solar; y la estimación de eventos de baja visibilidad en aeropuertos. Los casos prácticos citados son desarrollados con el consecuente análisis previo de la base de datos empleada, normalmente, en términos de variables meteorológicas.

Agradecimientos

Seguramente necesitaría otro libro para expresar el enorme agradecimiento que siento hacia todos vosotros. El esfuerzo y espíritu de sacrificio siempre dan su fruto, pero una cosa es segura y es la importancia de poder contar con el apoyo incondicional de las personas que tienes alrededor. Porque de una forma u otra todos aportáis vuestro granito de arena y contribuís a que hoy pueda seguir creciendo como estudiante y como persona.

En primer lugar quiero agradecer a mi Director de Tesis, Sancho Salcedo la confianza depositada en mí. Por enseñarme a ganar seguridad en uno mismo desde la mejor de las humildades, y por supuesto por enseñarme tanto y tan constante, porque sin él está claro que todo este trabajo no habría visto la luz. Gracias por todo el apoyo y por hacer de esta etapa una de las mejores vividas hasta el momento.

También quiero agradecer a Enrique Alexandre, Silvia Jiménez, José Antonio Portilla, Lucas Cuadra, José Carlos Nieto y Raquel Criado el haberme acogido como una más, y permitirme aprender tanto de ellos. Porque además de poder trabajar en lo que te entusiasma, es un gusto poder hacerlo en un ambiente tan agradable como el que conseguís crear en el laboratorio.

Y a Carlos Casanova, con el que he tenido el placer de poder trabajar codo con codo en uno de los artículos de esta tesis, y cuya colaboración ha sido crucial para su publicación. Aprovecho también para agradecer a todas las personas con las que me he cruzado en estos años, de las que he podido aprender y con las que he podido colaborar en numerosos trabajos.

Por supuesto a mis “mindundis” Carlos Camacho, Freddy Pinto y Adrián Aybar, mis compis de fatiga. Gracias por las innumerables comidas, cafés, charlas y quedadas; por ese intercambio de conocimiento y sobre todo por las risas difíciles de olvidar. ¡Chicos, ya se va viendo la luz al final del túnel! Y a mi compañera de la planta de arriba, Inma Mohino, cuya sonrisa te alegra el día.

El doctorado además me ha permitido vivir una de las mejores experiencias de mi vida: mi estancia de 3 meses en Australia. Allí conté con el apoyo del Profesor Ravinesh C. Deo, quien me recibió con los brazos abiertos y contribuyó a mi formación. Además aprendí que se puede conocer el verdadero significado de la amistad aunque 2 personas estén separadas por más de 17.000 km. Kavina Dayal, gracias por convertir esta estancia en algo inolvidable; nos vemos en alguna parte del mundo.

Quería agradecer también a una persona muy especial, a un amigo que me conoce desde mucho antes de estar aquí y que me ha apoyado tanto desde dentro. Enrique García, Kike, gracias por tus visitas, por la alegría que consigues despertarme aún en los momentos que parecían no tenerla. Ha sido muy importante poder contar tan de cerca con alguien de mi familia, alguien como tú.

Y ahora sí, las personas que me han visto crecer, y que tanto han creído en mí, incluso cuando ni yo misma creía.

Quiero empezar por la persona que me lo ha dado todo, mi madre, Carmen Bueno. No se puede explicar con palabras todo lo que te debo. Mil gracias por estar ahí al pie del cañón y sacar fuerzas de donde no las hay para mostrar siempre una sonrisa. En especial quiero destacar tu enorme valor y la fuerza que has demostrado siempre, sobre todo frente a la adversidad de este último año. Eres toda una inspiración y verte me hace sentir que puedo ser capaz de cualquier cosa. Eres mi luz.

A mi padre, Juan Carlos Cornejo, que no ha dejado de trabajar ni un solo día para que hoy haya podido llegar hasta aquí. Gracias por todo tu esfuerzo y voluntad, y por formar parte de lo que somos mi hermana y yo.

Mi pequeña hermanita, Sara Cornejo, que es muy grande. No creo que haya alguien que pueda conocerme mejor. Siempre estás pendiente de lo que necesito en cada momento, cuando la hermana mayor soy yo. Siempre sabes qué decir, y tus consejos nunca pueden ser más acertados. No podría imaginarme una vida sin ti, porque no habría una sin una de las partes. Gracias por tu condición humana y por hacer que no me sienta sola por muy lejos que estemos la una de la otra. Tu fuerza también hace que hoy pueda decir: ¡he llegado!, ¡estoy aquí!

A David Doñoro, mi pilar, mi compañero de viaje en esta aventura. Son 8 años los que llevo a tu lado y consigues hacerme sentir como si estuviéramos empezando cada día. Es reconfortante poder llegar a “mi sitio” y sentirme en casa. Gracias por creer en mí, por ser fuerte cuando lo necesito, y por no dejarme caer. Juntos podemos con lo que nos echen.

Por supuesto a mi yayi, Teresa Montes, una luchadora innata, un ejemplo de vida. Una persona que es capaz de transmitir AMOR en el más profundo sentido de la palabra. Te debemos todo, y no creo que podamos estar más orgullosos de tener una madre, esposa, abuela y bisabuela como TÚ. Gracias por no decaer y seguir a nuestro lado con tanto tesón.

Y a mi otra abuela, Magdalena Macías, cuya pasión por los estudios nos animó siempre a seguir luchando por nuestro futuro. Gracias por ser igualmente una luchadora de esta vida, y mantenerte entera pese a todo lo vivido. Eres un ejemplo de constancia.

A mis tíos Francisco José López, Teresa Bueno e Isabel Bueno, gracias por hacer que pueda contar con vosotros y estar a mi lado en este camino. Destacar las comiditas de la tita Beli, que tanto ayudan cuando no hay tiempo ni de cocinar, los consejos de la tita Mari, y las provechosas conversaciones del tito Francisco.

No puedo olvidarme de mis primos, José Gabriel del Prado, Eduardo González, Israel González y Jesús del Prado, que no son primos sino hermanos. Gracias por toda esta vida de cariño y diversión, sois parte esencial de este camino. Y cómo no, Mariví, Ana y Andrea, que se han convertido en las mejores primas inesperadas que se puede tener.

Y siguiendo la línea sucesoria es turno de mis sobrinitos. Isabel, Fátima, Gabriel y Alejandro, las personitas más pequeñas y que más pueden llenar de luz un día gris.

Como hay una que sí sabe leer, quería dedicarle unas palabras, pues creo que no puede imaginarse cómo me cambió la vida. Isabel, eres mi motor, el empuje mañanero que me anima cada día. Ni loca me perdería esas conversaciones de niña de 7 años a adulto en las que a veces dudo de quién es el adulto. Porque aunque digas “eres la mejor tita del mundo”, y reconozco que me derrito cada vez que te escucho, eres tú la única capaz de hacer olvidar todo lo malo de alrededor, y encima lo haces sin darte cuenta. No he podido tener más suerte contigo, y solo quisiera poder transmitirte la mitad de lo que tú me das. No te rindas nunca y lucha, lucha porque yo siempre estaré a tu lado apoyándote como tú (siendo tan pequeña) has hecho conmigo. Recuerda, nada es imposible. Te quiero.

Y no puedo terminar esta parte sin agradecer enormemente a los que por desgracia no han podido verme acabar. Mi abuelo, José Luis Cayuela, y Manuel García. Me quedo con todo lo que me habéis enseñado, que es mucho, echándoos de menos cada día, pero agradeciendo enormemente el haber coincidido en esta vida, y que hayáis formado parte de mi familia. Padri, nadie podrá llamarme “chata” de la forma en que tú lo hacías, y no es comparable a nada la forma burlona de llamar a “la Lauri” que tú, Manuel, tenías. Gracias por vuestro AMOR. Os quiero.

A mi otra familia, Cati, Pablo, Estefanía, Julián, Toñi, Jesús Ángel, Juan Carlos, Abuelos, Manolo y desde el más profundo cariño Grego. Por hacerme sentir parte de vuestras vidas y contribuir con vuestro cariño y valores desde que comencé esta etapa tan importante. Gracias por estar a mi lado y acogerme como lo hicisteis.

Y no podía olvidarme de vosotros, mis amigos y compañeros desde que empezamos la carrera. Casi 10 años ya y tan unidos como al principio. Casillas, Pascu, Dan, Jenny, Gallo, Guille, Sara, Samu, Paloma, Jesús, Manu, Mar, Pastor, Alvarito, Víctor, Jesica y Susana. ¡Gracias! Porque sabéis lo importante que sois, y habéis demostrado estar en todo momento. ¡Qué aburridos habrían sido los días sin vosotros! Espero que mantengamos esta bonita amistad por muchos años más.

Por último quiero dar las gracias a “las niñas del cole” Marta, Rosana y Lidia, con las que he compartido mi niñez y adolescencia y con las que sigo creciendo y afrontando etapas. Después de casi 20 años es increíble poder contar con amigas como vosotras. Y a Rocío, siempre la vecinita. Por todas las tardes de estudio que nos ha amenizado con su alegría, y ser todo un apoyo por muchos días que pasen sin que nos veamos.

Siento si me dejo a alguien, pero esto es gracias a TODOS, a los que aparecéis y a los que no he puesto. Porque a lo largo de los años se conocen muchas personas que dejan huella y forman parte de lo que ahora somos. Quién sabe cuándo volveré a escribir un libro, al menos en éste puedo reflejar el esfuerzo de muchos años y el fruto obtenido, que también es vuestro.

A todos, OS QUIERO.

Contents
List of Figures
List of Tables

List of Acronyms

AI
Artificial Intelligence
ARMA
Autoregressive-Moving-Average
ANN
Artificial Neural Network
BO
Bayesian Optimization
CI
Computational Intelligence
CNN
Convolutional Neural Network
CRO
Coral Reef Optimization
DFT
Discrete Fourier Transform
EA
Evolutionary Algorithm
EC
Evolutionary Computation
ELM
Extreme-Learning Machine
EI
Expected Improvement
EV
Electric Vehicles
FC
Fuzzy Computation
FFT
Fast Fourier Transform
FL
Fuzzy Logic
FS
Feature Selection
GA
Genetic Algorithm
GGA
Grouping Genetic Algorithm
GP
Gaussian process
GS
Grid Search
MAE
Mean Absolute Error
ML
Machine Learning
MLP
Multi-Layer Perceptron
MSE
Mean Squared Error
NC
Neural Computation
NN
Neural Network
RMSE
Root Mean Squared Error
SAR
Synthetic Aperture Radar
SC
Soft-Computing
SM
Standard Method
SVM
Support Vector Machine
SVR
Support-Vector Regression
Hs
Significant Wave Height
V2G
Vehicle-to-Grid
Wave Energy Flux
WECs
Wave Energy Converters
WPF
Wind Power Forecasting
WPREs
Wind Power Ramp Events

Part I Motivation and state-of-the-art

Chapter 1 Introduction

1.1 Motivation

The challenges of renewable energies in the near future, as well as of the associated facilities, will require new computational tools for the optimization and exploitation of the available resources. In a world where climate change is a recognized fact, a re-evaluation of energy use is necessary. For this reason, this work is focused on renewable energy sources, with almost zero emissions of both air pollutants and greenhouse gases. Currently, renewable energy sources supply 19% of the total world energy demand [REN21-2017], but an increase is expected in the near future. In consequence, a sustainable development must be guaranteed, defined by the World Commission on Environment and Development as “development that meets the needs of the present without compromising the ability of future generations to meet their own needs”. The main goal, then, is to reconcile energy production that satisfies social welfare with environmental protection, while achieving economic growth.

Technology is the best way to meet the proposed objectives. There are many renewable energy technologies, but most of them are still at an early stage of development and not technically mature. The aim of this work is to contribute to progress in this field by means of AI.

AI is a term that indicates, in its broadest sense, the ability of a machine to reproduce human learning. Specifically, it is the part of computer science tasked with the design of intelligent computer systems. This kind of intelligence can be associated with human behaviour, i.e., understanding, language and learning [Kalogirou2006], and its skills can be applied in diverse applications such as forecasting, pattern recognition, optimization and many more. That is possible because AI covers different areas such as Neural Computation (NC), Evolutionary Computation (EC) and Fuzzy Computation (FC), among others, which can be used or hybridized to solve several problems in our society.

Some of these algorithms are used, in this work, for the estimation of really important parameters in the renewable energy area, taking into account not only the attainment of energy but also how these parameters can affect certain facilities management tasks. In this regard, facilities management is related to the necessary infrastructures in renewable energy environments and to other fields where meteorological variables have a similar impact, such as the study developed at airports to estimate the visibility in runways, which will be explained in depth in Part III.

Two fields compose the core of this Ph.D. thesis: ML regression algorithms and evolutionary optimization. Pattern recognition is a branch of AI focused on systems that are able to associate multidimensional data with labels. Using these methods it is possible to develop systems, based on the available data, to obtain predictions and classifications in many fields of Engineering, Science, Economics, etc. The second pillar of this thesis is the use of EC approaches to solve feature selection problems and thus optimize the accuracy of the subsequent regressions.

The next sections provide a more detailed description of the ML techniques applied in this work, as well as of Evolutionary Algorithms (EAs), including a review of the state of the art in Soft-Computing (SC) techniques.

1.2 State of the art

This section presents a description of the state of the art in the technological fields addressed in this thesis. Figure 1.1 shows a scheme of the different areas of SC. SC is an essential part of AI, and many of its methods can be classified in the field of knowledge called “Natural Computing”. The algorithms that can be found in this category are inspired by the way Nature solves complex problems. In this regard, EC is inspired by the theory of evolution, while ANNs take their behaviour from the human brain. Because of the variety of techniques used, the structure of this section has been chosen to properly cover the areas of AI that come in handy in this work.

Figure 1.1: Structure of SC, including NC, EC and FC.
1.2.1 Neural Computation-based Approaches

NC is the part of SC that includes algorithms inspired by how the human brain learns. They have been mainly used in classification and regression problems. In the next sections, four of the most used NC approaches are described: feed-forward NNs (MLPs and ELMs), GPs for regression and the SVR algorithm.

Multi-layer perceptrons

An MLP is a particular kind of Artificial Neural Network (ANN) that is massively parallel. It is considered a distributed information-processing system, and has been successfully applied to modelling a large variety of nonlinear problems [Haykin1998, Bishop1995]. The MLP consists of an input layer, a number of hidden layers, and an output layer, all of which are basically composed of a number of special processing units called neurons, as Figure 1.2 shows. Just as important as the processing units themselves is their connectivity, whereby the neurons within a given layer are connected to those of other layers by means of weighted links, whose values are related to the ability of the MLP to learn and generalize from a sufficiently large number of examples. Such a learning process demands a proper database containing a variety of input examples or patterns with the corresponding known outputs. The adequate values of the weights minimize the error between the output generated by the MLP (when fed with input patterns in the database) and the corresponding expected output in the database. The number of neurons in the hidden layer is a parameter to be optimized when using this type of neural network [Haykin1998, Bishop1995].

Figure 1.2: Artificial neural network.

The input data for the MLP consist of a number of samples arranged as input vectors x. Once an MLP has been properly trained, validated and tested, it is able to generate a proper output \hat{y} when fed with an input vector different from those contained in the database. The relationship between the output and the input signals of a neuron is

y = \varphi\left(\sum_{i=1}^{n} w_i x_i + \theta\right)    (1.1)

where y is the output signal, x_i for i = 1, \ldots, n are the input signals, w_i is the weight associated with the i-th input, and \theta is a threshold [Haykin1998, Bishop1995]. The transfer function \varphi is usually considered to be the logistic function,

\varphi(x) = \frac{1}{1 + e^{-x}}    (1.2)

The process of obtaining an accurate output is related to the training procedure, as mentioned before. During the training process, the error between the estimated output and its corresponding real value in the database determines the degree to which the weights in the network should be adjusted; since all neurons in the network are interconnected (feed-forward NN), the MLP makes this adjustment easy to carry out. Hence, the objective of training is to find the combination of weights which results in the smallest error. There are many algorithms that can be used to train an MLP. One possible technique is the backpropagation training algorithm, which uses the procedure known as gradient descent to try to locate the global minimum of the error [Gardner1998]. Another approach is the well-known Levenberg-Marquardt algorithm, which is often applied to train the MLP [Hagan1994].
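To make the training loop concrete, the following Python sketch trains a one-hidden-layer MLP with the logistic transfer function by plain gradient-descent backpropagation. It is a minimal illustration, not the implementation used in this thesis: the toy dataset, the 8 hidden neurons and the learning rate are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy regression data: learn y = x1 * x2 on [0, 1]^2.
X = rng.uniform(0, 1, (200, 2))
y = (X[:, 0] * X[:, 1])[:, None]

def logistic(z):
    """Logistic transfer function."""
    return 1.0 / (1.0 + np.exp(-z))

H = 8                                   # hidden neurons (a free parameter to tune)
W1 = rng.normal(0.0, 0.5, (2, H)); b1 = np.zeros(H)
W2 = rng.normal(0.0, 0.5, (H, 1)); b2 = np.zeros(1)

lr = 0.5                                # gradient-descent step size
for epoch in range(8000):
    # Forward pass: hidden activations, then a linear output neuron.
    a1 = logistic(X @ W1 + b1)
    out = a1 @ W2 + b2
    err = out - y                       # error that drives the weight updates
    # Backward pass: gradients of the mean squared error w.r.t. each weight.
    gW2 = a1.T @ err / len(X); gb2 = err.mean(axis=0)
    d1 = (err @ W2.T) * a1 * (1.0 - a1)
    gW1 = X.T @ d1 / len(X); gb1 = d1.mean(axis=0)
    # Gradient-descent update.
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

mse = float(np.mean((logistic(X @ W1 + b1) @ W2 + b2 - y) ** 2))
print(f"final training MSE: {mse:.5f}")
```

In practice the number of hidden neurons and the learning rate would be tuned on validation data, and faster second-order optimizers such as Levenberg-Marquardt are often preferred, as noted above.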

Extreme Learning Machine

An ELM [Huang2015, Huang2006] is a novel and fast learning method based on the structure of MLPs, similar to the one shown in Figure 1.2. In addition, the ELM approach presents a novel way of training feed-forward NNs. The most significant characteristic of ELM training is that it is carried out just by randomly setting the network weights, and then obtaining a pseudo-inverse of the hidden-layer output matrix. The advantages of this technique are its simplicity, which makes the training algorithm extremely fast, and also its outstanding performance when compared to state-of-the-art learning methods, usually better than other established approaches such as classical MLPs or SVRs.

Moreover, the universal approximation capability of the ELM network, as well as its classification capability, have already been proven [Huang2012].

The ELM algorithm can be summarized as follows: given a training set \{(\mathbf{x}_i, t_i) : i = 1, \ldots, N\}, an activation function g (for which a sigmoidal function is usually used) and a number of hidden nodes \tilde{N},

  1. Randomly assign input weights \mathbf{w}_i and biases b_i, i = 1, \ldots, \tilde{N}.

  2. Calculate the hidden-layer output matrix \mathbf{H}, defined as

    [\mathbf{H}]_{ji} = g(\mathbf{w}_i \cdot \mathbf{x}_j + b_i), \quad j = 1, \ldots, N, \quad i = 1, \ldots, \tilde{N}    (1.3)
  3. Calculate the output weight vector \boldsymbol{\beta} as

    \boldsymbol{\beta} = \mathbf{H}^{\dagger}\mathbf{T}    (1.4)

    where \mathbf{H}^{\dagger} stands for the Moore-Penrose inverse of the matrix \mathbf{H} [Huang2006], and \mathbf{T} is the training output vector, \mathbf{T} = [t_1, \ldots, t_N]^T.

Note that the number of hidden nodes \tilde{N} is a free parameter of the ELM training, and must be estimated to obtain good results. Usually, scanning a range of values is the best solution.
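The three steps above map almost directly onto a few lines of NumPy. The sketch below is an illustrative toy (the sin(x) data and the choice of 50 hidden nodes are assumptions for the example), but it follows the ELM recipe literally: random input weights, hidden-layer output matrix, then a Moore-Penrose pseudo-inverse.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training set: N samples of t = sin(x), with d = 1 input dimension.
N, d, L = 200, 1, 50                  # L plays the role of the number of hidden nodes
X = rng.uniform(-3, 3, (N, d))
t = np.sin(X).ravel()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Step 1: randomly assign input weights and biases.
W = rng.normal(size=(d, L))
b = rng.normal(size=L)

# Step 2: hidden-layer output matrix H (N x L).
H = sigmoid(X @ W + b)

# Step 3: output weights via the Moore-Penrose pseudo-inverse of H.
beta = np.linalg.pinv(H) @ t

# Training fit.
t_hat = H @ beta
rmse = float(np.sqrt(np.mean((t - t_hat) ** 2)))
print(f"training RMSE: {rmse:.4f}")
```

There is no iterative optimization at all, which is what makes ELM training so fast; in practice the number of hidden nodes would be selected by scanning a range of values, as noted above.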

Gaussian Processes for Regression

GP regression is a generic supervised-learning method primarily designed for solving regression problems, whose advantages include the predictive interpolation of the observations. The prediction is probabilistic (Gaussian), so that one can compute empirical confidence intervals and exceedance probabilities, which can be used to refit the prediction in some region of interest. Moreover, different linear-regression and correlation models may be specified. Here a short description of the most important characteristics of the GP approach is given; the interested reader is referred to the more exhaustive reviews of [Lázaro2012] and [Rasmussen2006] for further information.

Given a set of n d-dimensional inputs \mathbf{x}_i and their corresponding scalar outputs y_i, forming the dataset \mathcal{D} = \{(\mathbf{x}_i, y_i) : i = 1, \ldots, n\}, the regression task is to obtain the predictive distribution for the corresponding observation y_*, based on \mathcal{D}, given a new input \mathbf{x}_*.

The GP model assumes that the observations can be modelled as some noiseless latent function of the inputs plus an independent noise, y = f(\mathbf{x}) + \varepsilon, and then specifies a zero-mean GP prior for both the latent function, f(\mathbf{x}) \sim \mathcal{GP}(0, k(\mathbf{x}, \mathbf{x}')), and the noise, \varepsilon \sim \mathcal{N}(0, \sigma^2), where k(\mathbf{x}, \mathbf{x}') is a covariance function and \sigma^2 is a hyper-parameter that specifies the noise power.

The covariance function k(\mathbf{x}, \mathbf{x}') specifies the degree of coupling between the outputs at \mathbf{x} and \mathbf{x}', and encodes the properties of the GP, such as the power level and smoothness. One of the best-known covariance functions is the anisotropic squared exponential, which has the form of an unnormalized Gaussian function, k(\mathbf{x}, \mathbf{x}') = \sigma_0^2 \exp\big(-\tfrac{1}{2}(\mathbf{x} - \mathbf{x}')^T \boldsymbol{\Lambda}^{-1} (\mathbf{x} - \mathbf{x}')\big), and depends on the signal power \sigma_0^2 and the length scales, where \boldsymbol{\Lambda} is a diagonal matrix containing one squared length scale per input dimension. Each length scale controls the degree to which the correlation between outputs decays as the separation along the corresponding input dimension grows. All kernel parameters are collectively referred to as \boldsymbol{\theta}.

The joint distribution of the available observations (collected in \mathbf{y}) and some unknown output y_* forms a multivariate Gaussian distribution, with parameters specified by the covariance function:

\begin{bmatrix} \mathbf{y} \\ y_* \end{bmatrix} \sim \mathcal{N}\left(\mathbf{0}, \begin{bmatrix} \mathbf{K} + \sigma^2\mathbf{I}_n & \mathbf{k}_* \\ \mathbf{k}_*^T & k_{**} + \sigma^2 \end{bmatrix}\right)    (1.5)

where [\mathbf{K}]_{ij} = k(\mathbf{x}_i, \mathbf{x}_j), [\mathbf{k}_*]_i = k(\mathbf{x}_i, \mathbf{x}_*) and k_{**} = k(\mathbf{x}_*, \mathbf{x}_*). Here, \mathbf{I}_n is used to denote the identity matrix of size n. The notation [\mathbf{K}]_{ij} refers to the entry at row i, column j of \mathbf{K}. Likewise, [\mathbf{k}_*]_i is used to reference the i-th element of vector \mathbf{k}_*.

From (1.5) and the conditioning on the observed training outputs, the predictive distribution is obtained as

p(y_* \mid \mathbf{y}) = \mathcal{N}\big(\mathbf{k}_*^T(\mathbf{K} + \sigma^2\mathbf{I}_n)^{-1}\mathbf{y},\; k_{**} + \sigma^2 - \mathbf{k}_*^T(\mathbf{K} + \sigma^2\mathbf{I}_n)^{-1}\mathbf{k}_*\big)    (1.6)

which is computed in \mathcal{O}(n^3) time, due to the inversion of the matrix \mathbf{K} + \sigma^2\mathbf{I}_n.

The hyper-parameters \boldsymbol{\theta} are typically selected by maximizing the marginal likelihood (also called “evidence”) of the observations, which is

\log p(\mathbf{y} \mid \boldsymbol{\theta}) = -\tfrac{1}{2}\mathbf{y}^T(\mathbf{K} + \sigma^2\mathbf{I}_n)^{-1}\mathbf{y} - \tfrac{1}{2}\log\big|\mathbf{K} + \sigma^2\mathbf{I}_n\big| - \tfrac{n}{2}\log(2\pi)    (1.7)

If analytical derivatives of (1.7) are available, optimization is carried out using gradient methods, with each gradient computed in \mathcal{O}(n^3) time. GP regression algorithms can typically handle a few thousand data points on a desktop computer.
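The predictive mean and variance can be computed directly from the kernel matrices. The following NumPy sketch is a 1-D toy illustration with hand-fixed hyper-parameters (no evidence maximization) and a squared-exponential covariance; the data and all settings are assumptions for the example only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy observations of a latent function f(x) = sin(x).
X = rng.uniform(-4, 4, (60, 1))
y = np.sin(X).ravel() + 0.05 * rng.normal(size=60)

def sq_exp(A, B, sigma_f=1.0, ell=1.0):
    """Squared-exponential covariance (one length scale per input dimension)."""
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2 / ell ** 2, axis=-1)
    return sigma_f ** 2 * np.exp(-0.5 * d2)

sigma_n = 0.05                                    # noise-power hyper-parameter
K = sq_exp(X, X) + sigma_n ** 2 * np.eye(len(X))  # covariance of the observations

# Predictive distribution at test inputs: mean and variance obtained by
# conditioning the joint Gaussian on the observed training outputs.
Xs = np.linspace(-4, 4, 100)[:, None]
Ks = sq_exp(X, Xs)                                # cross-covariances, n x m
mean = Ks.T @ np.linalg.solve(K, y)
var = np.diag(sq_exp(Xs, Xs)) - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)

err = float(np.max(np.abs(mean - np.sin(Xs.ravel()))))
print(f"max |predictive mean - sin(x)| on the test grid: {err:.3f}")
```

The n x n linear solves are what gives the cubic cost mentioned above; in a full implementation the hyper-parameters would instead be chosen by maximizing the marginal likelihood.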

Support Vector Regression

SVR [Smola2004] is one of the state-of-the-art algorithms for regression and function approximation. The SVR approach takes into account both the approximation error on the data and the generalization of the model, i.e., its capability to maintain the quality of the prediction when a new dataset is evaluated. Although there are several versions of the SVR, the classical model, \varepsilon-SVR, described in detail in [Smola2004] and used in a large number of applications in Science and Engineering [Salcedo2014b], is considered in this work.

The \varepsilon-SVR method for regression consists of, given a set of training vectors \{(\mathbf{x}_i, y_i) : i = 1, \ldots, l\}, where \mathbf{x}_i stands for a vector of predictive variables and y_i is the target associated with the input, training a model of the form

\hat{y}(\mathbf{x}) = \mathbf{w}^T\phi(\mathbf{x}) + b    (1.8)

where \hat{y} stands for an estimation of y, in such a way that a risk function is minimized. This risk function can be written as:

R = \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{l} L\big(y_i, \hat{y}(\mathbf{x}_i)\big)    (1.9)

where the norm of \mathbf{w} controls the smoothness of the model, \phi is a function of projection from the input space to the feature space, b is a bias parameter, and L is the loss function selected. In this thesis, the L1-SVR (L1 support vector regression) is used, characterized by an \varepsilon-insensitive loss function [Smola2004]:

(1.10)

In order to train this model, it is necessary to solve the following optimization problem [Smola2004]:

(1.11)

subject to

(1.12)
(1.13)
(1.14)

Figure 1.3 shows an example of the final solution for a given set of input variables. The dual form of this optimization problem is usually obtained through the minimization of the Lagrange function, constructed from the objective function and the problem constraints. In this case, the dual form of the optimization problem is the following:

Figure 1.3: Illustration of the SVR model. Samples in the original input space are first mapped to a Reproducing Kernel Hilbert Space, where a linear regression is performed. All samples outside a fixed tube of size are penalized, and are support vectors (double-circled symbols).
(1.15)

subject to

(1.16)
(1.17)

In addition to these constraints, the Karush-Kuhn-Tucker conditions must be fulfilled, and the bias variable, , must also be obtained. The interested reader can consult [Smola2004] for reference. In the dual formulation of the problem, the function is the kernel matrix, formed by the evaluation of a kernel function equivalent to the dot product . A usual choice for this kernel function is a Gaussian function, as follows:

(1.18)

The final form of function depends on the Lagrange multipliers , as follows:

(1.19)

So finally, the estimation of the target under study will be carried out using the following expression:

(1.20)

In this way it is possible to obtain an SVR model by means of training a quadratic problem for given hyper-parameters , and . The estimation of these SVR hyper-parameters is usually carried out before the training of the algorithm. There are different methods to obtain , and , but the most common approach is a Grid Search (GS) procedure. GS exhaustively considers all parameter combinations from a grid of possible pre-defined values. The quality of the SVR with these hyper-parameter values is tested on a reduced validation set of data from the problem at hand. More information about GS and alternative techniques for SVR hyper-parameter estimation can be found in [Smola2004]. A variant of the GS approach that includes lower and upper bounds to limit the tested values of the hyper-parameters can be used. This method was proposed in [Ortiz2009], and it considerably reduces the time for hyper-parameter estimation with GS, without affecting the quality of the SVR.
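The GS procedure described above can be sketched as follows. This is an illustrative example, not the setup used in the thesis: it assumes scikit-learn's SVR implementation is available, a synthetic dataset, and arbitrary candidate grids for the three hyper-parameters:

```python
import numpy as np
from itertools import product
from sklearn.svm import SVR

# Synthetic regression problem (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (200, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(200)
X_tr, y_tr, X_val, y_val = X[:150], y[:150], X[150:], y[150:]

# Hypothetical grids of candidate values for C, gamma and epsilon.
best = None
for C, gamma, eps in product([1, 10, 100], [0.1, 1.0], [0.01, 0.1]):
    model = SVR(kernel='rbf', C=C, gamma=gamma, epsilon=eps).fit(X_tr, y_tr)
    # Quality of each combination measured as RMSE on the validation set.
    rmse = np.sqrt(np.mean((model.predict(X_val) - y_val) ** 2))
    if best is None or rmse < best[0]:
        best = (rmse, C, gamma, eps)
```

The combination stored in `best` is the one retained for the final SVR training; bounding the grids, as in [Ortiz2009], simply shortens the `product` above.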

1.2.2 Evolutionary Computation-based Algorithms

EC algorithms are used for solving optimization problems in both discrete and continuous search spaces. They are also used in feature selection to improve the performance of the predictions in regression problems. All genetic and EAs are based on the evolution of a population of candidate solutions by applying a series of evolutionary operators. Part of this Ph.D. Thesis is based on the application of this kind of techniques, hence the different approaches involved are explained in the following points.

The Grouping Genetic Algorithm

There are many potential benefits of applying Feature Selection (FS) in prediction problems: facilitating data visualization and data understanding, reducing the measurement and storage requirements, reducing training and utilization times, or defying the curse of dimensionality to improve prediction performance, to mention some of them. Previous literature on this issue mainly focuses on constructing and selecting subsets of features that are useful to build a good predictor. This process has been tackled before with EC [Salcedo2002, Salcedo2014a]. The GGA is a class of evolutionary algorithm especially modified to tackle grouping problems, i.e., problems in which a number of items must be assigned to a set of predefined groups (subsets of features, in the case of this work). It was first proposed by Falkenauer [Falkenauer1992, Falkenauer1998], who realized that traditional GAs had difficulties when applied to grouping problems (mainly, the standard binary encoding increases the search space size in this kind of problem). The GGA has shown very good performance on different applications and problems [Agustín2008, De-Lit2000]. In the GGA, the encoding, crossover and mutation operators of traditional GAs are modified to obtain a compact algorithm with very good performance in grouping problems.

Problem encoding

The GGA initially proposed by Falkenauer is a variable-length GA. The encoding is carried out by separating each individual in the algorithm into two parts: the first one is an assignment part that associates each item to a given group; the second one is a group part that defines which groups must be taken into account for the individual. In problems where the number of groups is not previously defined, it is easy to see why this is a variable-length algorithm: the group part varies from one individual to another. In the implementation of the GGA for FS, an individual has the form . An example of an individual in the proposed GGA for a FS problem, with 20 features and 4 groups, is the following:

1 1 2 3 1 4 1 4 3 4 4 1 2 4 4 2 3 1 3 2 1 2 3 4

where group 1 includes features {1,2,5,7,12,18}, group 2 features {3,13,16,20}, group 3 features {4,9,17,19} and finally group 4 includes features {6,8,10,11,14,15}.
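The encoding above can be decoded mechanically. The following sketch splits an individual into its assignment and group parts and recovers the features of each group; the helper name `decode_individual` is illustrative, not part of the original GGA implementation:

```python
def decode_individual(chromosome, n_features):
    """Split a GGA individual into its assignment part (first n_features
    entries) and its group part (the rest), and return the 1-based feature
    indices belonging to each group."""
    assignment = chromosome[:n_features]
    groups = chromosome[n_features:]
    return {g: [i + 1 for i, a in enumerate(assignment) if a == g]
            for g in groups}

# The 20-feature, 4-group individual from the example above.
ind = [1, 1, 2, 3, 1, 4, 1, 4, 3, 4, 4, 1, 2, 4, 4, 2, 3, 1, 3, 2,
       1, 2, 3, 4]
features_by_group = decode_individual(ind, n_features=20)
```

Running this on the example individual reproduces the groups listed in the text, e.g. group 3 contains features {4, 9, 17, 19}.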

Genetic operators

A tournament-based selection mechanism is usually used, similar to the one described in [Yao1999]. It has been shown to be one of the most effective selection operators, avoiding super-individuals and performing an excellent exploration of the search space. Regarding the crossover operator, it is implemented in the GGA as a modified version of the one initially proposed by Falkenauer [Falkenauer1992, Falkenauer1998]. The application of this operator follows the process outlined in Figure 1.4:


Figure 1.4: Outline of the grouping crossover implemented in the proposed example of GGA.
  • Choose two parents from the current population, at random.

  • Randomly select two points for the crossover from the “Groups” part of parent 1; then, all the groups between the two cross-points are selected. In the example of Figure 1.4 the two crossover points are and . Note that, in this case, the items of parent 1 belonging to groups and are 1, 2, 4, 5, and 6.

  • Insert the selected section of the “Groups” part into the second parent. After the insertion in the example of Figure 1.4, the assignment of the nodes 1, 2, 4, 5 and 6 of the offspring individual will be those of parent 1, while the rest of the nodes’ assignment are those of parent 2. The “Groups” part of the offspring individual is that of parent 2 plus the selected section of parent 1 (8 groups in total, in this case).

  • Renumber the “Groups” part of the offspring individual. In the example, = 1   2   3   4   5   6   1   2 is modified into = 1   2   3   4   5   6   7   8. Modify the assignment part accordingly.

  • Remove any empty groups in the offspring individual. In the example considered, groups 1, 2, 3, and 6 are empty; these groups’ identification numbers are eliminated and the rest are rearranged. The final offspring is then obtained.

Regarding the mutation operator, note that standard mutation usually calls for an alteration of a small percentage of randomly selected parts of the individuals. This type of mutation may be too disruptive in the case of a grouping problem. Instead, a swapping mutation in which two items are interchanged (swapping this way the assignment of features to different groups) is considered. This procedure is carried out with a very low probability, to avoid turning the procedure into a random search.
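The swapping mutation just described can be sketched in a few lines. This is an illustrative sketch (function name and probability value are hypothetical), operating on the assignment part of an individual:

```python
import random

def swap_mutation(assignment, prob=0.05, rng=random):
    """Grouping-friendly mutation: with a low probability, interchange the
    group labels of two randomly chosen items, instead of resetting labels
    at random as standard mutation would do."""
    assignment = list(assignment)
    if rng.random() < prob:
        i, j = rng.sample(range(len(assignment)), 2)
        assignment[i], assignment[j] = assignment[j], assignment[i]
    return assignment
```

Note that the multiset of group labels is preserved, so the mutation never creates or destroys groups; it only moves features between them.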

The Coral Reef Optimization

The CRO is a novel meta-heuristic approach for optimization, recently proposed in [Salcedo2014c], which is based on simulating corals’ reproduction and coral reefs’ formation processes. Basically, the CRO is based on the artificial modeling of a coral reef , consisting of a grid. It is assumed that each square (i,j) of is able to allocate a coral . Note that each of such corals represents a solution to a given optimization problem, for which it is encoded as a string of numbers spanning a given alphabet . The CRO algorithm is first initialized at random by assigning some squares in to be occupied by corals (i.e. solutions to the problem) and some other squares in the grid to be empty, i.e. holes in the reef where new corals can freely settle and grow in the future. The ratio of free to occupied squares in at the beginning of the algorithm is denoted as and referred to as the initial occupation factor. Each coral is labeled with an associated health function that corresponds to the problem’s objective function. The CRO is based on the fact that the reef will evolve and develop as long as healthier or stronger corals (which represent better solutions to the problem at hand) survive, while less healthy corals perish.

After the reef initialization described above, the phase of reef formation is artificially simulated. This phase consists of K iterations: at each of these iterations, the corals’ reproduction in the reef is emulated by applying different operators and processes, as described in Algorithm 1: a modeling of the corals’ sexual reproduction (broadcast spawning and brooding).

After the reproduction stage, the set of formed larvae (namely, newly produced solutions to the problem) attempt to find a place on the reef to develop and further reproduce. This deployment may occur in a free space inside the reef (a hole), or in an occupied location, by fighting against the coral currently settled in that place. If a larva is not successful in locating a place to settle after a number of attempts, it is considered preyed upon by animals in the reef. In this way, the reef builds a new layer in every iteration.

0:  Valid values for the parameters controlling the CRO algorithm
0:  A single feasible individual with optimal value of its fitness
1:  Initialize the algorithm
2:  for each iteration of the simulation do
3:     Update values of influential variables: predation probability, etc.
4:     Sexual reproduction processes (broadcast spawning and brooding)
5:     Settlement of new corals
6:     Predation process
7:     Evaluate the new population in the coral reef
8:  end for
9:  Return the best individual (final solution) from the reef
Algorithm 1 Pseudo-code for the CRO algorithm

The specific definition of the different operators that form the classical CRO algorithm is detailed here:

  1. Sexual reproduction: The CRO model implements two different kinds of sexual reproduction: external and internal.

    1. External sexual reproduction or broadcast spawning: the corals eject their gametes to the water, from which male-female couples meet and combine together to produce a new larva by sexual crossover. In Nature, some species are able to combine their gametes to generate mixed polyps even though they are different from each other. In the CRO algorithm, external sexual reproduction is applied to a usually high fraction of the corals. The couple selection can be done uniformly at random or by resorting to any fitness-proportionate selection approach (e.g. roulette wheel). In the original version of the CRO, standard crossover operators (one-point or two-point) are applied in the broadcast spawning process.

    2. Internal sexual reproduction or brooding: the CRO applies this method to a fraction of the corals in the reef. The brooding process consists of the formation of a coral larva by means of a random mutation of the brooding-reproductive coral (self-fertilization, considering hermaphrodite corals). The produced larva is then released into the water in a similar fashion to the larvae generated through broadcast spawning.

  2. Larvae settlement: once all larvae are formed at iteration through reproduction, they try to settle down and grow in the reef. Each larva randomly attempts to settle in a square of the reef. If the location is empty (free space in the reef), the coral grows therein no matter the value of its health function. By contrast, if another coral is already occupying the square at hand, the new larva will settle only if its health function is better than the fitness of the existing coral. A number of attempts for a larva to settle in the reef is defined: after unsuccessful tries, it will not survive to the following iteration.

  3. Depredation: corals may die during the reef formation phase. At the end of each iteration, a small number of corals can be preyed upon, thus liberating space in the reef for the next iteration. The depredation operator is applied with a very small probability , and exclusively to a fraction of the corals with the worst health.
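The larvae settlement rule in item 2 can be sketched as follows. This is a hypothetical, simplified illustration (one-dimensional reef, minimization problem where a lower fitness value means a healthier coral), not the reference CRO implementation:

```python
import random

def settle_larvae(reef, larvae, fitness, attempts=3, rng=random):
    """Larvae settlement step of the CRO: each larva tries a few random
    squares; it occupies an empty square directly, or replaces a settled
    coral only if it is fitter (lower fitness value, for minimization).
    A larva that fails all its attempts is considered preyed upon."""
    for larva in larvae:
        for _ in range(attempts):
            idx = rng.randrange(len(reef))
            if reef[idx] is None or fitness(larva) < fitness(reef[idx]):
                reef[idx] = larva
                break  # settled; stop trying other squares
    return reef
```

For instance, with `fitness=abs`, a larva of value 1 displaces a settled coral of value 5, while a larva of value 5 cannot displace a settled coral of value 1.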

1.3 Structure of the thesis

The rest of this thesis is organized in two technical parts:

  1. First, the proposed contributions with numerical results in renewable energy problems are structured in two chapters: ocean wave features prediction, and WPREs prediction.

  2. Then, the proposed contributions with numerical results in facilities management are divided into two further chapters: accurate estimation of with SVR and marine radar images, and efficient prediction of low-visibility events at airports.

To conclude, some final remarks and future research lines are summarized in the last part of the document, with the list of publications shown in a final Appendix section.

Part II Proposed contributions with numerical results in renewable energy problems

Chapter 2 Ocean wave features prediction

2.1 Introduction

The exploitation of marine energy resources is currently a hot topic in renewable energy research, since they have shown a clear potential for sustainable growth [Defne2009, García2014, Lenee2011, López2013, Rusu2009, Rusu2012, Gonçalves2014]: marine energy resources do not generate CO2, are potentially able to convert part of the huge energy of the oceans into electricity [Arinaga2012, Esteban2012], and reduce oil imports, a crucial geo-economical issue. However, in spite of this potential, the use of marine energy sources is nowadays still minor at the global level. Nevertheless, wave energy plays a key role for sustainable development in several offshore islands, because it provides technical and economic benefits (satisfying the demand for clean electricity) without significant environmental impact, a key concern in offshore islands committed to the protection of ecological systems [Fadaeenejad2014]. Some interesting reviews of the most important issues involved in the generation of electricity from oceans (including converters, their related economical aspects, and the potential of a number of ocean regions to be exploited worldwide) can be found in [Bahaj2011, Chong2013, Kim2012, Hammar2012, Cuadra2016].

There are different technologies that can be considered within marine energy resources, including ocean wave, tidal and ocean thermal. This work is focused on wave energy, which uses WECs to convert ocean energy into electricity [Falcão2010, Cuadra2016]. WECs transform the kinetic energy of wind-generated waves into electricity by means of either the vertical oscillation of waves or the linear motion of waves, and exhibit some important advantages when compared to alternatives based on tidal converters. However, waves are more difficult to characterize than tides, because of their stochastic nature. As a consequence of this complexity, the design, deployment, and control of WECs [Hong2014, Richter2013] become key topics that require a proper characterization and prediction of waves [Larsén2015, Reikard2015, Wimmer2006]. Arguably, the two most important parameters to characterize wave energy are the and the Wave Energy Flux (), on whose prediction this chapter is focused.

As mentioned, the stochastic nature of waves makes the prediction of the wave energy resource very difficult, so the research work on this topic has been intense in the last few years. Focusing on ML approaches, one of the first algorithms proposed to predict is due to Deo et al. [Deo1998], who used an ANN to obtain an accurate prediction of . Improvements on this prediction system were presented in a more recent work [Agrawal2004]. NNs have also been applied to other problems of and prediction, such as [Tsai2002], where and are predicted from observed wave records using time series NNs, [Castro2014], where a neural network is applied to estimate the wave energy resource in the northern coast of Spain, or [Zanaganeh2009], where a hybrid GA-adaptive network-based fuzzy inference system model was developed to forecast and the peak spectral period at Lake Michigan. Alternative proposals based on different approaches have been presented recently, like in [Mahjoobi2008], where different SC techniques are tested for prediction, [Mahjoobi2009], where a SVR methodology is considered, [Fernández2015], where different classifiers are applied to analyze and predict and ranges in buoys for marine energy applications, [Nitsure2012], which proposes the use of genetic programming for reconstruction problems, or [Özger2011], where Fuzzy Logic (FL)-based approaches were introduced for prediction problems.

In spite of this large body of work dealing with ML algorithms in and prediction, there are no previous studies focused on analyzing which are the best predictive variables to obtain an accurate prediction of these important parameters from neighbor buoys’ data. This problem is usually known in the ML community as FS [Weston2000], and it is an important task in supervised classification and regression problems. The reason for this is that irrelevant features, used as part of a training procedure in a classification or regression machine, can unnecessarily increase the cost and running time of a prediction system, as well as degrade its generalization performance [Blum1997, Salcedo2002]. In this thesis a novel hybrid GGA–ELM for accurate prediction of and values is proposed. The GGA is a recently proposed algorithm especially modified to tackle grouping problems [Falkenauer1992, Falkenauer1998]. In this case it is focused on obtaining the best set of features (predictive variables) for the regression machine (ELM). It will be shown how the GGA is able to produce different sets of good predictive variables for this problem, and how the ELM is able to obtain excellent or predictions from them. An experimental analysis of the proposed hybrid GGA-ELM approach in a real case of and prediction in buoys at the Western coast of the USA will be carried out. The application of alternative regression techniques such as SVR is also analyzed in these experiments. Moreover, because this hybrid prediction system has a number of parameters that may affect its final performance and need to be previously specified by the practitioner, an automatic fine-tuning of the prediction system’s parameters is added to the study. In this case, the parameters of the GGA-ELM approach include the probability of mutation in the GGA or the number of neurons in the ELM hidden layer, among others.
It is proposed then to use a Bayesian Optimization (BO) approach to automatically optimize the parameters of the whole prediction system (GGA-ELM), with the aim of improving its performance in wave energy prediction problems. BO has been shown to obtain good results in the task of obtaining good parameter values for prediction systems [Snoek2012].

The rest of the chapter is structured as follows: Section 2.2 describes the calculation of and ; Section 2.3 presents the prediction system considered; Section 2.4 explains the BO method; Section 2.5 summarizes the experiments and results obtained; and Section 2.6 completes the study with some final remarks.

2.2 Wave energy resource: calculation of and

In the evaluation of wave energy deployment systems such as WECs or WECs arrays, it is essential to previously characterize as accurately as possible the amount of wave energy available in a particular location, given by parameters such as and . In order to obtain these parameters, note that the wave energy resource in a region is caused by both local and far winds blowing over the ocean surface, which transports the wave energy. Focusing thus the attention on the water surface, and within the framework of the linear wave theory, the vertical wave elevation, , at a point on the sea surface at time can be assumed as a superposition of different monochromatic wave components [Nieto2013, Goda2010]. This model is appropriate when the free wave components do not vary appreciably in space and time (that is, statistical temporal stationarity and spatial homogeneity can be assumed [Goda2010]).

In this model, the concept of “sea state” refers to the sea area and the time interval in which the statistical and spectral characteristics of the wave do not change considerably (statistical temporal stationarity and spatial homogeneity). The total energy of a sea state is the combined contribution of all energies from different sources. The “wind sea” occurs when the waves are caused by the energy transferred between the local wind and the free surface of the sea. The “swell” is the situation in which the waves have been generated by winds blowing on another far area (for instance, by storms), and propagate towards the region of observation. Usually, sea states are the composition of these two pure states, forming multi-modal or mixed seas.

In a given sea state, the wave elevation with respect to the mean ocean level can be assumed to be a zero-mean Gaussian stochastic process, with statistical symmetry between wave maxima and minima. A buoy deployed at point can take samples of this process, , thus generating a time series of empirical vertical wave elevations. The Discrete Fourier Transform (DFT) of this sequence, computed using the Fast Fourier Transform (FFT) algorithm, allows for estimating the spectral density . Its spectral moments of order can be computed as follows:

(2.1)

The is a first indicator of the amount of wave energy available in a given area. The , or power density per meter of wave crest [Cahill2013], can be computed as

(2.2)

where is the sea water density (1025 kg/m³), the acceleration due to gravity, is the spectral estimation of the , and is an estimation of the mean wave period, normally known as the period of energy, which is used in the design of turbines for wave energy conversion. Expression (2.2) (with in meters and in seconds) leads to kW/m, and helps engineers estimate the amount of wave energy available when planning the deployment of WECs at a given location.
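Since the symbols of Expression (2.2) are elided in this extraction, the following sketch assumes the standard deep-water form of the energy flux, P = ρg²Hs²Te/(64π) ≈ 0.49 Hs²Te kW/m, together with the usual spectral estimates Hs = 4√m0 and Te = m₋₁/m0, which are consistent with the constants given above:

```python
import math

RHO = 1025.0  # sea water density, kg/m^3
G = 9.81      # acceleration due to gravity, m/s^2

def wave_energy_flux(Hs, Te):
    """Deep-water wave energy flux per meter of wave crest, in kW/m:
    P = rho * g^2 / (64 * pi) * Hs^2 * Te, i.e. roughly 0.49 * Hs^2 * Te
    with Hs in meters and Te in seconds."""
    return RHO * G**2 / (64.0 * math.pi) / 1000.0 * Hs**2 * Te

def spectral_estimates(moments):
    """Hs and Te from the spectral moments {n: m_n}:
    Hs = 4 * sqrt(m0), Te = m_{-1} / m0."""
    return 4.0 * math.sqrt(moments[0]), moments[-1] / moments[0]
```

For example, a sea state with Hs = 2 m and Te = 8 s carries roughly 15.7 kW per meter of wave crest.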

2.3 The hybrid prediction system considered

The prediction system is a hybrid wrapper approach, formed by the GGA (explained in depth in Section 1.2.2) for FS, and the ELM to carry out the final prediction of or from a set of input data. The regressor chosen must be as accurate as possible, and also very fast in its training process, in order to avoid a high computational burden for the complete algorithm. This is the main reason why the ELM, explained in detail in Section 1.2.1, is selected for the fitness function as well. Since the ELM is hybridized with the GGA, there are different ways of calculating the final fitness associated with each individual. In this case the following fitness function is considered, which uses a measure of the Root Mean Squared Error (RMSE) of the prediction for the best group of features in the GGA:

(2.3)

where stands for the or measured for sample , and stands for the or estimated by the ELM in the group of the individual with the lowest error (best group of features), for sample . Note that stands for the number of training samples.
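The RMSE fitness of Equation (2.3) is straightforward to implement; a minimal sketch (the function name is illustrative) is:

```python
import math

def rmse_fitness(targets, predictions):
    """RMSE between the measured values and the ELM estimates for the best
    group of features; a lower fitness means a better GGA individual."""
    n = len(targets)
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(targets, predictions)) / n)
```

In the wrapper, this value would be computed on the training samples for each candidate feature group, keeping the group with the lowest RMSE as the individual's fitness.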

2.4 Bayesian optimization of the prediction system

Every ML algorithm or prediction system has its own set of parameters that must be adjusted to obtain an optimal performance. An example is a deep neural network in which one has to specify parameters such as the learning rate, the number of layers, the number of neurons in each layer, etc. [LeCun2015]. Another example is stochastic gradient boosting in which one has to choose the number of terminal nodes in the ensemble trees, the number of trees, the regularization parameter, etc. [Friedman2002]. In the particular setting in this study, in an ELM the number of units in the hidden layer has to be specified before training; and in the GA described in Section 1.2.2, the probability of mutation and the number of epochs must be known initially.

Changing the parameter values of a prediction system may have a strong impact on its performance. Parameter tuning is hence defined as the problem of finding the optimal parameter values of a prediction system on the problem considered. This task has traditionally been addressed by human experts, who often use prior knowledge to specify parameter values that are expected to perform well. However, such an approach can suffer from human bias. An alternative solution is to consider a grid or uniform search in the space of parameters to look for values that result in a good performance on a validation set. These methods, however, suffer when the dimensionality of the parameter space is very high [Bergstra2012], requiring a large number of parameter evaluations.

BO has emerged as a practical tool for parameter selection in prediction systems. These methods provide an efficient alternative to a grid or uniform search of the parameter space [Snoek2012]. Assume that the error surface of a prediction system, as a function of its parameters, is smooth. In that case, the parameter space can be searched according to a criterion that exploits this smoothness property and avoids exhaustive exploration. More precisely, BO methods are very useful for optimizing black-box objective functions that lack an analytical expression (which means no gradient information), are very expensive to evaluate, and whose evaluations are potentially noisy [Mockus1978, Brochu2010, Shahriari2016]. The performance of a prediction system on a randomly chosen validation set, when seen as a function of the chosen parameters, has all these characteristics.

Consider a black-box objective with noisy evaluations of the form , with some noise term. BO methods are very successful at reducing the number of evaluations of the objective function needed to solve the optimization problem. At each iteration of the optimization process, these methods fit a probabilistic model, typically a GP, to the observations of the objective function collected so far. The uncertainty about the objective function provided by the GP is then used to generate an acquisition function , whose value at each input location indicates the expected utility of evaluating there. The next point at which to evaluate the objective is the one that maximizes . Importantly, only depends on the probabilistic model and can hence be evaluated with very little cost. Thus, this function can be maximized very quickly using standard optimization techniques. This process is repeated until enough data about the objective has been collected. When this is the case, the GP predictive mean for can be optimized to find the solution of the optimization problem. Algorithm 2 shows the details of such a process.

for each iteration do
       1: Find the next point to evaluate by optimizing the acquisition function: .
       2: Evaluate the black-box objective at : .
       3: Augment the observed data .
       4: Update the GP model using .
end for
Result: Optimize the mean of the GP to find the solution.
Algorithm 2 BO of a black-box objective function.

The key to the success of BO is that evaluating the acquisition function is very cheap compared to the evaluation of the actual objective , which in this case requires re-training the prediction system. This is so because the acquisition function only depends on the GP predictive distribution for at a candidate point . Let the observed data until step of the algorithm be . The GP predictive distribution for is given by a Gaussian distribution characterized by a mean and a variance . These values are:

(2.4)
(2.5)

where is a vector with the objective values observed so far; is a vector with the prior covariances between and each ; is a matrix with the prior covariances among each , for ; and is the prior variance at the candidate location . All these quantities are obtained from a covariance function which is pre-specified and receives as an input two points, and , at which the covariance between and has to be evaluated. A typical covariance function employed for BO is the Matérn function [Snoek2012]. For further details about GPs the reader is referred to [Rasmussen2006].

Thus, BO methods choose very carefully the next point at which to evaluate the objective function, with the aim of finding its optimum with the smallest number of evaluations. This is a very useful strategy when the objective function is very expensive to evaluate, and it can save a lot of computational time. Three steps of the BO process are illustrated graphically in Figure 2.1 for a toy minimization problem.

Figure 2.1: An example of BO on a toy 1D noiseless problem.

Figure 2.1 shows a GP estimation of the objective over three iterations. The acquisition function is shown in the lower part of the plot. The acquisition is high where the GP predicts a low objective and where the uncertainty is high. Those regions in which it is unlikely to find the global minimum of have low acquisition values and will not be explored.

Unlike BO methods, grid or uniform search strategies are based on a pure exploration of the search space. Under the assumption that the objective function is smooth, concentrating a few evaluations in the regions of the input space that look more promising (exploitation) is expected to give better results. In BO methods the acquisition function balances between exploration and exploitation in an automatic way. An example of an acquisition function is Expected Improvement (EI) [Jones1998]. EI is obtained as the expected value, under the GP predictive distribution for , of the utility function , where is the best value observed so far. That is, EI measures how much, on average, the current best solution will be improved on by evaluating the objective at each candidate point. An advantage of EI is that the corresponding acquisition function can be computed analytically: , where , and and are respectively the c.d.f. and p.d.f. of a standard Gaussian. EI is the acquisition function displayed in Figure 2.1.
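The closed form of EI admits a direct implementation from the GP predictive mean and standard deviation at a candidate point. The following sketch (for a minimization problem, as in Figure 2.1) uses the standard Gaussian c.d.f. and p.d.f. written via the error function:

```python
import math

def expected_improvement(mu, sigma, best):
    """EI at a candidate point for minimization, given the GP predictive
    mean mu and standard deviation sigma, and the best value observed so
    far: EI = (best - mu) * Phi(z) + sigma * phi(z), z = (best - mu)/sigma."""
    if sigma <= 0.0:
        return max(best - mu, 0.0)  # no uncertainty: deterministic improvement
    z = (best - mu) / sigma
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard p.d.f.
    Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard c.d.f.
    return (best - mu) * Phi + sigma * phi
```

EI is high both where the predicted mean is low (exploitation) and where the predictive uncertainty is large (exploration), which is exactly the balancing behavior described above.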

BO has been recently applied with success for finding good parameter values in different prediction systems. For example, it has been used to find the parameters of topic models based on latent Dirichlet allocation, Support Vector Machines (SVMs), or deep convolutional NNs [Snoek2012]. Furthermore, BO methods have also been used to optimize a logistic regression model for labelling Amazon product reviews [Dewancker2016], or to optimize the weights of a neural network to balance vertical poles of varying lengths on a moving cart [Frean2008]. Other applications of BO are found in the field of environmental monitoring, in the task of adjusting the parameters of a control system for robotics, in the optimization of recommender systems, and in combinatorial optimization [Brochu2010, Shahriari2016]. Finally, BO methods have been implemented in different software packages. A Python implementation called Spearmint is available at [Github], and it is the BO implementation used in this work.

2.5 Experiments and results

This section describes some experiments with the aim of showing the improvements obtained in the performance of the prediction system when its parameters are optimized with the Bayesian techniques introduced before. A real problem of prediction ( kW/m, [Goda2010]) from marine buoys is considered. Figure 2.2 shows the three buoys considered in this study at the Western coast of the USA, whose databases are obtained from [NOAA2016], and whose main characteristics are shown in Table 2.1. The objective of the problem is to carry out the reconstruction of buoy 46069 from a number of predictive variables from the other two buoys. Thus, 10 predictive variables measured at each neighbor buoy are considered (a total of 20 predictive variables to carry out the reconstruction). Table 2.2 shows details of the predictive variables for this problem. Data for two complete years (1st January 2009 to 31st December 2010) are used, since complete data (without missing values in predictive and objective ) are available for that period in the three buoys. These data are divided into a training set (year 2009) and a test set (year 2010) to evaluate the performance of the proposed algorithm.


Figure 2.2: Western USA buoys considered in this study. The buoy where the prediction is carried out is shown in red; the predictive data come from the buoys shown in blue.
Characteristics Station 46025 Station 46042 Station 46069
Location 33°44'58"N 119°3'10"W 36°47'29"N 122°27'6"W 33°40'28"N 120°12'42"W
Site elevation sea level sea level sea level
Air temp height 4 m above site elevation 4 m above site elevation 4 m above site elevation
Anemometer height 5 m above site elevation 5 m above site elevation 5 m above site elevation
Barometer elevation sea level sea level sea level
Sea temp depth 0.6 m below water line 0.6 m below water line 0.6 m below water line
Water depth 905.3 m 2098 m 1020.2 m
Watch circle radius 1327 yards 2108 yards 1799 yards
Table 2.1: Geographic coordinates and description of the buoys.
Acronym Predictive variable Units
WDIR Wind direction [degrees]
WSPD Wind speed [m/s]
GST Gust speed [m/s]
WVHT Significant wave height [m]
DPD Dominant wave period [sec]
APD Average period [sec]
MWD Direction of waves at the dominant period (DPD) [degrees]
PRES Atmospheric pressure [hPa]
ATMP Air temperature [Celsius]
WTMP Water temperature [Celsius]
Table 2.2: Predictive variables used in the experiments.

This experimental section is divided into two different subsections. First, the performance of the proposed BO techniques in the optimization of the specific GGA-ELM prediction algorithm is shown. Second, it is shown how the prediction performance improves when the system is run with the parameters obtained by the BO techniques, i.e., the performance of the system is compared before and after tuning the parameters with BO.

2.5.1 Methodology

The utility of the BO techniques described in Section 2.4 for finding good parameters for the prediction system described in Section 2.3 is evaluated. More precisely, the aim is to find the parameters that minimize the RMSE of the best individual found by the GGA on a validation set extracted from the total data available. The parameters of the GGA that are adjusted are the probability of mutation, the percentage of confrontation in the tournament, and the number of epochs. On the other hand, the parameters of the ELM that is used to evaluate the fitness in the GGA are also adjusted. These parameters are the number of hidden units and the logarithm of the regularization constant of a ridge regression estimator, which is used to find the weights of the output layer. Note that a ridge regression estimator for the output layer weights allows for a more flexible model than the standard ELM, as the standard ELM is retrieved when the logarithm of the regularization constant is large and negative [Albert1972].

The BO method is compared with two techniques. The first is a random exploration of the space of parameters. The second is a configuration specified by a human expert, with values that are expected to perform well in the specific application tackled. A computational budget of 50 parameter evaluations is set for both the BO and the random exploration strategy. After each evaluation, the performance of the best solution found so far is reported. The experiments are repeated for different random seeds and average results are reported. All BO experiments are carried out using the EI acquisition function and the Spearmint BO software.

2.5.2 Results I: Bayesian optimization of the wave energy prediction system parameters

Figures 2.3 and 2.4 show the average results obtained and the corresponding error bars for the task of predicting the wave energy flux and the task of predicting the wave height, respectively. Each figure shows the average RMSE of each method (BO and random exploration) on the validation set as a function of the number of configurations evaluated. The performance of the configuration specified by a human expert is also shown. It can be observed that the BO strategy performs best in each setting. In particular, after a few evaluations the BO method is able to outperform the results of the human expert, and it provides results that are similar to or better than the ones obtained by the random exploration strategy with a smaller number of evaluations.

Figure 2.3: Average results obtained for the wave energy flux optimization after evaluating the performance of 50 different parameter configurations with the BO technique and with a random exploration of the parameter space. The performance of a configuration specified by a human expert is also shown for comparison.
Figure 2.4: Average results for the wave height optimization over the 50 different parameter values selected by the BO technique and by a random exploration of the parameter space. The plot also shows the performance of the parameter values selected by a human expert.
2.5.3 Results II: Estimation of the generalization performance

In a second round of experiments, the performance of the proposed prediction system after its optimization with the BO methodology is shown. Note that, after the FS process with the GGA-ELM approach, an ELM and an SVR [Smola2004, Salcedo2014b] are used to obtain the final prediction of the wave energy flux and the significant wave height.

Table 2.3 shows the results obtained for the experiments carried out. It compares the ELM and SVR approaches in different scenarios: the prediction obtained with all the features, the prediction obtained with the hybrid GGA-ELM algorithm (without the BO methodology), and finally the prediction obtained after applying the BO process to the GGA-ELM approach. As Table 2.3 summarizes, the hybrid GGA-ELM algorithm clearly improves the results obtained by the ELM and SVR approaches without FS. In fact, with the FS method the SVR algorithm reaches values of the Pearson's correlation coefficient around 75%, against a poor 31% when all features are used. Moreover, these results are further improved by means of the BO methodology, using the ELM and SVR approaches after the GGA-ELM. In the case of the ELM, correlation values around 77% are obtained, against the 71% achieved with the GGA-ELM algorithm without the BO improvement. The same behavior is obtained for the SVR algorithm: values around 78% with the application of the BO methodology, against the 75% obtained for the GGA-ELM approach when the parameters are fixed by a human expert. In addition, the reader can compare the results with another accuracy measure, the Mean Absolute Error (MAE).

Experiments RMSE MAE r
All features-ELM 3.4183 kW/m 2.4265 kW/m 0.6243
All features-SVR 4.4419 kW/m 2.8993 kW/m 0.3129
GGA-ELM-ELM 2.8739 kW/m 1.8715 kW/m 0.7101
GGA-ELM-SVR 2.6626 kW/m 1.6941 kW/m 0.7548
BO-GGA-ELM-ELM 2.5672 kW/m 1.7596 kW/m 0.7722
BO-GGA-ELM-SVR 2.4892 kW/m 1.6589 kW/m 0.7823
Table 2.3: Comparative results of the wave energy flux estimation by the ELM and SVR approaches after the FS by the GGA-ELM (test year 2010).
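The figures of merit reported in Table 2.3 (RMSE, MAE and Pearson's correlation coefficient) can be computed as in the following sketch; the target and prediction vectors below are hypothetical, not the buoy data themselves.

```python
# RMSE, MAE and Pearson's r, the three metrics of Tables 2.3 and 2.4,
# computed with NumPy on hypothetical target/prediction vectors.
import numpy as np

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

def pearson_r(y_true, y_pred):
    return float(np.corrcoef(y_true, y_pred)[0, 1])

y_true = np.array([2.0, 4.0, 6.0, 8.0])   # hypothetical observed flux (kW/m)
y_pred = np.array([2.5, 3.5, 6.5, 7.5])   # hypothetical model output
print(rmse(y_true, y_pred), mae(y_true, y_pred), pearson_r(y_true, y_pred))
```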

The results of the previous table can be better visualized in the following figures. Figure 2.5 shows the time series of predictions produced by the ELM and SVR approaches. It can be seen how cases (c) and (d) improve the approximation to the real values with respect to cases (a) and (b), where the BO methodology is not applied. The same situation can be observed in Figure 2.6, where scatter plots are presented for the results obtained with and without the BO methodology.

Figure 2.5: Wave energy flux prediction after the FS process with the GGA-ELM approach; (a) ELM; (b) SVR; (c) ELM with BO; (d) SVR with BO.
Figure 2.6: Scatter plots for the wave energy flux prediction problem tackled by the ELM and SVR with FS by the GGA-ELM; (a) ELM; (b) SVR; (c) ELM with BO; (d) SVR with BO.

The same procedure is carried out for the significant wave height. Table 2.4 compares the results obtained in the different experiments. As can be seen, the results are improved with the use of the BO methodology, with correlation values around 74% for the ELM and SVR predictions, against the 66% and 39% achieved by the ELM and SVR, respectively, with all features. The GGA-ELM algorithm improves these last results, but not as much as when the BO methodology is used. Figure 2.7 shows the time series of predictions for the GGA-ELM-ELM, GGA-ELM-SVR, BO-GGA-ELM-ELM and BO-GGA-ELM-SVR approaches. The corresponding scatter plots are presented in Figure 2.8.

Experiments RMSE MAE r
All features-ELM 0.4653 m 0.3582 m 0.6624
All features-SVR 0.6519 m 0.4986 m 0.3949
GGA-ELM-ELM 0.3650 m 0.2858 m 0.7049
GGA-ELM-SVR 0.3599 m 0.2727 m 0.7056
BO-GGA-ELM-ELM 0.3324 m 0.2519 m 0.7429
BO-GGA-ELM-SVR 0.3331 m 0.2461 m 0.7396
Table 2.4: Comparative results of the significant wave height estimation by the ELM and SVR approaches after the FS by the GGA-ELM (test year 2010).
Figure 2.7: Significant wave height prediction after the FS process with the GGA-ELM approach; (a) ELM; (b) SVR; (c) ELM with BO; (d) SVR with BO.
Figure 2.8: Scatter plots for the significant wave height prediction problem tackled by the ELM and SVR with FS by the GGA-ELM; (a) ELM; (b) SVR; (c) ELM with BO; (d) SVR with BO.

The BO methodology improves the results in both prediction problems (wave energy flux and significant wave height), which highlights the generality of the proposed method.

2.6 Conclusions

In this chapter it has been shown how a hybrid prediction system for wave energy prediction can be improved by means of the BO methodology. The prediction system is formed by a grouping GA for FS and an ELM for effective prediction of the target variables, the wave energy flux and the significant wave height in this case. After this FS process, the final prediction of the target is obtained by means of an ELM or an SVR approach. The chapter describes in detail the BO methodology and its specific application to the optimization of the GGA-ELM for a real problem of energy flux and wave height prediction from buoy data off the Western coast of the USA. The results show that the BO methodology is able to improve the performance of the system, i.e., the prediction of the optimized system is significantly better than that of the system without the BO methodology. This improvement comes from the optimal selection of parameters carried out by the BO strategy. On the other hand, the main limitation of the proposed methodology is the increase in computation time. Nevertheless, this increase only affects the training phase, not the operation phase in which predictions are made after training; therefore, this limitation is not critical. Finally, note that this methodology can be extended to alternative prediction systems and other problems, especially to hybrid approaches involving ML algorithms with a high number of parameters to be tuned.

Chapter 3 Wind power ramps events prediction

3.1 Introduction

Wind Power Ramp Events (WPREs) are large fluctuations of wind power in a short time interval, which lead to strong, undesirable variations in the electric power produced by a wind farm. Their accurate prediction is important in the effort of efficiently integrating wind energy in the electric system, without considerably affecting its stability, robustness and resilience. In this study, the problem of predicting WPREs by applying ML regression techniques is tackled. The proposed approach consists of using variables from atmospheric reanalysis data as predictive inputs for the learning machine, which opens the possibility of hybridizing numerical-physical weather models with ML techniques for WPRE prediction in real systems. Specifically, the feasibility of a number of state-of-the-art ML regression techniques, such as SVR, ANNs (MLPs and ELMs) and GPs, is explored to solve the problem. Furthermore, the ERA-Interim reanalysis from the European Centre for Medium-Range Weather Forecasts (ECMWF) is the one used in this work because of its accuracy and high resolution (in both the spatial and temporal domains). Aiming at validating the feasibility of this prediction approach, an extensive experimental study using real data from three wind farms in Spain is carried out, discussing the performance of the different ML regressors tested in this wind power ramp event prediction problem.

3.1.1 Motivation

Wind power is currently one of the most important renewable energies in the world [Kumar2016] in terms of penetration in the electric power system [Brenna2017, Mohagheghi2017], economic impact and annual growth rate [Ali2017], both onshore [Dai2016] and offshore [Colmenar2016]. Electric power generation is usually carried out in large wind farms [Giebel2016, Herbert2014] far from urban centers [Lunney2017, Jangid2016], though, in the last few years, urban wind power generation is also gaining impulse [Simões2016], including its use in smart grids [Köktürk2017].

The counterpart of the benefits associated with the flourishing of wind energy throughout the world (mainly the reduction of CO2 emissions, one of the causes of global warming [Peters2013] and climate change [Bauer2015]) is a set of problems related not only to the maintenance and management of wind farm facilities, but also to those of power grids. In this regard, one of the most important problems yet to be solved is the efficient integration [Jones2017] of an increasing number of wind energy generators in both the distribution and transmission power grids, which are becoming increasingly complex [Cuadra2017, Cuadra2015]. The intrinsically complex nature of power grids is further aggravated by the inherent stochastic nature of wind energy [Yan2015] which, depending on the weather conditions, can lead to intermittent generation. This can affect the stability, robustness and resilience [Cuadra2017, Cuadra2015] of electric power grids. A useful discussion of the technical differences between these interrelated, but distinct, concepts can be found in [Cuadra2015].

Aiming at preserving grid stability in a scenario with a high percentage of intermittent renewable sources—not only wind energy [Colmenar2016], but also photovoltaic [Cabrera2016] and wave [Cuadra2016] energies—power grids need to be made more flexible [Kroposki2017]. In this effort, the emerging technologies associated with smart grids [Köktürk2017] and micro-grids [Yoldaş2017] can be used to mitigate wind power intermittency. An illustrative, very recent proposal in this respect consists of increasing the penetration of Vehicle-to-Grid (V2G) technologies [Gough2017] to use the batteries of idle Electric Vehicles (EV) as power storage units [Zhao2017], absorbing peaks of intermittent overproduction.

Wind power intermittency and its influence on power grids' stability and performance are the main reasons why Wind Power Forecasting (WPF) [Renani2016, Tascikaraoglu2014] is a key factor to improve wind energy integration without unbalancing the rest of the grid components. Among the different issues in wind power prediction, one of the most significant is the existence of Wind Power Ramp Events. WPREs consist of large fluctuations of wind power in a short period of time, leading to a significant increase or decrease of the electric power generated in a wind farm [Zhang2017, Gallego2015a].

The field of scientific research in WPRE prediction (or forecasting) [Ouyang2013] is a relatively recent topic, driven by the need to improve the management of quick and large variations in wind power output, particularly in the aforementioned context of power grids with high renewable penetration [Alizadeh2016]. A useful review of different WPRE definitions (on which there does not seem to be a clear consensus) and their types (increasing or decreasing, depending on the WPRE definition) can be found in [Ferreira2011]. Among these issues, WPRE severity is an important one. Up and down WPREs can exhibit different levels of severity, although down WPREs are usually more critical than up WPREs because of the availability of reserves [Zhang2017]. WPREs are usually caused by specific meteorological processes (basically, crossing fronts [Gallego2015b] and fast changes in the local wind direction), and they occur at several scales (synoptic [Ohba2016], mesoscale [Salcedo2009] and microscale). Surprisingly, it has recently been found that very large offshore farms, clustered together, can also generate large WPREs on time scales of less than 6 h [Drew2017]. This gives an idea of the complexity of the WPRE phenomenon.

WPREs’ prediction is not only important for power grid operators, but also for wind farm owners. In fact, the occurrence of WPREs in wind farms is critical not only because of the aforementioned undesired variations of power, but also due to their potential harmful effects in wind turbines, which leads to an increase of management costs associated with these facilities [Cui2015]. Regarding this, the accurate prediction of WPREs has been reported as an effective method to mitigate the economic impact of these events in wind generation power plants [Gallego2015a, Cui2015].

According to [Cui2015, Foley2012], the prediction of WPREs and their influence on electricity generation and grid stability have recently been tackled with two major families of techniques: (1) "physical-based" models (numerical approaches that tackle the complexity of the physical equations ruling the atmosphere to obtain a prediction); and (2) statistical approaches (usually data-driven models). The first group of techniques, the physical-based approaches, involves a set of equations that rule the atmospheric processes and their evolution over time and, because of their complexity and nonlinearity, must be solved by means of numerical methods. The second group of WPRE prediction techniques, the statistical approaches, are data-driven methods based on wind time series, and include a variety of techniques ranging from conventional approaches (for instance, Autoregressive-Moving-Average (ARMA) models) to Computational Intelligence (CI) approaches [Salcedo2016]. The latter are meta-heuristics able to find approximate solutions to complex problems that otherwise could not be solved or would require very long computation times. They include, among others, three groups of bio-inspired techniques: EC [De-Jong2006], NC [Ata2015] and FC [Suganthi2015]. An introduction to the main concepts of bio-inspired CI techniques in energy applications can be found in [Cuadra2016, Salcedo2015b].

3.1.2 Purpose and Contributions

The purpose of this work is to explore the feasibility of a novel hybrid WPRE prediction framework, which merges parts of numerical-physical models with state-of-the-art statistical approaches. When the term "hybrid algorithm" is used in this work, it means that the proposal combines data from numerical-physical methods (reanalysis, in this case) with ML approaches (specifically, regressors). Regarding what the hybrid approach means in this study, there are two points to note. The first one is that it would be possible to adapt the proposed regression techniques to operate with alternative data (not coming from numerical methods, i.e., reanalysis, in this study). The second one, which is the main novelty of this work, is that the use of data from numerical-physical methods can help achieve valuable predictions of WPREs in wind farms.

The contributions of this work are:

  1. The use of regression techniques in this kind of problem, since, up until now, the majority of WPRE prediction frameworks have been based on classification approaches.

  2. The use of reanalysis data as predictive variables of the ML regression techniques. As will be shown, the direct application of regression algorithms makes unnecessary the pre-processing algorithms required in other approaches [Dorado2017a, Cornejo2017]. Note that the classification problems associated with WPREs are usually highly unbalanced, which makes it difficult to apply high-performance classification techniques without resorting to over-sampling or similar techniques [Dorado2017a, Cornejo2017].

  3. The performance of the proposed system has been tested using real data from three different wind farms in Spain.

The rest of this chapter is organized as follows: Section 3.2 states the problem definition tackled in this work, in which WPRE prediction is formulated as a regression task. Section 3.3 presents the data and predictive variables involved in the proposal. In turn, Section 3.4 shows the experimental work carried out, with the results obtained by the different tested algorithms in three WPRE prediction problems located at three distinct wind farms in Spain. Sections 3.4.2 and 3.5 complete the study by giving some final concluding remarks on the work carried out.

3.2 Problem Definition

Following previous works in the literature [Gallego2015a, Dorado2017a, Cornejo2017, Dorado2017b], a WPRE can be characterized by a number of parameters:

  • Magnitude (ΔP_r): defined as the variation in power produced in the wind farm or wind turbine during the ramp event (the subscript "r" stands for "ramp").

  • Duration (Δt_r): the time period during which the ramp event is produced.

In addition to the magnitude and duration of a wind ramp, the derived quantity called the ramp rate (the ratio of magnitude to duration, ΔP_r/Δt_r) is used to define the intensity of the ramp.

Taking these parameters into account, in the majority of previous works in the literature, the WPRE detection problem has been defined as a classification problem [Bossavy2015]. Within this framework, let S(t) be the so-called ramp function, i.e., a criterion function that is usually evaluated to decide whether or not there is a WPRE. There are several definitions of S(t), all of them involving power production (P) criteria at the wind farm (or wind turbine), but the two more common ones are the following [Gallego2015a]:

S1(t) = P(t + Δt) − P(t)   (3.1)
S2(t) = |P(t + Δt) − P(t)|   (3.2)

Note that, in the ramp function stated by Equation (3.1), the power variation refers to a given time interval Δt. In the experimental work carried out throughout this thesis, such a time interval has been assumed to be Δt = 6 h (the "reference time interval") because of the reanalysis resolution.

Using any of these definitions of the ramp function S(t), the classification problem can be stated by defining a threshold value λ, in the way:

I(t) = 1 if S(t) > λ, and I(t) = 0 otherwise   (3.3)

where I(t) is an "indicator function" used to label the data in the binary classification formulation of the problem.

As will be shown later on, in this approach, first of all, a reference threshold value is set, and then a WPRE is detected if the ramp function is larger than 50% of that reference. It is worth mentioning that, if there were an interest in establishing a larger number of classes (for example, five classes of WPRE), at least two thresholds would be needed.

The WPRE detection problem also involves a vector of predictive variables x. Different types of inputs have been used as predictive variables in the literature. The key point here is that the meteorological processes must always be considered, since they are the physical precursors of WPREs. Different numerical weather prediction system outputs have been used to obtain these predictive variables, including reanalysis data [Gallego2015b], which provide a long historical record of meteorological variables to be used as predictors. Following these previous works, in this chapter the following version of the WPRE prediction problem is tackled:

Let {(x_t, S(t))}, with t = 1, …, N, be time series of predictive variables and values of the ramp function (objective variables). The problem consists of training a regression model M on a subset of the data (training set), in such a way that, when M is applied to a given test set, an error measure is minimized.
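The two ramp-function definitions and the associated class labelling can be sketched as follows; the power series and the threshold are hypothetical values chosen only to illustrate Equations (3.1)-(3.3).

```python
# Sketch of the ramp functions of Equations (3.1)-(3.2) and the three-class
# labelling derived from the indicator formulation, on a hypothetical 6-h
# power series (MW). The threshold lambda_ is illustrative.
import numpy as np

def s1(power, step=1):
    # S1(t) = P(t + dt) - P(t): signed power variation over one time step.
    return power[step:] - power[:-step]

def s2(power, step=1):
    # S2(t) = |P(t + dt) - P(t)|: unsigned variation (ramp / no-ramp only).
    return np.abs(s1(power, step))

def label_s1(s, lambda_):
    # +1 ascending ramp, -1 descending ramp, 0 no ramp.
    return np.where(s > lambda_, 1, np.where(s < -lambda_, -1, 0))

power = np.array([10.0, 24.0, 23.0, 5.0, 6.0])  # hypothetical farm output
print(s1(power))
print(label_s1(s1(power), lambda_=10.0))
```

Note that the regression formulation adopted in this chapter predicts the continuous value S(t) itself; the labelling step is only applied afterwards to count detected events.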

3.3 Data and Predictive Variables

A reanalysis project is an initiative carried out by some weather forecasting centers, which consists of combining past observations with a modern meteorological forecast model, in order to produce regular gridded datasets of many atmospheric and oceanic variables, with a temporal resolution of a few hours. Reanalysis projects usually extend over several decades and cover the entire planet, being a very useful tool for obtaining a comprehensive picture of the state of the Earth system for meteorological and climatological studies. There are several reanalysis projects currently in operation, but one of the most important is ERA-Interim, the latest global atmospheric reanalysis produced by the ECMWF [Dee2011]. ERA-Interim is a global atmospheric reanalysis from 1979, continuously updated in real time. The data assimilation system used to produce ERA-Interim is based on a 2006 release that includes a four-Dimensional Variational analysis (4D-Var) with a 12-h analysis window. The spatial resolution of the dataset is approximately 80 km, on 60 vertical levels from the surface up to 0.1 hPa. ERA-Interim provides six-hourly atmospheric fields on model levels, pressure levels, potential temperature and potential vorticity, and three-hourly surface fields.

Aiming to tackle the WPRE prediction problem in this study, wind- and temperature-related predictive variables from ERA-Interim are considered at some specific points in the neighborhood of the area under study. The variables considered as predictors (Table 3.1) are taken at different pressure levels (surface, 850 hPa and 500 hPa), in such a way that different atmospheric processes can be taken into account. A total of 12 predictive variables per ERA-Interim node and four nodes surrounding the area under study (wind farm) are considered at time t, i.e., in this problem, x is formed by 48 predictive variables. The ERA-Interim time resolution for the predictive variables (6 h) sets in this case the ramp duration taken into account (Δt = 6 h).

Thus, each regression model M analyzed in this work must be trained with the data (x, S1) or (x, S2), where S1 and S2 are computed using Equations (3.1) and (3.2), respectively.

Variable Name ERA-Interim Variable
skt surface temperature
sp surface pressure
u10 zonal wind component (u) at 10 m
v10 meridional wind component (v) at 10 m
temp1 temperature at 500 hPa
up1 zonal wind component (u) at 500 hPa
vp1 meridional wind component (v) at 500 hPa
wp1 vertical wind component (w) at 500 hPa
temp2 temperature at 850 hPa
up2 zonal wind component (u) at 850 hPa
vp2 meridional wind component (v) at 850 hPa
wp2 vertical wind component (w) at 850 hPa
Table 3.1: Predictive variables considered at each node from the ERA-Interim reanalysis.
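Assembling the 48-dimensional predictor vector x(t) from the reanalysis fields can be sketched as follows. Only the 4-node × 12-variable structure follows Table 3.1; the array names, shapes and random values are illustrative, not the actual ERA-Interim file layout.

```python
# Sketch: flatten the 12 ERA-Interim variables of Table 3.1 at the 4 nodes
# surrounding the wind farm into one 48-feature row per 6-h time step.
# The "reanalysis" array is hypothetical stand-in data.
import numpy as np

n_times, n_nodes, n_vars = 5, 4, 12
rng = np.random.default_rng(42)
reanalysis = rng.normal(size=(n_times, n_nodes, n_vars))  # hypothetical fields

X = reanalysis.reshape(n_times, n_nodes * n_vars)  # one 48-feature row per t
print(X.shape)
```

Each row of X is then paired with the ramp-function value S(t) computed from the wind farm's measured power at the same 6-h time stamp.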

3.4 Experimental Work

This section presents the experimental evaluation of the proposed approach in a real WPRE prediction problem, exploring the different ML regressors used in this work (SVR, ELM, GP and MLP). Prior to describing the experiments carried out, it is worth emphasizing the practical importance of using reanalysis data to test the accuracy and feasibility of the proposed hybrid approach with ML regressors. Non-hybrid approaches (the use of regression techniques on alternative data, from measuring stations, for example) are also possible. However, note that, from the viewpoint of the repeatability of the experiments, reanalysis data are very convenient, since they are freely available on the Internet, so the experimental part of this work can easily be reproduced by other researchers.

Starting with the detailed description of the experimental work carried out, three wind farms in Spain are considered, whose locations are represented in Figure 3.1. The three wind farms chosen (labeled "A", "B" and "C" in Figure 3.1) are medium-sized facilities, with 32, 28 and 30 turbines installed, respectively. Note that the selected wind farms cover different parts of Spain (north, center and south), characterized by different wind regimes. Different amounts of data were available for each wind farm: in wind farm "A", the data range from 11/01/2002 to 29/10/2012; in wind farm "B", from 23/11/2000 to 17/02/2013; and in wind farm "C", from 02/03/2002 to 30/06/2013.

Figure 3.1: Geographical location of the wind farms (labeled "A", "B" and "C") considered in the experimental work carried out in this thesis. The four closest nodes of the ERA-Interim reanalysis (predictive variables) are also represented for illustrative purposes. These wind farms have been selected because they cover different parts of Spain (north, center and south), characterized by different wind regimes.

A pre-processing step to remove missing and corrupted data was carried out. Note that only data every 6 h (00 h, 06 h, 12 h and 18 h) are kept, to match the predictive variables from the ERA-Interim reanalysis with the objective variables.

The performance of the four ML regressors described in Section 1.2 in the WPRE prediction problems at each wind farm is shown in terms of different error measures, such as the RMSE, the MAE or the "sensitivity", also called the true positive rate. This last measure is defined as:

sensitivity = TPP / TP   (3.4)

where: (1) TPP stands for the number of positive predictions, i.e., the correct predictions of ascending (+ramp), descending (−ramp) and no-ramp values (with the S1 definition), and of ramp or no-ramp values (with the S2 definition) in the experiments; and (2) TP stands for the number of positive values in the test set, i.e., the total real values of positive ramps, negative ramps, ramps or no ramps in the database. Note that, this way, the experiments are performed with the two different definitions of the ramp function (S1 and S2) given in Section 3.2.
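The per-class sensitivity of Equation (3.4) can be sketched as follows; the label vectors are hypothetical, using the three-class encoding (+1 ascending, −1 descending, 0 no ramp) discussed for the S1 formulation.

```python
# Sensitivity (Equation (3.4)) per event class, as reported in Table 3.3:
# correct predictions of a class divided by its true occurrences, in percent.
# The labels below are hypothetical illustration data.
import numpy as np

def sensitivity(y_true, y_pred, cls):
    positives = y_true == cls
    if positives.sum() == 0:
        return float("nan")       # class absent from the test set
    return 100.0 * np.sum(y_pred[positives] == cls) / positives.sum()

y_true = np.array([1, 1, 0, -1, 0, 0, -1, 1])
y_pred = np.array([1, 0, 0, -1, 0, 1, -1, 1])
for cls, name in [(1, "+ramp"), (-1, "-ramp"), (0, "no ramp")]:
    print(name, sensitivity(y_true, y_pred, cls))
```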

The following step to obtain the prediction of the WPREs is to train the considered ML regressors. A partition of the data into training (80%) and test (20%) sets is carried out. In the case of the SVR and the MLP, a validation set (5% of the training data) is also considered. This validation set is used to obtain the best SVR hyper-parameters C, ε and γ by means of a Grid Search (GS) [Smola2004]. The validation set is also used in the training of the MLP approach, in order to prevent the NN from overtraining. Both training and test sets have been randomly constructed from the available data after the cleaning pre-processing. The concrete configurations and the values used for the parameters of the considered ML regression models are listed in Table 3.2.
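The grid search over the SVR hyper-parameters can be sketched with scikit-learn as follows; the synthetic data and the candidate grids are illustrative, not those used in the thesis.

```python
# Grid search (GS) over the SVR hyper-parameters C, epsilon and gamma,
# sketched with scikit-learn on synthetic regression data. The grids and the
# data-generating process are hypothetical illustration choices.
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)  # synthetic target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

grid = {"C": [0.1, 1.0, 10.0], "epsilon": [0.01, 0.1], "gamma": ["scale", 0.1]}
gs = GridSearchCV(SVR(kernel="rbf"), grid, cv=3)   # cross-validated GS
gs.fit(X_train, y_train)
print(gs.best_params_, gs.score(X_test, y_test))
```

Here cross-validation folds stand in for the single held-out validation split described above; the thesis selects the hyper-parameters on a fixed 5% validation set instead.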

With all these previous considerations in mind, Sections 3.4.1 and 3.4.2 focus on showing the results obtained and on discussing them, respectively.

Model Model Configuration Design Parameters for Each Model
SVR SVR with Gaussian kernel C, ε and γ, set by grid search on the validation set
ELM 3-layer NN with sigmoid activation function Number of neurons in each of the three layers (input-hidden-output): 48-150-1
GP RBF kernel kernel length-scale, signal variance and noise variance
MLP Levenberg–Marquardt training maximum number of epochs, minimum gradient, μ and validation-checks stopping criteria
Table 3.2: Configuration and design parameters of the regression ML models explored in the proposed approach for all the wind farms considered.
3.4.1 Results

As mentioned in the description of the problem at hand, among the several definitions of the ramp function S(t), the two most common ones [Gallego2015a], stated respectively by Equations (3.1) and (3.2), are considered, because both include power production (P) criteria at the wind farm. The variation of power caused by a wind ramp has been studied in the experiments below, in the three wind farms (Figure 3.1), within a time interval Δt = 6 h, which is determined by the resolution of the reanalysis data.

In addition, in order to properly understand the analysis of the results obtained, it is convenient to point out that, by using the indicator function stated by Equation (3.3), the proposed methodology is able to successfully detect those WPREs that surpass the thresholds (+λ or −λ) when using the S1 ramp function definition, or the single threshold (λ) when using the S2 definition. As will be shown later on, this is due to the fact that, with the first ramp definition (S1), three types of events can be detected: ascending ramps (those whose power variation exceeds +λ), descending ramps (those surpassing −λ) and "no ramps" (when the power variation lies between the two thresholds). Conversely, in the case of the S2 ramp function definition, it is only necessary to determine whether or not there is a ramp, so that a single threshold suffices.

Taking these considerations into account and aiming at better explaining the results, the discussion is organized according to the ramp function definition used as the objective function, leading to the two subsections below.

Results using as the Ramp Function Definition

Table 3.3 shows the results obtained in this problem of WPRE prediction when considering the first ramp definition as the objective function, in the three aforementioned wind farms in Spain (labeled “A”, “B” and “C” in Figure 3.1). For each wind farm, the performance of each of the ML regressors explored (SVR, ELM, GP and MLP) has been measured using the RMSE, MAE and sensitivity metrics ((+ramp), (−ramp), (no ramp)).

Wind Farm A
ML regressor RMSE MAE (+ramp) (−ramp) (no ramp)
(MW) (MW) (%) (%) (%)
SVR 7.0085 5.2673 26.93 24.20 96.59
ELM 5.6779 4.2499 40.54 42.59 95.51
GP 5.3066 3.9519 54.93 51.95 93.96
MLP 5.4538 4.0021 12.13 5.72 99.41
Wind Farm B
ML regressor RMSE MAE (+ramp) (−ramp) (no ramp)
(MW) (MW) (%) (%) (%)
SVR 8.0025 5.9773 35.53 34.10 86.66
ELM 7.4539 5.9768 32.93 33.14 92.13
GP 5.9856 4.4298 52.10 58.25 91.71
MLP 5.9009 4.3429 15.11 13.14 97.25
Wind Farm C
ML regressor RMSE MAE (+ramp) (−ramp) (no ramp)
(MW) (MW) (%) (%) (%)
SVR 7.1370 5.3406 45.38 44.20 91.33
ELM 5.8367 4.4462 50.32 47.64 94.01
GP 4.7515 3.4771 57.14 61.05 93.99
MLP 5.0727 3.6827 14.21 10.26 98.52
Table 3.3: Results (in terms of RMSE, MAE and sensitivity) corresponding to the estimation of the ramp function (Equation (3.1)) obtained when using the proposed approach, as a function of the ML regressors explored (SVR, ELM, GP and MLP), in the three study cases: the wind farms “A”, “B” and “C”, whose locations have been represented in Figure 3.1.

Regarding the reasons why the mentioned metrics are used instead of others, it is convenient to stress some aspects related to what are, in fact, two conceptually distinct groups of measures: metrics that measure errors (RMSE and MAE), on the one hand, and metrics that quantify successful prediction rates (sensitivity), on the other. The aspects to be highlighted are:

  • With respect to the “conventional” metrics that measure errors, there are two reasons that have compelled us to include the RMSE and MAE metrics. The first one is that they are the most commonly used in the literature. Examples of relevant papers in which these metrics are used for WPRE forecasting are [Gallego2015a, Cutler2007, Gallego2011, Gallego2013]; please see [Gallego2015a] for a useful discussion on this issue. The second one is that, as will be shown, these error measures can be complemented by the sensitivity metric, the other class of metrics chosen.

  • The second couple of points that are important to emphasize here are those related to the aforementioned sensitivity in Equation (3.4), one with respect to its meaning and the other regarding its application. On the one hand, the physical meaning of sensitivity is simply the percentage of correct ramp predictions with respect to the actual measured data. Despite its apparent simplicity, this is an excellent measure of the extent to which the regressor algorithm under test is efficient in detecting wind ramps. On the other hand, regarding its application step in the proposed methodology, the key point is that sensitivity is only used after the ramp function has been predicted with a regression technique and a threshold has been defined. After applying the threshold, the number of real WPREs is obtained and compared to the predicted number. This way, the fact that the problem is highly unbalanced is no longer an issue; in other words, the regression techniques are applied to the ramp function and, then, a threshold is established to classify events, yielding the percentage of correct WPRE identifications. Note that, since the work's objective is to deal with a regression problem, it is enough to show the good percentage of correct classification after setting the threshold in the predicted ramp function.
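The two groups of measures discussed in the points above can be sketched numerically. This is a generic illustration with function names of our own choosing, under the assumption that sensitivity is computed per event class once the threshold has been applied to the predicted ramp series.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between measured and predicted ramp values."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    """Mean absolute error between measured and predicted ramp values."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(np.abs(y_true - y_pred))

def sensitivity(labels_true, labels_pred, event):
    """Percentage of actual `event` occurrences that were correctly
    predicted, i.e. the per-class success rate after thresholding."""
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)
    mask = labels_true == event
    return 100.0 * np.mean(labels_pred[mask] == labels_true[mask])
```

Because `sensitivity` is conditioned on the actual occurrences of each class, a heavily unbalanced event distribution (far more "no ramp" samples than ramps) does not inflate the reported ramp-detection rate.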

The analysis of Table 3.3 allows for elucidating some interesting conclusions:

  1. The performance of the ML regressors is, in general, good in terms of RMSE, MAE and sensitivity, although, as shown, some ML regressors work better than others.

  2. Regarding the performance of one regressor with respect to another, the results of Table 3.3 clearly indicate that the GP model reaches the best results of all the regressors tested, with an excellent reconstruction of the ramp function from the ERA-Interim variables. Note in Table 3.3 that the values of the metrics obtained by the GP regressor are marked in bold. Its RMSE and MAE values are much lower (better) than those of the other ML regressors explored. In terms of sensitivity, its performance is even better: its sensitivity (the percentage of correct predictions with respect to the real, measured data, stated by Equation (3.4)) is much higher (better) than those of the other regressors, both for ascending ramps ((+ramp)) and for descending ramps ((−ramp)). This confirms the validity of the results measured with the error metrics and proves the feasibility of the proposed methodology for predicting both ascending and descending wind ramps.

  3. The worst result corresponds to the MLP, with a poorer detection of positive WPREs, when compared to the other ML regressors.

  4. The SVR and ELM perform in between the GP and MLP, with acceptable detection rates of positive WPREs.

With this analysis in mind, Figures 3.2–3.4 show the estimation of the ramp function obtained by the GP and ELM algorithms (the two best approaches tested in the experiments), when using the first definition as the objective function, for Wind Farms A, B and C, respectively. Some aspects to correctly interpret these figures are:

  • Aiming at clearly showing the algorithms’ performance, only the first 300 samples of the test set have been represented in these figures.

  • Furthermore, a threshold value (and the corresponding negative threshold) has been marked in these figures, so that it can be used to decide whether or not the event is a power ramp event (see Equation (3.3)). When a ramp occurs, it is possible to decide whether the ramp event is ascending or descending.
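A plot of the kind described in these points could be sketched as below. The series, the threshold value and the file name are all placeholder assumptions, not the thesis's data; the sketch only illustrates drawing the measured and predicted ramp series over the first 300 test samples with the two threshold lines.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, renders straight to file
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-ins for the measured and predicted ramp-function series
rng = np.random.default_rng(0)
measured = rng.normal(0.0, 5.0, 300)
predicted = measured + rng.normal(0.0, 1.0, 300)
theta = 8.0  # illustrative threshold, not the value used in the thesis

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(measured, color="red", label="measured ramp function")
ax.plot(predicted, color="blue", label="predicted ramp function")
ax.axhline(theta, linestyle="--", color="black")   # ascending-ramp threshold
ax.axhline(-theta, linestyle="--", color="black")  # descending-ramp threshold
ax.set_xlabel("test sample")
ax.set_ylabel("ramp function (MW)")
ax.legend(loc="upper right")
fig.savefig("ramp_prediction.png")
```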

Figure 3.2: (a) Estimation of the ramp function (Equation (3.1)) obtained by using the proposed approach in the particular case in which the ML technique is an ELM regressor. This figure corresponds to Wind Farm A, whose location has been represented in Figure 3.1. Panels (b,c) represent two shorter excerpts in which the predicted WPREs that exceed the thresholds are shown to be correctly detected. A WPRE is detected when the ramp function surpasses the threshold. The predicted series exhibits RMSE MW, MAE MW, (+ramp) , (−ramp) and (no ramp) .

The results illustrated in Figure 3.2 (a) show two data series: the series of real, measured WPRE values (in red) and the series of predicted WPRE values (in blue) computed by using the proposed hybrid methodology. In an effort to better explain the results and the applicability of this proposal, Figure 3.2 is drawn in a more detailed way than the others, zooming into two shorter time excerpts, (b) and (c). These insets show how some WPREs surpass either of the two thresholds. Specifically, and as mentioned before, a WPRE is detected in this approach if the ramp function is larger than 50% of . Note that Figure 3.2b,c show how the predicted WPREs (blue) exceeding either threshold are correctly predicted when compared to the real, measured WPREs (red).

Regarding such a threshold value, it is worth mentioning that it is not used until the very end of the experiments, once the ramp function has been predicted with the ML regression algorithms. In this respect, it is also convenient to remark that the proposed approach does not seek to optimize the threshold; it is only displayed as an indication (example) that the ML regression model applied can be turned into a classifier for WPREs. Note, however, that the purpose of this study is to deal with the problem as a regression one.

The good performance observed in Figure 3.2 for the ELM is also found (and is even better) in the results illustrated in Figures 3.3 and 3.4.

Figure 3.3: Estimation of the ramp function (Equation (3.1)) obtained by this proposed hybrid approach when using the GP as the ML regressor in Wind Farm B. The predicted series exhibits RMSE MW, MAE MW, (+ramp) , (ramp) and (no ramp) (see Table 3.3).
Figure 3.4: Prediction of the ramp function (Equation (3.1)) when using the GP in Wind Farm C. The predicted ramps series exhibits RMSE MW, MAE MW, (+ramp) , (ramp) , and (no ramp) (see Table 3.3).

The joint analysis of Figures 3.2–3.4 and Table 3.3 reveals the good performance of the ML regression techniques (mainly the GP model), which, hybridized with the ERA-Interim predictive variables, assist in obtaining a robust decision system regarding the existence or not of a power ramp, depending, of course, on the definition of the threshold.

Results using as the Ramp Function Definition

On the other hand, Table 3.4 and Figures 3.5–3.7 will help explain the results when the second definition is the ramp function to be predicted.

Wind Farm A
ML regressor RMSE MAE (ramp) (no ramp)
(MW) (MW) (%) (%)
SVR 6.8847 5.1876 31.33 96.27
ELM 5.7037 4.2925 41.99 95.01
GP 5.2048 3.7897 49.66 96.36
MLP 5.4351 3.9861 8.71 99.44
Wind Farm B
ML regressor RMSE MAE (ramp) (no ramp)
(MW) (MW) (%) (%)
SVR 7.9439 5.8853 44.16 85.55
ELM 7.3148 5.8675 34.67 93.23
GP 5.9223 4.4037 65.32 84.12
MLP 5.9051 4.3475 14.76 97.17
Wind Farm C
ML regressor RMSE MAE (ramp) (no ramp)
(MW) (MW) (%) (%)
SVR 7.1525 5.4677 37.88 93.83
ELM 5.8624 4.4368 58.16 92.10
GP 5.1030 3.6991 57.26 94.42
MLP 5.0605 3.6670 11.22 98.56
Table 3.4: Results (in terms of RMSE, MAE and sensitivity) corresponding to the estimation of the ramp function (Equation (3.2)) obtained by the proposed approach as a function of the ML regressors explored (SVR, ELM, GP, and MLP), for Wind Farms “A”, “B” and “C”, respectively.
Figure 3.5: Estimation of the ramp function (Equation (3.2)) obtained by the proposed approach using the GP regressor, in Wind Farm A. The ramp predicted values resemble the ramp measured ones with RMSE MW and MAE MW, (ramp) and (no ramp) (see Table 3.4).
Figure 3.6: Estimation of the ramp function (Expression (3.2)) obtained by the proposed method when using the ELM regressor, in Wind Farm B. The predicted series follows the measured series with RMSE MW and MAE MW, (ramp) and (no ramp) (see Table 3.4).
Figure 3.7: Estimation of the ramp function (Expression (3.2)) obtained by the proposed method when using the ELM regressor, in Wind Farm C. The predicted series follows the measured series with RMSE MW and MAE MW, (ramp) and (no ramp) (see Table 3.4).

Table 3.4 presents the results (in terms of RMSE, MAE and sensitivity) corresponding to the estimation of the ramp function (Expression (3.2)) achieved by using the proposed approach as a function of the ML regressors explored (SVR, ELM, GP and MLP).

A first aspect that stands out in Table 3.4 is that it has fewer sensitivity-related columns than Table 3.3. This is an interesting point that arises from the different definitions of the ramp function. Note that, for the second definition, the sensitivity is the percentage of correctly predicted results (either ramp or no ramp) with respect to the actual measured data; this is the reason why Table 3.4 has only two sensitivity columns, (ramp) and (no ramp). Table 3.3, in contrast, exhibits three sensitivity-related columns because, with the first ramp definition, there are three events to be detected: ascending ramps, descending ramps and no ramps.

In the same way as Table 3.3, Table 3.4 also reveals that the GP approach exhibits the best results, clearly outperforming the rest of the ML regressors tested, except the MLP, which has similar values only in its error metrics, RMSE and MAE, but not in its (ramp) value, which is considerably worse than that of the GP. This is clear, for instance, in Wind Farm A, in which the GP's RMSE is lower than that of the other regressors, while its (ramp) value is much higher than the MLP's. In Wind Farm B, the error performance of the GP is similar to that of the MLP and much better than those of the SVR and ELM. Note again that, although the GP model is similar to the MLP in error metrics, the GP exhibits much better sensitivity than the MLP. This holds not only for the MLP (which has similar errors), but also for the rest of the ML regressors, which the GP model clearly surpasses in detecting wind ramps. For clarity, this is marked in bold in Table 3.4. This means that the GP is more efficient in predicting wind ramps (the very core of this approach) than the others, which is why the sensitivity metric helps supplement the information provided by the error metrics.