An Outlyingness Matrix for Multivariate Functional Data Classification
Wenlin Dai and Marc G. Genton
October 10, 2018
Abstract
The classification of multivariate functional data is an important task in scientific research. Unlike pointwise data, functional data are usually classified by their shapes rather than by their scales. We define an outlyingness matrix by extending directional outlyingness, an effective measure of the shape variation of curves that combines the direction of outlyingness with conventional depth. We propose two classifiers based on directional outlyingness and the outlyingness matrix, respectively. In simulation studies on both univariate and multivariate functional data, our classifiers outperform existing depth-based classifiers. We also test our methods on two real data problems, speech recognition and gesture classification, and obtain results that are consistent with the findings from the simulated data.
Keywords: Directional outlyingness; Functional data classification; Multivariate functional data; Outlyingness matrix; Statistical depth.
1 Introduction
Functional data are frequently collected in many research fields, including biology, finance, geology, medicine, and meteorology. As with other types of data, problems such as ranking, registration, outlier detection, classification, and modeling also arise with functional data. Many methods have been proposed to extract useful information from functional data (Ramsay and Silverman, 2006; Ferraty and Vieu, 2006; Horváth and Kokoszka, 2012). Functional classification is an essential task in many applications, e.g., diagnosing diseases based on curves or images from medical test results, recognizing handwriting or speech patterns, and classifying products (Epifanio, 2008; Delaigle and Hall, 2012; Alonso et al., 2012; Sguera et al., 2014; Galeano et al., 2015).
Statistical depth was initially defined to rank multivariate data, mimicking the natural order of univariate data. Zuo and Serfling (2000) presented details on statistical depth. Recently, the concept of statistical depth has been generalized to functional depth to rank functional data from the center outward (Fraiman and Muniz, 2001; Cuevas et al., 2007; López-Pintado and Romo, 2009; López-Pintado et al., 2014; Claeskens et al., 2014). An alternative way to rank functional data is the tilting approach proposed by Genton and Hall (2016). Functional depth, as a measure of the centrality of curves, has been used extensively to classify functional data, especially if the dataset is possibly contaminated (Sguera et al., 2014). López-Pintado and Romo (2006) defined (modified) band depth for functional data, based on which they proposed two methods for classification of functional data: “distance to the trimmed mean” and “weighted averaged distance”. Cuevas et al. (2007) introduced random projection depth and the “within maximum depth” criterion. Sguera et al. (2014) defined kernelized functional spatial depth and comprehensively investigated the performance of depth-based classifiers. Cuesta-Albertos et al. (2015) and Mosler and Mozharovskyi (2015) discussed functional versions of the depth-depth (DD) classifier (Li et al., 2012; Liu et al., 1999). Hubert et al. (2015a) proposed functional bag distance and a distance-distance plot to classify functional data. Kuhnt and Rehage (2016) proposed a graphical approach using the angles in the intersections of one observation with the others.
There have been many other attempts to tackle the challenge of functional data classification, a great number of which sought to generalize finite-dimensional methods to functional settings. Specifically, these approaches first map functional data to finite-dimensional data via dimension reduction and then apply conventional classification methods, e.g., linear discriminant analysis (LDA) or support vector machines (SVM) (Boser et al., 1992; Cortes and Vapnik, 1995), to the resulting finite-dimensional data. Dimension reduction techniques mainly fall into two categories: regularization and filtering methods. The regularization approach simply treats functional data as multivariate data observed at discrete time points or intervals (Li and Yu, 2008; Delaigle et al., 2012), and the filtering approach approximates each curve by a linear combination of a finite number of basis functions, representing the data by their corresponding basis coefficients (James and Hastie, 2001; Rossi and Villa, 2006; Epifanio, 2008; Galeano et al., 2015).
Most of the aforementioned methods focus on univariate functional data. Very little attention has been paid to multivariate functional cases, which are also frequently observed in scientific research. Examples of multivariate functional cases are gait data and handwriting data (Ramsay and Silverman, 2006), height and weight of children by age (López-Pintado et al., 2014) and various records from weather stations (Claeskens et al., 2014). Classifying such multivariate functional data jointly rather than marginally is necessary because a joint method takes into consideration the interaction between components, whereas a marginal approach may assign one observation to different classes depending on the component used.
Locations/coordinates are used to classify pointwise data; however, the variation between different groups of curves in functional data classification usually results from the data’s different patterns/shapes rather than their scales. We refer the readers to the simulation settings and real applications in a number of references (Cuevas et al., 2007; Alonso et al., 2012; Epifanio, 2008; Galeano et al., 2015; Sguera et al., 2014). This important feature of functional data classification cannot be handled by conventional functional depths which do not effectively describe the differences in shapes of curves. A recently proposed notion of directional outlyingness (Dai and Genton, 2017) overcomes these drawbacks. The authors pointed out that the direction of outlyingness is crucial to describing the centrality of multivariate functional data. By combining the direction of outlyingness with the conventional pointwise depth, they established a framework that can decompose total functional depth into shape depth and scale depth. Shape depth measures the change of pointwise depth not only in relation to the scale but also in relation to the direction of outlyingness. It thus effectively describes the shape variation between curves. Directional outlyingness, defined similarly, also decomposes total outlyingness into two parts, shape outlyingness and scale outlyingness. We extend the scalar directional outlyingness to an outlyingness matrix, which contains pure information of shape variation of a curve. Based on directional outlyingness and the outlyingness matrix, we propose two classification methods for multivariate functional data.
The remainder of the paper is organized as follows. In Section 2, we briefly review the framework of directional outlyingness, define the outlyingness matrix and propose two classification methods for multivariate functional data using this framework. In Section 3, we evaluate our proposed classifiers on both univariate and multivariate functional data via simulation studies. In Section 4, we use two real datasets to illustrate the performance of the proposed methods in practice. We end the paper with a short conclusion in Section 5 and provide technical proofs for the theoretical results in the Appendix.
2 Directional Outlyingness and Classification Procedure
Treating the labeled groups of data as training sets, classifying a new observation from the test set into one of the groups requires an effective measure of distance between the new observation and each group. Such a measure is the Bayesian probability for the naive Bayes classifier, the Euclidean distance for the nearest neighbors classifier, or the functional outlyingness/depth for the depth-based classifier. Our classification methods fall into the category of depth-based classifiers. In what follows, we first review the framework of directional outlyingness as our measure for the distance between a new curve and a labeled group of curves, and then propose two classification methods based on this framework.
2.1 Directional Outlyingness
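Consider a $p$-variate stochastic process of continuous functions, $\mathbf{X} = (X_1, \dots, X_p)^{\mathrm{T}}$, with each $X_k$ ($1 \le k \le p$): $\mathcal{I} \to \mathbb{R}$, $t \mapsto X_k(t)$, from the space $\mathcal{C}(\mathcal{I}, \mathbb{R})$ of real continuous functions on $\mathcal{I}$. At each fixed time point, $t$, $\mathbf{X}(t)$ is a $p$-variate random variable. Here, $p$ is a finite positive integer that indicates the dimension of the functional data and $\mathcal{I}$ is a compact time interval. We get univariate functional data when $p = 1$ and multivariate functional data when $p \ge 2$. Denote the distribution of $\mathbf{X}$ as $F_{\mathbf{X}}$ and the distribution of $\mathbf{X}(t)$, which is the function value of $\mathbf{X}$ at time point $t$, as $F_{\mathbf{X}(t)}$. For a sample of curves from $F_{\mathbf{X}}$, $\mathbf{X}_1, \dots, \mathbf{X}_n$, the empirical distribution is denoted as $F_{\mathbf{X},n}$; correspondingly, the empirical distribution of $\mathbf{X}_1(t), \dots, \mathbf{X}_n(t)$ is denoted as $F_{\mathbf{X}(t),n}$. Let $d(\mathbf{X}(t), F_{\mathbf{X}(t)})$: $\mathbb{R}^p \to [0, 1]$ be a statistical depth function for $\mathbf{X}(t)$ with respect to $F_{\mathbf{X}(t)}$. The finite sample depth function is then denoted as $d_n(\mathbf{X}(t), F_{\mathbf{X}(t),n})$.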
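Directional outlyingness (Dai and Genton, 2017) is defined by combining conventional statistical depth with the direction of outlyingness. For multivariate pointwise data, assuming $d(\mathbf{X}, F_{\mathbf{X}}) > 0$, the directional outlyingness is defined as
\[ \mathbf{O}(\mathbf{X}, F_{\mathbf{X}}) = \{1/d(\mathbf{X}, F_{\mathbf{X}}) - 1\} \cdot \mathbf{v}, \]
where $\mathbf{v}$ is the unit vector pointing from the median of $F_{\mathbf{X}}$ to $\mathbf{X}$. Specifically, assuming that $\mathbf{Z}$ is the unique median of $F_{\mathbf{X}}$, $\mathbf{v}$ can be expressed as $\mathbf{v} = (\mathbf{X} - \mathbf{Z})/\|\mathbf{X} - \mathbf{Z}\|$, where $\|\cdot\|$ denotes the $L_2$ norm. Then, Dai and Genton (2017) defined three measures of directional outlyingness for functional data as follows:
1) the functional directional outlyingness (FO) is
\[ FO(\mathbf{X}, F_{\mathbf{X}}) = \int_{\mathcal{I}} \|\mathbf{O}(\mathbf{X}(t), F_{\mathbf{X}(t)})\|^2 w(t)\,dt; \]
2) the mean directional outlyingness (MO) is
\[ \mathbf{MO}(\mathbf{X}, F_{\mathbf{X}}) = \int_{\mathcal{I}} \mathbf{O}(\mathbf{X}(t), F_{\mathbf{X}(t)})\, w(t)\,dt; \]
3) the variation of directional outlyingness (VO) is
\[ VO(\mathbf{X}, F_{\mathbf{X}}) = \int_{\mathcal{I}} \|\mathbf{O}(\mathbf{X}(t), F_{\mathbf{X}(t)}) - \mathbf{MO}(\mathbf{X}, F_{\mathbf{X}})\|^2 w(t)\,dt, \]
where $w(t)$ is a weight function defined on $\mathcal{I}$, which can be constant or proportional to the local variation at each time point (Claeskens et al., 2014). Throughout this paper, we use a constant weight function, $w(t) = \{\lambda(\mathcal{I})\}^{-1}$, where $\lambda(\cdot)$ represents the Lebesgue measure. $\|\mathbf{MO}\|$ indicates the position of a curve relative to the center on average, which measures the scale outlyingness of this curve; $VO$ represents the variation in the quantitative and directional aspects of the directional outlyingness of a curve and measures the shape outlyingness of that curve. Further, we may link the three measures of directional outlyingness through the following equation:
\[ FO(\mathbf{X}, F_{\mathbf{X}}) = \|\mathbf{MO}(\mathbf{X}, F_{\mathbf{X}})\|^2 + VO(\mathbf{X}, F_{\mathbf{X}}). \tag{1} \]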
Then, $FO$ can be regarded as the overall outlyingness and is equivalent to the conventional functional depth/outlyingness. When the curves are parallel to each other, $VO$ becomes zero and a quadratic relationship then appears between $FO$ and $\mathbf{MO}$. Most existing depths can be used to derive their corresponding directional outlyingness, among which we suggest the distance-based depths, e.g., random projection depth (Zuo, 2003) and the Mahalanobis depth (Zuo and Serfling, 2000). In the current paper, we choose the Mahalanobis depth to construct directional outlyingness for all the numerical studies.
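As a rough illustration of these three measures, the sketch below computes MO, VO, and FO for univariate curves ($p = 1$) observed on an equidistant grid. It is not the authors' implementation. It assumes the Mahalanobis depth with the pointwise sample mean as the center and the population variance as the scale, so that $1/d - 1$ reduces to the squared standardized deviation, and it approximates the integrals with the constant weight by plain averages over the grid. The function names and the toy curves are illustrative assumptions.

```python
import math
import statistics


def pointwise_outlyingness(x, sample):
    """Directional outlyingness of a scalar x relative to a univariate sample.

    With the Mahalanobis depth d = 1 / (1 + dev^2 / var), the magnitude
    1/d - 1 equals dev^2 / var, and the direction is the sign of dev.
    """
    mu = statistics.fmean(sample)
    var = statistics.pvariance(sample, mu)
    dev = x - mu
    return math.copysign(dev * dev / var, dev)


def directional_outlyingness(curve, group):
    """Return (MO, VO, FO) of a discretized curve relative to a group of curves.

    Averages over the grid stand in for the integrals with constant weight.
    """
    o = [pointwise_outlyingness(x, [g[j] for g in group])
         for j, x in enumerate(curve)]
    mo = statistics.fmean(o)
    vo = statistics.fmean([(v - mo) ** 2 for v in o])
    fo = statistics.fmean([v * v for v in o])
    return mo, vo, fo


# Toy data (an assumption for illustration): parallel copies of one sine curve.
grid = [j / 50 for j in range(50)]
base = [math.sin(2.0 * math.pi * t) for t in grid]
group = [[a + b for b in base] for a in (-1.0, -0.5, 0.0, 0.5, 1.0)]

shifted = [2.0 + b for b in base]                       # pure scale outlier
wiggly = [b + math.sin(4.0 * math.pi * t) for b, t in zip(base, grid)]  # shape outlier

mo_s, vo_s, fo_s = directional_outlyingness(shifted, group)
mo_w, vo_w, fo_w = directional_outlyingness(wiggly, group)
```

In this toy setting the shifted curve is parallel to the group, so its VO is numerically zero and FO equals the square of MO, while the curve with a different shape has a small MO but a clearly positive VO.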
Figure 1 presents an example of directional functional outlyingness with a group of bivariate curves. In the graph on the left, the nonoutlying curves are shown in black and outliers are shown in different colors. In the middle graph, the grey surface is a quadratic surface satisfying $FO = \|\mathbf{MO}\|^2$. Because the nonoutlying curves and the two shifted outliers (blue and red curves) are mutually parallel, their mapped points, $(\mathbf{MO}^{\mathrm{T}}, FO)^{\mathrm{T}}$, fall exactly onto the grey quadratic surface. However, the two points corresponding to the shifted outliers are isolated from the cluster, making them easy to recognize. The right graph presents a scatter plot of $(\mathbf{MO}^{\mathrm{T}}, VO)^{\mathrm{T}}$, from which we can simply distinguish the cyan and purple points from the others by their $VO$ values. The green point is not only isolated from the cluster, but it also has a larger $VO$, which coincides with its outlyingness for both scale and shape. Compared with conventional functional depths, directional outlyingness more effectively describes the centrality of functional data, especially the shape variation part. This is because $VO$ accounts not only for the variation of the absolute values of pointwise outlyingness but also for the change in their directions. This advantage matches the functional data classification task, which is essentially to distinguish curves by their differences in shapes rather than scales.
With the above advantages, we adopt the functional directional outlyingness to measure the distance between the curve to be classified and the labeled groups of curves. In the next two subsections, we propose two classification methods for multivariate functional data. Both methods are based on a similar idea used by the maximum depth classifier: a new curve should be assigned to the class leading to the smallest outlyingness value.
2.2 Two-Step Outlyingness
Directional outlyingness maps one $p$-variate curve to a $(p+1)$-dimensional vector, $\mathbf{Y} = (\mathbf{MO}^{\mathrm{T}}, VO)^{\mathrm{T}}$. If the curves are generated from a stationary Gaussian process, we may expect that $\mathbf{Y}$ has a $(p+1)$-dimensional normal distribution. We provide an example in Figure 2 to depict such a situation. As shown in Figure 1, the $\mathbf{Y}$'s that correspond to the outliers are also isolated from the cluster of projected points of nonoutlying curves. Hence, we can measure the outlyingness of a curve, $\mathbf{X}$, using the outlyingness of its respective point, $\mathbf{Y}$. When $\mathbf{Y}$ follows a normal distribution, we calculate its Mahalanobis distance (Mahalanobis, 1936), which is employed as a two-step outlyingness of the raw curve.
For a set of $n$ observations, $\mathbf{Y}_i$ ($i = 1, \dots, n$), a general form of the Mahalanobis distance can be expressed as
\[ MD(\mathbf{Y}) = \sqrt{(\mathbf{Y} - \boldsymbol{\mu})^{\mathrm{T}} \mathbf{S}^{-1} (\mathbf{Y} - \boldsymbol{\mu})}, \]
where $\boldsymbol{\mu}$ is the mean vector of the $\mathbf{Y}_i$'s and $\mathbf{S}$ is the covariance matrix. Various estimators of $\mathbf{S}$ exist in the literature, among which the minimum covariance determinant (MCD) estimator (Rousseeuw, 1985) is quite popular due to its robustness. To reduce the influence of potential outliers, we use this estimator to calculate the distance for our method.
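In particular, the robust Mahalanobis distance based on MCD and a subsample of size $h \le n$ can be expressed as
\[ RMD_J(\mathbf{Y}) = \sqrt{(\mathbf{Y} - \bar{\mathbf{Y}}^*_J)^{\mathrm{T}} {\mathbf{S}^*_J}^{-1} (\mathbf{Y} - \bar{\mathbf{Y}}^*_J)}, \]
where $J$ denotes the set of $h$ points that minimizes the determinant of the corresponding covariance matrix, $\bar{\mathbf{Y}}^*_J = h^{-1} \sum_{i \in J} \mathbf{Y}_i$ and $\mathbf{S}^*_J = h^{-1} \sum_{i \in J} (\mathbf{Y}_i - \bar{\mathbf{Y}}^*_J)(\mathbf{Y}_i - \bar{\mathbf{Y}}^*_J)^{\mathrm{T}}$. The subsample size, $h$, controls the robustness of the method. For a $(p+1)$-dimensional distribution, the maximum finite sample breakdown point is $[(n - p)/2]/n$, where $[a]$ denotes the integer part of $a$. Assume that we get $G \ge 2$ groups of functional observations, named $G_i$ ($i = 1, \dots, G$). To classify a new curve, $\mathbf{X}_0$, into one of the groups, we use the following classifier:
\[ C_1 = \arg\min_{1 \le i \le G} \{ RMD_{G_i}(\mathbf{X}_0) \}, \]
where $C_1$ is the group label, to which we assign $\mathbf{X}_0$, and $RMD_{G_i}(\mathbf{X}_0)$ is the robust Mahalanobis distance of $\mathbf{X}_0$ to $G_i$. This classifier is based on an idea similar to the “within maximum depth” criterion (Cuevas et al., 2007), which assigns a new observation to the group that leads to the largest depth. The difference is that we use a two-step outlyingness, which can better distinguish shape variation between curves compared with conventional functional depths utilized in existing methods.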
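The two-step idea can be sketched for univariate curves as follows. This is a simplified illustration under stated assumptions, not the authors' procedure: each curve is first mapped to the point (MO, VO) relative to a group, using the Mahalanobis depth and grid averages in place of the integrals, and the second step uses the ordinary sample mean and covariance of the group's points instead of the MCD estimator, so the distance here is not robust. All function names and the toy groups are hypothetical.

```python
import math
import statistics


def outlyingness_point(curve, group):
    """First step: map a univariate curve to Y = (MO, VO) relative to `group`."""
    o = []
    for j, x in enumerate(curve):
        column = [g[j] for g in group]
        mu = statistics.fmean(column)
        dev = x - mu
        # Mahalanobis depth: 1/d - 1 = dev^2 / var, signed by the direction.
        o.append(math.copysign(dev * dev / statistics.pvariance(column, mu), dev))
    mo = statistics.fmean(o)
    vo = statistics.fmean([(v - mo) ** 2 for v in o])
    return mo, vo


def mahalanobis_2d(y, cloud):
    """Second step: plain (non-robust) Mahalanobis distance of y to a 2-D cloud."""
    m1 = statistics.fmean([c[0] for c in cloud])
    m2 = statistics.fmean([c[1] for c in cloud])
    s11 = statistics.fmean([(c[0] - m1) ** 2 for c in cloud])
    s22 = statistics.fmean([(c[1] - m2) ** 2 for c in cloud])
    s12 = statistics.fmean([(c[0] - m1) * (c[1] - m2) for c in cloud])
    d1, d2 = y[0] - m1, y[1] - m2
    det = s11 * s22 - s12 * s12
    # Quadratic form (y - m)^T S^{-1} (y - m) for a 2x2 covariance matrix.
    return math.sqrt((s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det)


def classify_two_step(curve, groups):
    """Assign the curve to the group with the smallest two-step outlyingness."""
    distances = []
    for group in groups:
        cloud = [outlyingness_point(member, group) for member in group]
        distances.append(mahalanobis_2d(outlyingness_point(curve, group), cloud))
    label = min(range(len(groups)), key=distances.__getitem__)
    return label, distances


# Toy groups (an assumption): same range, different shapes (sine versus cosine).
grid = [j / 50 for j in range(50)]


def make_group(shape):
    return [[a + (1.0 + 0.2 * c) * shape(2.0 * math.pi * t) for t in grid]
            for a in (-0.2, 0.0, 0.2) for c in (-1.0, 0.0, 1.0)]


groups = [make_group(math.sin), make_group(math.cos)]
new_curve = [0.1 + 1.1 * math.cos(2.0 * math.pi * t) for t in grid]
label, distances = classify_two_step(new_curve, groups)
```

Here the new curve has the cosine shape, so its (MO, VO) point lies inside the cloud of the second group and far from that of the first. Replacing the plain mean and covariance with an MCD fit would follow the text more closely at the cost of an extra dependency.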
2.3 Outlyingness Matrix
Unlike conventional statistical depth, pointwise directional outlyingness of multivariate functional data, $\mathbf{O}(\mathbf{X}(t), F_{\mathbf{X}(t)})$, is a vector, which allows us to define two additional statistics to describe the centrality of multivariate functional data.
Definition 1
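(Outlyingness Matrix of Multivariate Functional Data): Consider a stochastic process, $\mathbf{X}$: $\mathcal{I} \to \mathbb{R}^p$, that takes values in the space $\mathcal{C}(\mathcal{I}, \mathbb{R}^p)$ of real continuous functions defined from a compact interval, $\mathcal{I}$, to $\mathbb{R}^p$ with probability distribution $F_{\mathbf{X}}$. We define the functional directional outlyingness matrix (FOM) as
\[ \mathbf{FOM}(\mathbf{X}, F_{\mathbf{X}}) = \int_{\mathcal{I}} \mathbf{O}(\mathbf{X}(t), F_{\mathbf{X}(t)})\, \mathbf{O}^{\mathrm{T}}(\mathbf{X}(t), F_{\mathbf{X}(t)})\, w(t)\,dt, \]
and the variation of directional outlyingness matrix (VOM) as
\[ \mathbf{VOM}(\mathbf{X}, F_{\mathbf{X}}) = \int_{\mathcal{I}} \{\mathbf{O}(\mathbf{X}(t), F_{\mathbf{X}(t)}) - \mathbf{MO}(\mathbf{X}, F_{\mathbf{X}})\}\{\mathbf{O}(\mathbf{X}(t), F_{\mathbf{X}(t)}) - \mathbf{MO}(\mathbf{X}, F_{\mathbf{X}})\}^{\mathrm{T}} w(t)\,dt. \]
$\mathbf{FOM}$ can be regarded as a matrix version of the total outlyingness, $FO$, and $\mathbf{VOM}$ corresponds to the shape outlyingness, $VO$. A decomposition of $\mathbf{FOM}$ and its connection with the scalar statistics are given in the following theorem.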
Theorem 1
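(Outlyingness Decomposition): For the statistics defined in Definition 1, we have:
(i) $\mathbf{FOM}(\mathbf{X}, F_{\mathbf{X}}) = \mathbf{MO}(\mathbf{X}, F_{\mathbf{X}})\, \mathbf{MO}^{\mathrm{T}}(\mathbf{X}, F_{\mathbf{X}}) + \mathbf{VOM}(\mathbf{X}, F_{\mathbf{X}})$;
(ii) $FO(\mathbf{X}, F_{\mathbf{X}}) = \mathrm{tr}\{\mathbf{FOM}(\mathbf{X}, F_{\mathbf{X}})\}$ and $VO(\mathbf{X}, F_{\mathbf{X}}) = \mathrm{tr}\{\mathbf{VOM}(\mathbf{X}, F_{\mathbf{X}})\}$, where $\mathrm{tr}(\cdot)$ denotes the trace of a matrix.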
Theorem 2
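(Properties of the Outlyingness Matrix): Assume that $\mathbf{O}(\mathbf{X}(t), F_{\mathbf{X}(t)})$ is a valid directional outlyingness for pointwise data from Dai and Genton (2017). Then, for a constant weight function, we have
\[ \mathbf{VOM}(T(\mathbf{X}_g), F_{T(\mathbf{X}_g)}) = \mathbf{A}_0\, \mathbf{VOM}(\mathbf{X}, F_{\mathbf{X}})\, \mathbf{A}_0^{\mathrm{T}}, \]
where $T(\mathbf{X}_g(t)) = \mathbf{A}\{g(t)\}\mathbf{X}\{g(t)\} + \mathbf{b}\{g(t)\}$ is a transformation of $\mathbf{X}$ in both the response and support domains, with $\mathbf{A}(t) = f(t)\mathbf{A}_0$, $f(t) > 0$ for $t \in \mathcal{I}$ and $\mathbf{A}_0$ an orthogonal matrix, $\mathbf{b}(t)$ is a $p$-vector at each time $t$, and $g$ is a bijection on the interval $\mathcal{I}$.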
Throughout the paper, we focus on the cases when the distinction between different groups of functional data depends on their patterns/shapes. $\mathbf{VOM}$ effectively measures the level of shape variation between one curve and a group of curves. Hence, for a new curve $\mathbf{X}_0$ and groups $G_i$ ($i = 1, \dots, G$), our second classifier is defined as
\[ C_2 = \arg\min_{1 \le i \le G} \{ \|\mathbf{VOM}(\mathbf{X}_0; G_i)\|_F \}, \]
where $\|\cdot\|_F$ denotes the Frobenius norm of a matrix and $C_2$ is the group label, to which we assign $\mathbf{X}_0$. Compared with our first classifier, this second classifier is based purely on shape information, which makes it better suited to shape classification problems. We choose the Frobenius norm to reduce the matrix to a scalar because it takes into account the interaction between the outlyingness in different directions (the off-diagonal elements of $\mathbf{VOM}$).
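The following sketch illustrates the outlyingness matrices and the Frobenius-norm rule for bivariate curves ($p = 2$). It is an illustration under assumptions rather than the authors' code: the pointwise depth is the Mahalanobis depth with the sample mean and population covariance at each grid point, grid averages approximate the integrals with constant weight, and the function names and toy groups are hypothetical.

```python
import math


def pointwise_outlyingness(x, points):
    """Directional outlyingness (a 2-vector) of x relative to 2-D points."""
    n = len(points)
    m1 = sum(p[0] for p in points) / n
    m2 = sum(p[1] for p in points) / n
    s11 = sum((p[0] - m1) ** 2 for p in points) / n
    s22 = sum((p[1] - m2) ** 2 for p in points) / n
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in points) / n
    d1, d2 = x[0] - m1, x[1] - m2
    norm = math.hypot(d1, d2)
    if norm == 0.0:
        return (0.0, 0.0)
    # Squared Mahalanobis distance equals 1/d - 1 for the Mahalanobis depth d.
    q = (s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / (s11 * s22 - s12 * s12)
    return (q * d1 / norm, q * d2 / norm)


def outlyingness_matrices(curve, group):
    """Return (MO, FOM, VOM) of a bivariate curve relative to a group."""
    o = [pointwise_outlyingness(x, [g[j] for g in group])
         for j, x in enumerate(curve)]
    n_t = len(o)
    mo = [sum(v[a] for v in o) / n_t for a in range(2)]
    fom = [[sum(v[a] * v[b] for v in o) / n_t for b in range(2)] for a in range(2)]
    vom = [[sum((v[a] - mo[a]) * (v[b] - mo[b]) for v in o) / n_t
            for b in range(2)] for a in range(2)]
    return mo, fom, vom


def frobenius(matrix):
    return math.sqrt(sum(e * e for row in matrix for e in row))


def classify_by_vom(curve, groups):
    """Assign the curve to the group with the smallest Frobenius norm of VOM."""
    norms = [frobenius(outlyingness_matrices(curve, g)[2]) for g in groups]
    return min(range(len(groups)), key=norms.__getitem__), norms


# Toy groups (an assumption): circles traversed once versus twice.
grid = [j / 50 for j in range(50)]


def make_group(freq):
    return [[(a + math.sin(freq * math.pi * t), b + math.cos(freq * math.pi * t))
             for t in grid]
            for a in (-0.2, 0.0, 0.2) for b in (-0.2, 0.0, 0.2)]


groups = [make_group(2.0), make_group(4.0)]
new_curve = [(0.1 + math.sin(4.0 * math.pi * t), -0.1 + math.cos(4.0 * math.pi * t))
             for t in grid]
label, norms = classify_by_vom(new_curve, groups)
mo, fom, vom = outlyingness_matrices(new_curve, groups[0])
```

The new curve is parallel to the second group, so its VOM relative to that group is numerically zero and the rule picks it. The matrices computed relative to the first group can also be used to check the decomposition in Theorem 1 numerically.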
3 Simulation Studies
In this section, we conduct simulation studies to assess the finite-sample performance of the proposed classification methods and compare them with existing methods based on conventional statistical depth. We investigate both univariate and multivariate functional data cases.
3.1 Classification Methods
We calculate the pointwise directional outlyingness with the Mahalanobis depth (MD) (Zuo and Serfling, 2000) for our proposed methods: two-step outlyingness, denoted by RMD, and outlyingness matrix, denoted by VOM. We consider the “within maximum depth” criterion (Cuevas et al., 2007) for existing methods, using the following four conventional functional depths that can handle both univariate and multivariate functional data.
 Method FM1. Integrated depth (Fraiman and Muniz, 2001) with TD as the pointwise depth.
 Method FM2. Integrated depth with MD as the pointwise depth. The R functions depth.FM and depth.FMp in the package fda.usc are used to calculate FM1 and FM2 for univariate and multivariate cases, respectively.
 Method RP1. Random projection depth defined by Cuevas et al. (2007). In this method, we randomly choose a fixed number of directions, project the curves onto each direction, calculate the statistical depth based on the projections for each direction, and take the average of the direction-wise depths. Note that a direction in this method refers to a random function in the Hilbert space, so that the projection of a datum is given by its standard inner product with that function. We use TD as the direction-wise depth for this method.
 Method RP2. Random projection depth with MD as the direction-wise depth. The R functions depth.RP and depth.RPp in the package fda.usc are used to calculate RP1 and RP2 for univariate and multivariate cases, respectively.
TD and MD are selected as representatives of rank-based and distance-based depths, respectively. Besides the above functional depths, many other notions have been proposed in the literature. Some methods can be regarded as special cases of FM1 (with different pointwise depths), including modified band depth (López-Pintado and Romo, 2009), half-region depth (López-Pintado and Romo, 2011), simplicial band depth (López-Pintado et al., 2014), multivariate functional halfspace depth (Claeskens et al., 2014), and multivariate functional skew-adjusted projection depth (Hubert et al., 2015b). Some methods have been specifically designed for univariate functional data, including kernelized functional spatial depth (Sguera et al., 2014) and extremal depth (Narisetty and Nair, 2016).
3.2 Univariate Functional Data
We consider three univariate settings. We mention that different groups of curves vary in terms of patterns or shapes rather than scales. Each pair of curves thus oscillates within a similar range in different fashions in our settings.
Data 1. Class 0: and class 1: , where and are generated independently from a uniform distribution , and are i.i.d. observations from and is a Gaussian process with covariance function
This setting has been considered by Sguera et al. (2014).
Data 2. Class 0: and class 1: . A similar setting has been considered by Cuevas et al. (2007).
Data 3. Class 0: and class 1: , where is generated from and is generated from . LópezPintado and Romo (2009) considered a similar setting for outlier detection.
In the top panel of Figure 3, we provide one realization of two classes of curves for each setting. The functions are evaluated at 50 equidistant points on the time interval. We independently generated 200 samples from both classes of each data setting, randomly chose 100 of them as the training set, and treated the remaining 100 samples as the testing set. We applied the six methods to the generated data and calculated the correct classification rate for each method. We repeated the above procedure 100 times. The results are presented in the bottom panel of Figure 3. Under all three settings, our proposed methods performed, as expected, considerably better than the four existing classification methods. For example, the classification results from our methods are almost perfect in the second setting, whereas the other four methods fall well short of that level. This is because our two proposed methods describe the shape variation of a curve more effectively than does conventional functional depth.
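The evaluation protocol described above (random split into training and testing sets, correct classification rate, repetition) can be sketched generically as follows. This is an assumed reading of the procedure, not the authors' script: the split sizes follow the text, while the placeholder classifier (nearest pointwise group mean), the toy curves, the seed, and the number of repetitions are illustrative assumptions and are not taken from the paper.

```python
import random


def correct_classification_rate(classifier, curves, labels, n_train, rng):
    """Split at random, train on n_train curves, return the share of correct labels."""
    order = list(range(len(curves)))
    rng.shuffle(order)
    train, test = order[:n_train], order[n_train:]
    groups = {}
    for i in train:
        groups.setdefault(labels[i], []).append(curves[i])
    hits = sum(classifier(curves[i], groups) == labels[i] for i in test)
    return hits / len(test)


def nearest_mean_classifier(curve, groups):
    """Placeholder classifier (not from the paper): closest pointwise group mean."""
    def distance(group):
        n = len(group)
        return sum((x - sum(g[j] for g in group) / n) ** 2
                   for j, x in enumerate(curve))
    return min(groups, key=lambda label: distance(groups[label]))


# Toy data (an assumption): two well-separated classes of constant curves.
curves = [[level + 0.01 * k] * 5 for level in (0.0, 5.0) for k in range(100)]
labels = [0] * 100 + [1] * 100

rng = random.Random(2018)  # fixed seed so the splits are reproducible
rates = [correct_classification_rate(nearest_mean_classifier, curves, labels, 100, rng)
         for _ in range(10)]
```

Any classifier with the same signature, including the two proposed in Section 2, could be passed in place of the placeholder; the list of rates is what a boxplot such as the one in Figure 3 would summarize.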
3.3 Multivariate Functional Data
Typically, multivariate functional data are obtained from two sources: combining raw univariate curves and their derivatives (Cuevas et al., 2007; Claeskens et al., 2014) or functional data with multiple responses (Hubert et al., 2015a, b). We conduct simulation studies on both sources.
In the first scenario, we combine mean functions and the first-order derivatives of Data 1, 2, and 3 to get bivariate functional data. Under the same setting for sample sizes, design points, and number of repetitions, we apply the six methods to the resulting data and present the classification results in Figure 4. On the three datasets, RMD and VOM perform better than the existing methods and VOM always performs the best. The performance of the existing methods improves when the first-order derivatives are combined with the mean function for classification. This is because the derivatives are no longer of the same scale for different groups, which makes classifying by conventional functional depths easier.
In the second scenario, we consider three settings: two bivariate cases and one three-variate case. Again, the two classes of simulated data possess the same range but different patterns.
Data 4. Class 0: with and class 1: with , where is a bivariate Gaussian process with zero mean and covariance function (Gneiting et al., 2010; Apanasovich et al., 2012):
where is the correlation between and (), , is the marginal variance and with is the Matérn class (Matérn, 1960) where is a modified Bessel function of the second kind of order , is a smoothness parameter, and is a range parameter. Here, we set , , , , , and .
Data 5. Class 0: with and class 1: with , where are generated independently from a uniform distribution, , and are generated independently from a uniform distribution, ; and are generated independently from a uniform distribution, .
Data 6. Class 0: with three components generated from class 0 of Data 1, 2, and 3. Class 1: with three components generated from class 1 of Data 1, 2, and 3. Data 6 is a three-variate setting.
Realizations of two classes of curves for each setting are illustrated in the top panel of Figure 5. The functions are evaluated at 50 equidistant points on the time interval. We independently generated 200 samples from both classes of each data setting, randomly chose 100 of them as the training set, and treated the remaining 100 samples as the testing set. We applied the six methods to the simulated data and calculated the correct classification rate for each method. We repeated the above procedure 100 times and present the results in the bottom panel of Figure 5. As illustrated, our proposed methods attain much higher correct classification rates than do the existing methods. In particular, VOM has almost perfect classification results for the three settings. In some settings, Data 5 being an example, the four existing methods provide results that are only slightly better than completely random classification. These simulation results again support our claim that the proposed methods based on directional outlyingness are much more effective in distinguishing curve groups that vary by shape.
Besides the noncontaminated settings, we also consider a contaminated setting as follows:
Data 1C. Class 0: and class 1: , where is an indicator function: equals to if and otherwise; is generated from . Class 0 is contaminated by outliers with a probability of . Sguera et al. (2014) considered a similar setting.
The functions are evaluated at 50 equidistant points on the time interval. We independently generated 200 samples from both classes, randomly chose 100 of them as the training set, and treated the remaining 100 samples as the testing set. We calculated the correct classification rates of the six methods based on the mean curves and on the combination of the mean curves and their first-order derivatives, respectively. The results, illustrated in Figure 6, are quite similar to the results from Data 1, which suggests that our proposed methods are robust to the presence of outliers.
4 Data Applications
In this section, we evaluate our methods on two real datasets: the first one is univariate and the second one is multivariate. Comparisons with existing methods are provided as well.
4.1 Phoneme Data
We first apply our methods to the benchmark phoneme dataset. Phoneme is a speech-recognition problem introduced by Hastie et al. (1995). We obtain the data from the R package fds. The dataset comprises five phonemes extracted from the TIMIT database (TIMIT Acoustic-Phonetic Continuous Speech Corpus, NTIS, U.S. Department of Commerce). The phonemes are transcribed as follows: “sh” as in “she”, “dcl” as in “dark”, “iy” as the vowel in “she”, “aa” as the vowel in “dark”, and “ao” as the first vowel in “water”. A log-periodogram was computed from each speech frame; this is one of several widely used methods for translating speech data into a form suitable for speech recognition. For each log-periodogram, we consider the first 150 frequencies. In our study, we randomly select 400 samples for each class, so that 2000 samples are considered in total. Ten samples from each class are illustrated in Figure 7. As shown, the five types of curves vary within the same range with different shapes.
We randomly select 1500 samples as the training set (300 for each class) and treat the remaining 500 samples as the testing set (100 for each class). We apply the six aforementioned methods in two ways: 1) using only the raw data (univariate); and 2) using both raw data and their first-order derivatives (bivariate). For each method, we calculate the correct classification rate and repeat this procedure 50 times. The results are presented in Figure 8. Based on the raw data, our methods perform better than the existing methods. After taking the first derivatives into consideration, the performance of all methods except for RMD improves significantly and VOM achieves the highest correct classification rate.
4.2 Gesture Data
Gesture commands are widely used to interact with or control external devices, e.g., playing gesture-based games and controlling interactive screens. The problem is how to recognize one observation accurately as a particular gesture. Our second dataset includes gesture data comprising the eight simple gestures shown in Figure 9. These gestures have been identified by a Nokia research study as preferred by users for interaction with home appliances.
We downloaded this dataset from Chen et al. (2015). It includes 4,480 gestures: 560 for each type of action made by eight participants ten times per day during one week. Each record contains accelerations on three orthogonal directions (, and ), which means we need to classify threedimensional curves. We find the median curve of acceleration for three directions of each gesture with the functional boxplot (Sun and Genton, 2011) as shown in Figure 10. Generally, most of the acceleration curves oscillate between and . We apply the six methods to the gesture data in four ways: combining all three components together, , and selecting two components out of three, , , and . For each numerical study, we randomly select 3200 samples as the training set (400 for each class) and treat the remaining 1280 samples as the testing set (160 for each class). We collect 50 correct classification rates for each method and present them in Figure 11.
In the four combinations, our proposed methods are always better than the four existing methods except for RMD of . For three cases, VOM achieves the best performance among the six methods. Overall, the correct classification rates improve as we raise the dimensions of the curves. We define the marginal effect of component as the averaged difference between for and . This quantity measures how informative a component is for a classification task. By comparing the plot of with the other three cases, we find that the marginal effect of is the smallest. This finding is consistent with the fact that the acceleration curves in direction are more alike with each other. For example, the black and yellow curves in the middle graph of Figure 10 are quite similar to the purple and red curves, respectively. In contrast, the shapes of the acceleration curves in the other two directions differ, which leads to their higher marginal effects. The gestures included in the dataset were mainly collected from the screens of smart phones, which means that the direction orthogonal to the screen is not as informative as the other two directions.
5 Conclusion
In this paper, we introduced two classifiers for multivariate functional data based on directional outlyingness and the outlyingness matrix. Unlike point-type data that can be classified only by their locations, functional data often differ not by their magnitudes but by their shapes. This feature challenges the classifiers based on conventional functional depth because they cannot effectively describe the shape variation between curves. Directional outlyingness tackles this challenge by combining the direction of outlyingness with conventional functional depth: it measures shape variation not only by the change in the level of outlyingness but also by the rotation of the direction of outlyingness. For multivariate cases, we defined the outlyingness matrix and investigated theoretical results for this matrix. On both univariate and multivariate functional data, we evaluated our proposed classifiers and obtained better results than existing methods using both simulated and real data.
The proposed methods can be readily generalized to image or video data (Genton et al., 2014), where the support of functional data is two-dimensional. We plan to investigate more general settings for both classifiers and data structures. Rather than the constant weight function considered in the current paper, we believe that a weight function proportional to local variation could further improve our methods. It is reasonable to put more weight on the time points where the curves differ substantially and less weight on those where the curves are quite alike. For functional data observed at irregular or sparse time points (López-Pintado and Wei, 2011), we may fit the trajectories with a set of basis functions and then estimate the depth of the discrete curves based on their continuous estimates. The functional data within each group could be correlated in general data structures. An example is spatio-temporal precipitation (Sun and Genton, 2012). Our methods need further modifications to account for the correlations between functional observations as well.
Appendix
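Proof of Theorem 1.
(i) Write $\mathbf{O}(t)$ for $\mathbf{O}(\mathbf{X}(t), F_{\mathbf{X}(t)})$ and $\mathbf{MO}$ for $\mathbf{MO}(\mathbf{X}, F_{\mathbf{X}})$. Analogously to the scalar decomposition (1), we decompose the total outlyingness matrix into two parts, a scale outlyingness matrix and a shape outlyingness matrix:
\[ \mathbf{FOM}(\mathbf{X}, F_{\mathbf{X}}) = \int_{\mathcal{I}} \{\mathbf{O}(t) - \mathbf{MO} + \mathbf{MO}\}\{\mathbf{O}(t) - \mathbf{MO} + \mathbf{MO}\}^{\mathrm{T}} w(t)\,dt = \mathbf{VOM}(\mathbf{X}, F_{\mathbf{X}}) + \mathbf{MO}\,\mathbf{MO}^{\mathrm{T}}, \]
where the cross terms vanish because $\int_{\mathcal{I}} \{\mathbf{O}(t) - \mathbf{MO}\}\, w(t)\,dt = \mathbf{0}$ and $\int_{\mathcal{I}} w(t)\,dt = 1$.
(ii) The result is straightforward using $\|\mathbf{a}\|^2 = \mathrm{tr}(\mathbf{a}\mathbf{a}^{\mathrm{T}})$ for any vector $\mathbf{a}$.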
Proof of Theorem 2.
To prove Theorem 2, we first prove the following results for directional outlyingness of pointwise data:
Proof: Since is a valid depth possessing the four popular properties stated in Definition 2.1 of Zuo and Serfling (2000), the affine invariance of indicates that . Consequently, we have
(2) 
For the directional part, , we have
(3) 
In the final step, we use since is an orthogonal matrix. Then, based on (2) and (3), we get
(4) 
Since is a one-to-one transformation on the interval , it is easy to show
By (4) and , holds. Then, we have
Consequently, we have . Similarly,
This leads to . Finally,
This completes the proof.
Footnotes

CEMSE Division,
King Abdullah University of Science and Technology,
Thuwal 23955-6900, Saudi Arabia. Email: wenlin.dai@kaust.edu.sa, marc.genton@kaust.edu.sa
This research was supported by the King Abdullah University of Science and Technology (KAUST).
References
 Alonso, A. M., Casado, D., and Romo, J. (2012), “Supervised classification for functional data: A weighted distance approach,” Computational Statistics & Data Analysis, 56, 2334–2346.
 Apanasovich, T. V., Genton, M. G., and Sun, Y. (2012), “A valid Matérn class of cross-covariance functions for multivariate random fields with any number of components,” Journal of the American Statistical Association, 107, 180–193.
 Boser, B. E., Guyon, I. M., and Vapnik, V. N. (1992), “A training algorithm for optimal margin classifiers,” in Proceedings of the Fifth Annual Workshop on Computational Learning Theory, ACM, pp. 144–152.
 Chen, Y., Keogh, E., Hu, B., Begum, N., Bagnall, A., Mueen, A., and Batista, G. (2015), “The UCR Time Series Classification Archive,” .
 Claeskens, G., Hubert, M., Slaets, L., and Vakili, K. (2014), “Multivariate functional halfspace depth,” Journal of the American Statistical Association, 109, 411–423.
 Cortes, C. and Vapnik, V. (1995), “Support-vector networks,” Machine Learning, 20, 273–297.
 Cuesta-Albertos, J. A., Febrero-Bande, M., and de la Fuente, M. O. (2015), “The DD^G-classifier in the functional setting,” arXiv preprint arXiv:1501.00372.
 Cuevas, A., Febrero, M., and Fraiman, R. (2007), “Robust estimation and classification for functional data via projection-based depth notions,” Computational Statistics, 22, 481–496.
 Dai, W. and Genton, M. G. (2017), “Directional outlyingness for multivariate functional data,” arXiv preprint arXiv:1612.04615v2.
 Delaigle, A. and Hall, P. (2012), “Achieving near perfect classification for functional data,” Journal of the Royal Statistical Society: Series B, 74, 267–286.
 Delaigle, A., Hall, P., and Bathia, N. (2012), “Componentwise classification and clustering of functional data,” Biometrika, 99, 299–313.
 Epifanio, I. (2008), “Shape descriptors for classification of functional data,” Technometrics, 50, 284–294.
 Ferraty, F. and Vieu, P. (2006), Nonparametric Functional Data Analysis: Theory and Practice, Springer.
 Fraiman, R. and Muniz, G. (2001), “Trimmed means for functional data,” TEST, 10, 419–440.
 Galeano, P., Joseph, E., and Lillo, R. E. (2015), “The Mahalanobis distance for functional data with applications to classification,” Technometrics, 57, 281–291.
 Genton, M. G. and Hall, P. (2016), “A tilting approach to ranking influence,” Journal of the Royal Statistical Society: Series B, 78, 77–97.
 Genton, M. G., Johnson, C., Potter, K., Stenchikov, G., and Sun, Y. (2014), “Surface boxplots,” Stat, 3, 1–11.
 Gneiting, T., Kleiber, W., and Schlather, M. (2010), “Matérn cross-covariance functions for multivariate random fields,” Journal of the American Statistical Association, 105, 1167–1177.
 Hastie, T., Buja, A., and Tibshirani, R. (1995), “Penalized discriminant analysis,” The Annals of Statistics, 73–102.
 Horváth, L. and Kokoszka, P. (2012), Inference for Functional Data with Applications, Springer.
 Hubert, M., Rousseeuw, P. J., and Segaert, P. (2015a), “Multivariate and functional classification using depth and distance,” arXiv preprint arXiv:1504.01128.
 Hubert, M., Rousseeuw, P. J., and Segaert, P. (2015b), “Multivariate functional outlier detection,” Statistical Methods & Applications, 24, 177–202.
 James, G. M. and Hastie, T. J. (2001), “Functional linear discriminant analysis for irregularly sampled curves,” Journal of the Royal Statistical Society: Series B, 63, 533–550.
 Kuhnt, S. and Rehage, A. (2016), “An angle-based multivariate functional pseudo-depth for shape outlier detection,” Journal of Multivariate Analysis, 146, 325–340.
 Li, B. and Yu, Q. (2008), “Classification of functional data: A segmentation approach,” Computational Statistics & Data Analysis, 52, 4790–4800.
 Li, J., Cuesta-Albertos, J. A., and Liu, R. Y. (2012), “DD-classifier: Nonparametric classification procedure based on DD-plot,” Journal of the American Statistical Association, 107, 737–753.
 Liu, R. Y., Parelius, J. M., and Singh, K. (1999), “Multivariate analysis by data depth: descriptive statistics, graphics and inference,” The Annals of Statistics, 27, 783–858.
 López-Pintado, S. and Romo, J. (2006), “Depth-based classification for functional data,” DIMACS Series in Discrete Mathematics and Theoretical Computer Science. Data Depth: Robust Multivariate Analysis, Computational Geometry and Applications, 72, 103–120.
 López-Pintado, S. and Romo, J. (2009), “On the concept of depth for functional data,” Journal of the American Statistical Association, 104, 718–734.
 López-Pintado, S. and Romo, J. (2011), “A half-region depth for functional data,” Computational Statistics & Data Analysis, 55, 1679–1695.
 López-Pintado, S., Sun, Y., Lin, J. K., and Genton, M. G. (2014), “Simplicial band depth for multivariate functional data,” Advances in Data Analysis and Classification, 8, 321–338.
 López-Pintado, S. and Wei, Y. (2011), “Depth for sparse functional data,” in Ferraty, F. (ed.), Recent Advances in Functional Data Analysis and Related Topics, Springer, pp. 209–212.
 Mahalanobis, P. C. (1936), “On the generalized distance in statistics,” Proceedings of the National Institute of Sciences of India, 2, 49–55.
 Matérn, B. (1960), Spatial Variation, Springer.
 Mosler, K. and Mozharovskyi, P. (2015), “Fast DD-classification of functional data,” Statistical Papers, 1–35. doi:10.1007/s00362-015-0738-3.
 Narisetty, N. N. and Nair, V. N. (2016), “Extremal Depth for Functional Data and Applications,” Journal of the American Statistical Association, 111, 1705–1714.
 Ramsay, J. O. and Silverman, B. W. (2006), Functional Data Analysis, Springer.
 Rossi, F. and Villa, N. (2006), “Support vector machine for functional data classification,” Neurocomputing, 69, 730–742.
 Rousseeuw, P. J. (1985), “Multivariate estimation with high breakdown point,” in Mathematical Statistics and Applications, Volume B (W. Grossmann, G. Pflug, I. Vincze and W. Wertz, eds.), Reidel, Dordrecht, pp. 283–297.
 Sguera, C., Galeano, P., and Lillo, R. (2014), “Spatial depth-based classification for functional data,” TEST, 23, 725–750.
 Sun, Y. and Genton, M. G. (2011), “Functional boxplots,” Journal of Computational and Graphical Statistics, 20, 316–334.
 Sun, Y. and Genton, M. G. (2012), “Adjusted functional boxplots for spatio-temporal data visualization and outlier detection,” Environmetrics, 23, 54–64.
 Tukey, J. W. (1975), “Mathematics and the picturing of data,” in Proceedings of the International Congress of Mathematicians, vol. 2, pp. 523–531.
 Zuo, Y. (2003), “Projection-based depth functions and associated medians,” The Annals of Statistics, 31, 1460–1490.
 Zuo, Y. and Serfling, R. (2000), “General notions of statistical depth function,” The Annals of Statistics, 28, 461–482.