Abstract

With the ubiquity of sensors in the IoT era, statistical observations are becoming increasingly available in the form of massive (multivariate) time series. Formulated as unsupervised anomaly detection tasks, an abundance of applications, such as aviation safety management, the health monitoring of complex infrastructures or fraud detection, can now rely on such functional data, acquired and stored with an ever finer granularity. The concept of statistical depth, which reflects the centrality of an arbitrary observation w.r.t. a statistical population, may play a crucial role in this regard, anomalies corresponding to observations with 'small' depth. Supported by sound theoretical and computational developments in recent decades, it has proven to be extremely useful, in particular in functional spaces. However, most approaches documented in the literature consist in evaluating the centrality of each point forming the time series independently and consequently exhibit a certain insensitivity to possible shape changes. In this paper, we propose a novel notion of functional depth based on the area of the convex hull of sampled curves, capturing gradual departures from centrality, even beyond the envelope of the data, in a natural fashion. We discuss the practical relevance of commonly imposed axioms on functional depths and investigate which of them are satisfied by the notion of depth we promote here. Estimation and computational issues are also addressed, and various numerical experiments provide empirical evidence of the relevance of the approach proposed.

\aistatstitle

The Area of the Convex Hull of Sampled Curves:
a Robust Functional Statistical Depth Measure

\aistatsauthor

Guillaume Staerman & Pavlo Mozharovskyi & Stephan Clémençon

\aistatsaddress

LTCI, Télécom Paris, Institut Polytechnique de Paris

1 Introduction

Technological advances in data acquisition, management and warehousing (e.g. IoT, distributed platforms) enable massive data processing and are leading to a wide variety of new applications in the digitalized (service) industry. The need to design more and more automated systems fed by ever more informative streams of data manifests itself in many areas of human activity (e.g. transportation, energy, health, commerce, finance/insurance). Monitoring the behavior/health of complex systems offers a broad spectrum of machine-learning applications, such as classification or anomaly detection. With increasing industrial digitalization, data are more and more often collected in quasi-real time and naturally take the form of time series or functions. The case of functional data is thus of crucial interest in practice; refer to e.g. Ramsay and Silverman (2002, 2005) for an excellent account of Functional Data Analysis (FDA in abbreviated form) and of its applications. A functional dataset is typically a set of curves partially observed at different time points, which can be seen as (partially observed) realizations of a stochastic process. Hence, the first step of FDA generally consists in reconstructing the functional objects from these observations, by means of interpolation, smoothing or projection techniques. Whereas the practice of statistical learning has rapidly generalized these last few years, with the design of many successful algorithms such as (deep) neural networks, SVMs or boosting procedures, the automatic analysis of functional data to achieve complex (e.g. unsupervised) tasks such as anomaly detection is still a challenge, due to the huge variety of possible patterns that may carry the relevant information for discrimination purposes.
It is far from straightforward to generalize directly methods originally introduced in the finite-dimensional case to the functional setup, unless preliminary filtering or dimensionality reduction techniques are used, see e.g. Rossi and Villa (2006); Staerman et al. (2019). Such techniques essentially consist in projecting the observations, supposed to take their values in a certain Hilbert space, onto a subspace of finite dimensionality, generally defined by truncating their expansion in a Hilbertian basis of reference or by means of a flexible dictionary of functions/'atoms'. Next, one can apply any state-of-the-art algorithm tailored to the finite-dimensional case, based on the parsimonious representations thus obtained, cf. Ferraty and Vieu (2006); Ramsay and Silverman (2002). The basis functions can either be chosen from a pre-selected dictionary (e.g. Fourier, wavelets, cosine packets, etc.) presumably capable of capturing the information carried by the curves, or built empirically using Functional Principal Component Analysis, retaining the most informative part of the (Karhunen-Loève) decomposition only, see Ramsay and Silverman (2005). Of course, the major drawback of such FDA approaches lies in the fact that they are dramatically conditioned by the finite-dimensional representation method chosen: the subsequent analysis of the data may be fully jeopardized if the latter erases patterns relevant for the task considered.

Originally introduced by J. Tukey to extend the notion of median/quantile to multivariate random variables (see Tukey (1975b)), a data depth is a function defined on the feature space and valued in $[0,1]$, used to determine the 'representativeness' of a point with respect to a statistical population, and it should fulfill a variety of desirable properties (ideally just like the quantile function in the univariate situation). Given a training set, a data depth function provides a score that measures the centrality of any element w.r.t. the dataset and thereby defines, notably, an ordering of all the elements of the dataset. In particular, it finds a very natural application in (unsupervised) anomaly detection, see e.g. Cuevas et al. (2007); Long and Huang (2016); Mosler and Mozharovskyi (2017) in supervised situations and Hubert et al. (2015); Nagy et al. (2017) in the unsupervised context: an observation is considered all the more 'abnormal' as its depth is small. This concept has been extended to the functional data framework by integrating univariate depth functions over the whole time interval, see e.g. Claeskens et al. (2014); Fraiman and Muniz (2001); Hubert et al. (2015). Alternatively, various depth functions fully tailored to the functional setup have been introduced in the statistical literature, refer to e.g. Chakraborty and Chaudhuri (2014); Dutta et al. (2011); Lopez-Pintado and Romo (2009, 2011). However, most of them fail to fulfill certain desirable properties or face significant computational difficulties. It is the major purpose of this paper to introduce a novel robust functional statistical depth measure dedicated to the analysis of functional data. Based on the area of the convex hull of collections of sampled curves, it is both easy to compute and easy to interpret. Given a curve $x$ lying in a certain functional space (e.g.
the space of real-valued continuous functions on $[0,1]$), the general idea is to quantify its contribution, on average, to the area of the convex hull (ACH in short) of random curves with the same probability law. Precisely, this function, referred to as the ACH depth throughout the article, is defined via the ratio of the ACH of a sample of random curves to that of the sample augmented by the curve $x$. We prove here that it fulfills various properties desirable for depth functions. In particular, given its form, it exhibits sensitivity (i.e. the depth score of new/test curves that are further and further away from the training set of curves decreases smoothly), a property, quite desirable intuitively, that is actually not satisfied by most statistical (functional) depths documented in the literature. For instance, the statistical Tukey depth assigns a score of $0$ to any element lying outside the convex hull of the training data, see Tukey (1975a); Dutta et al. (2011). In addition, the statistical depth we promote here is robust to outliers: adding outliers to the training set of curves has little or no effect on the returned scores and the ordering of a test set. For this reason, this functional depth is very well suited for unsupervised anomaly detection. In the functional setup, this task is extremely challenging. Indeed, the richness of functional spaces leads to a huge diversity in the nature of possibly observed differences between curves. As discussed in Hubert et al. (2015), three main types of anomaly can be distinguished: shift, magnitude or shape anomalies. Anomalies can be either isolated/transient or persistent depending on their duration, some of them (shape anomalies) being more difficult to detect.
Since the functional statistical depth measure we propose relates to a whole batch of curves and does not reflect the individual properties of a single curve only, it enables the detection of a wide variety of anomaly shapes, as illustrated by the numerical experiments displayed in Section 4.

The paper is organized as follows. In Section 2, basic concepts pertaining to statistical depth theory, both for the multivariate framework and for the functional case, are briefly recalled for clarity's sake. In Section 3, the functional statistical depth based on the area of the convex hull of a batch of curves is introduced at length and its theoretical properties are investigated, together with computational aspects. Section 4 presents numerical results in order to provide strong empirical evidence of the relevance of the novel depth function proposed, especially for the purpose of unsupervised functional anomaly detection. Finally, concluding remarks are collected in Section 5.

2 Background and Preliminaries

For clarity, we start by recalling the concept of statistical depth in the multivariate and functional frameworks. We next list the desirable properties it should fulfill and also briefly review recent advances in this field. Here and throughout, the Dirac mass at any point $x$ is denoted by $\delta_x$, and the convex hull of any set $A$ by $\mathrm{conv}(A)$.

2.1 Data Depth in a Multivariate Space

By data depth, one usually means a nonparametric statistical function that determines the centrality of any element with respect to a statistical population. Given a dataset, a depth function provides a center-outward ordering of the data points. Since it permits to define a ranking of the (multivariate) observations and local averages derived from it, a data depth can be used for various tasks, including classification (Lange et al. (2014)), clustering (Jörnsten (2004)), anomaly detection (Serfling (2006)) or rank tests (Oja (1983)). In order to give a precise definition, some notation is needed. Let $X$ be a random variable, defined on a certain probability space $(\Omega, \mathcal{A}, \mathbb{P})$, taking its values in $\mathbb{R}^d$ with probability distribution $P$. Denote by $\mathcal{P}(\mathbb{R}^d)$ the set of all probability distributions on $\mathbb{R}^d$. A data depth is a function

$$D : \mathbb{R}^d \times \mathcal{P}(\mathbb{R}^d) \rightarrow [0, 1],$$

measurable with respect to its first argument $x$. It is interpreted as follows: the larger the quantity $D(x, P)$, the deeper (i.e. the more 'central') the observation $x$ with respect to the distribution $P$ is considered. As mentioned above, it naturally defines a preorder on $\mathbb{R}^d$. In particular, medians of the multivariate distribution correspond to maximizers of the depth function, and quantile regions are defined as upper-level sets of the depth. A crucial example is the half-space depth (also sometimes called location depth) introduced in the seminal contribution Tukey (1975a). It is defined as

$$D_T(x, P) = \inf_{u \in \mathbb{R}^d \setminus \{0\}} \mathbb{P}\left\{ \langle u, X - x \rangle \geq 0 \right\}$$

for any $x \in \mathbb{R}^d$ and any probability distribution $P$ on $\mathbb{R}^d$. As the distribution $P$ is generally unknown, a statistical version can be built from independent copies $X_1, \ldots, X_n$ of the generic random vector $X$ by means of the plug-in principle, i.e. by replacing $P$ by an empirical counterpart $\widehat{P}_n$, typically the raw empirical distribution $\widehat{P}_n = \frac{1}{n}\sum_{i=1}^{n} \delta_{X_i}$ (or a smooth/penalized version of the latter), yielding the empirical depth

$$D\left(x, \widehat{P}_n\right), \quad x \in \mathbb{R}^d. \qquad (1)$$

Empirical medians and quantile regions are then naturally defined as those of the empirical depth (1). Of course, the relevance of a depth function regarding the measurement of centrality in a multivariate space is guaranteed only when certain desirable properties are satisfied. We refer to Zuo and Serfling (2000) for an account of the statistical theory of multivariate data depth and many examples.
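To make the plug-in step concrete, the empirical half-space depth can be approximated by minimizing, over a finite set of random directions, the fraction of sample points lying in the half-space determined by each direction. The following is a minimal sketch (Python/NumPy; the function name and the number of directions are our own illustrative choices, not from the paper):

```python
import numpy as np

def tukey_depth_mc(x, X, n_dir=1000, rng=None):
    """Monte-Carlo approximation of the empirical half-space (Tukey) depth.

    For each random unit direction u, compute the fraction of sample points
    whose projection on u is >= that of x; the depth is the minimum of these
    fractions over the drawn directions.  `n_dir` is a tuning parameter
    introduced here for illustration.
    """
    rng = np.random.default_rng(rng)
    U = rng.standard_normal((n_dir, X.shape[1]))
    U /= np.linalg.norm(U, axis=1, keepdims=True)   # unit directions
    proj_X = X @ U.T                                # (n, n_dir) projections
    proj_x = x @ U.T                                # (n_dir,) projections of x
    return float((proj_X >= proj_x).mean(axis=0).min())
```

As expected from the definition, a central point of a symmetric cloud receives a depth close to 1/2, while a point far outside the convex hull of the sample receives a depth close to 0.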

2.2 Statistical Functional Depth

In this paper, we consider the situation where the r.v. $X$ takes its values in a space of infinite dimension. Precisely, focus is on the case where the feature space is the vector space $\mathcal{C}([0,1])$ of real-valued continuous functions on $[0,1]$.

Recall that, when equipped with the sup norm $\|x\|_{\infty} = \sup_{t \in [0,1]} |x(t)|$, $\mathcal{C}([0,1])$ is a separable Banach space. We denote by $\mathcal{P}(\mathcal{C}([0,1]))$ the set of all probability laws on $\mathcal{C}([0,1])$ and by $P_t$ the one-dimensional marginal of the law $P$ of the stochastic process $X$ at time point $t \in [0,1]$.

Depths in a functional framework were first considered in Fraiman and Muniz (2001), where it is proposed to define functional depths as simple integrals over time of a univariate depth function $D^{(1)}$, namely $D(x, P) = \int_0^1 D^{(1)}(x(t), P_t)\,dt$. Due to the averaging effect, local changes in the curve $x$ only induce slight modifications of the depth value, which makes anomaly detection approaches based on such 'poorly sensitive' functional depths ill-suited in general. Recently, alternative functional depths have been introduced, see Lopez-Pintado and Romo (2009, 2011) for depths based on the geometry of the set of curves, Chakraborty and Chaudhuri (2014) for a notion of spatial depth or Dutta et al. (2011) for a functional version of the Tukey depth. As discussed in Nieto-Reyes and Battey (2016) and Gijbels and Nagy (2018), the axiomatic framework introduced in Zuo and Serfling (2000) for multivariate depth is no longer adapted to the richness of the topological structure of functional spaces. Indeed, the vast majority of the functional depths documented in the literature do not fulfill versions of the most natural and elementary properties required for a depth function in the multivariate setup, cf. Gijbels and Nagy (2018). However, there is still no consensus about the set of desirable properties that a functional depth should satisfy, beyond the form of sensitivity mentioned above. Those that appear most relevant in our opinion are listed below. By $P$ is meant the law of a functional r.v. $X$ taking its values in $\mathcal{C}([0,1])$.
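An integrated depth of this type can be sketched in a few lines for curves observed on a common grid (Python/NumPy; the simple univariate depth $\min(\widehat{F}, 1 - \widehat{F})$ used below is one possible choice, the original proposal of Fraiman and Muniz being $1 - |1/2 - \widehat{F}|$):

```python
import numpy as np

def integrated_depth(x, X):
    """Integrated functional depth in the spirit of Fraiman and Muniz (2001).

    `x` is a curve sampled on a common grid of T points and `X` an (n, T)
    array of sample curves on the same grid.  At each time point, a
    univariate depth min(F_n, 1 - F_n) is computed from the empirical cdf
    F_n of the marginal sample, then averaged over time, which amounts to
    the integral over [0, 1] for a uniform grid.
    """
    F = (X <= x).mean(axis=0)            # empirical cdf of x(t) at each t
    pointwise = np.minimum(F, 1.0 - F)   # univariate depth at each t
    return float(pointwise.mean())
```

Note that a curve lying entirely above (or below) the sample gets depth 0 at every time point, while a centrally located curve receives a value close to the maximal one.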

  • (Non-degeneracy) For all non-atomic distributions $P$ in $\mathcal{P}(\mathcal{C}([0,1]))$, we have $\inf_{x \in \mathcal{C}([0,1])} D(x, P) < \sup_{x \in \mathcal{C}([0,1])} D(x, P)$.

  • (Affine invariance) The depth is said to be (scalar-)affine invariant if for any $x$ in $\mathcal{C}([0,1])$ and all $a \in \mathbb{R} \setminus \{0\}$, $b$ in $\mathcal{C}([0,1])$, we have $D(a x + b, P_{a X + b}) = D(x, P_X)$.

  • (Maximality at the center) For any point-symmetric and non-atomic distribution $P$ with $\theta \in \mathcal{C}([0,1])$ as center of symmetry, we have $D(\theta, P) = \sup_{x \in \mathcal{C}([0,1])} D(x, P)$.

  • (Vanishing at infinity) For any non-atomic distribution $P$ in $\mathcal{P}(\mathcal{C}([0,1]))$, $D(x, P) \to 0$ as $\|x\|_{\infty} \to \infty$.

  • (Decreasing w.r.t. the deepest point) For any $z$ in $\mathcal{C}([0,1])$ and any non-atomic distribution $P$ in $\mathcal{P}(\mathcal{C}([0,1]))$ s.t. $D(z, P) = \sup_{x \in \mathcal{C}([0,1])} D(x, P)$, we have

    $D(x, P) \leq D(z + \alpha (x - z), P)$ for all $x \in \mathcal{C}([0,1])$ and all $\alpha \in [0, 1]$.

  • (Continuity in $x$) For any non-atomic distribution $P$, the function $x \mapsto D(x, P)$ is continuous w.r.t. the sup norm.

  • ((Uniform) continuity in $P$) For all $x$ in $\mathcal{C}([0,1])$, the mapping $P \mapsto D(x, P)$ is (uniformly) continuous w.r.t. the Lévy-Prokhorov metric.

Before introducing the ACH depth and investigating its properties, a few remarks are in order. Though it obviously appears mandatory to make the other properties meaningful, non-degeneracy is actually not fulfilled by all the functional depths proposed, see e.g. Lopez-Pintado and Romo (2009, 2011); Dutta et al. (2011). The 'maximality at center' and 'decreasing w.r.t. the deepest point' properties permit to preserve the original center-outward ordering goal of data depth in the functional framework. Several definitions of the concept of 'symmetry' in a functional space are detailed in the Supplementary Material due to space constraints. The 'continuity in $x$' property extends a property fulfilled by cumulative distribution functions of multivariate continuous distributions. From a statistical perspective, the 'continuity in $P$' property is essential, insofar as $P$ must be replaced in practice by an estimator, cf. Eq. (1), built from finite-dimensional observations, i.e. a finite number of sampled curves.

3 The Area of the Convex Hull of (Sampled) Curves

It is the purpose of this section to present at length the statistical depth function we propose for path-valued random variables. As shall be seen below, its definition is based on very simple geometrical ideas and various desirable properties can easily be checked from it. Statistical and computational issues are also discussed at length. By $\mathcal{K}(\mathbb{R}^2)$ is meant the collection of all compact subsets of $\mathbb{R}^2$ and $\lambda$ denotes the Lebesgue measure on the plane $\mathbb{R}^2$. Consider an i.i.d. sample $X_1, \ldots, X_n$ drawn from $P$ in $\mathcal{P}(\mathcal{C}([0,1]))$. The graph of any function $x$ in $\mathcal{C}([0,1])$ is denoted by

$$\mathrm{graph}(x) = \{(t, x(t)) : t \in [0,1]\},$$

while we denote by $\mathrm{graph}(x_1, \ldots, x_J)$ the set

$$\mathrm{graph}(x_1, \ldots, x_J) = \bigcup_{j=1}^{J} \mathrm{graph}(x_j)$$

defined by a collection of $J$ functions $x_1, \ldots, x_J$ in $\mathcal{C}([0,1])$. We now give a precise definition of the statistical depth measure we propose for random variables valued in $\mathcal{C}([0,1])$.
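For sampled curves on a common grid, the quantity $\lambda(\mathrm{conv}(\mathrm{graph}(x_1, \ldots, x_J)))$ can be computed with standard computational-geometry tools. A minimal sketch (Python with NumPy/SciPy; `hull_area` is a hypothetical helper name, and at least two non-identical curves are assumed so that the hull is non-degenerate):

```python
import numpy as np
from scipy.spatial import ConvexHull

def hull_area(curves, grid):
    """Area of the convex hull of the graphs of a batch of sampled curves.

    `curves` is a (J, T) array of curves evaluated on the common time grid
    `grid` of length T.  The J graphs are pooled into one 2-d point cloud
    {(t, x_j(t))}; in 2-d, scipy's ConvexHull exposes the enclosed area
    through its `volume` attribute.
    """
    arr = np.atleast_2d(np.asarray(curves, dtype=float))
    pts = np.column_stack([np.tile(grid, arr.shape[0]), arr.ravel()])
    return float(ConvexHull(pts).volume)
```

For instance, the graphs of the two constant curves $x \equiv 0$ and $x \equiv 1$ on $[0,1]$ span the unit square, whose area is 1.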

Definition 3.1

Let $J \geq 1$ be a fixed integer. The ACH depth of degree $J$ is the function defined, for all $x \in \mathcal{C}([0,1])$, by

$$D_J(x, P) = \mathbb{E}\left[ \frac{\lambda\left(\mathrm{conv}\left(\mathrm{graph}(X_1, \ldots, X_J)\right)\right)}{\lambda\left(\mathrm{conv}\left(\mathrm{graph}(X_1, \ldots, X_J, x)\right)\right)} \right],$$

where $X_1, \ldots, X_J$ are i.i.d. r.v.'s drawn from $P$. Its average version is defined, for all $x \in \mathcal{C}([0,1])$, by

$$\bar{D}_J(x, P) = \frac{1}{J} \sum_{j=1}^{J} D_j(x, P).$$

The choice of $J$ leads to various views of the distribution $P$, the average variant permitting to combine all of them (up to degree $J$). When $n \geq J$, an unbiased statistical estimate of $D_J(x, P)$ can be obtained by computing the symmetric $U$-statistic of degree $J$, see Lee (1990): for all $x \in \mathcal{C}([0,1])$,

$$\widehat{D}_{J,n}(x) = \binom{n}{J}^{-1} \sum_{1 \leq i_1 < \cdots < i_J \leq n} \frac{\lambda\left(\mathrm{conv}\left(\mathrm{graph}(X_{i_1}, \ldots, X_{i_J})\right)\right)}{\lambda\left(\mathrm{conv}\left(\mathrm{graph}(X_{i_1}, \ldots, X_{i_J}, x)\right)\right)}. \qquad (2)$$

Considering the empirical average version given by

$$\widehat{\bar{D}}_{J,n}(x) = \frac{1}{J} \sum_{j=1}^{J} \widehat{D}_{j,n}(x)$$

brings some 'stability'. However, the computational cost rapidly increasing with $J$, small values of $J$ are preferred in practice. Moreover, as we illustrate in Section 4.1, $J = 2$ already yields satisfactory results.
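A direct (complete) implementation of this $U$-statistic can be sketched as follows, assuming curves observed on a common grid (Python; scipy's `ConvexHull` provides the 2-d hull area via its `volume` attribute; the function and variable names are our own):

```python
import itertools
import numpy as np
from scipy.spatial import ConvexHull

def ach_depth(x, X, grid, J=2):
    """Empirical ACH depth of a curve x w.r.t. a list X of sample curves,
    all sampled on the common time grid `grid`: average, over every
    J-subset of the sample, of the ratio of the convex-hull area of the
    subset's graphs to that of the subset augmented with x (a sketch of
    the complete U-statistic estimator)."""
    def area(curves):
        pts = np.column_stack([np.tile(grid, len(curves)),
                               np.concatenate(curves)])
        return ConvexHull(pts).volume
    ratios = [area([X[i] for i in idx]) / area([X[i] for i in idx] + [x])
              for idx in itertools.combinations(range(len(X)), J)]
    return float(np.mean(ratios))
```

A curve near the center of the batch leaves the convex hulls almost unchanged (ratios close to 1, hence a large depth), while an outlying curve inflates every augmented hull (small ratios, hence a small depth).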

Approximation from sampled curves. In general, one does not observe the batch of continuous curves on the whole time interval but at discrete time points only, the number of time points and the time points themselves possibly varying from curve to curve. In such a case, the estimators above are computed from continuous curves reconstructed from the available sampled curves by means of interpolation procedures or approximation schemes based on appropriate bases. In practice, linear interpolation is used for this purpose, with theoretical guarantees (refer to Theorem 3.2 below), which facilitates significantly the computation of the empirical ACH depth, see subsection 4.3.
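The reconstruction step amounts to a simple linear interpolation onto a common evaluation grid (Python/NumPy; `to_common_grid` is a hypothetical helper, not taken from the authors' implementation):

```python
import numpy as np

def to_common_grid(times, values, grid):
    """Reconstruct a continuous curve from irregular samples by linear
    interpolation and evaluate it on a common grid.  np.interp requires
    increasing abscissae, hence the sort; outside the observed range the
    curve is held constant at the boundary values."""
    order = np.argsort(times)
    return np.interp(grid,
                     np.asarray(times, dtype=float)[order],
                     np.asarray(values, dtype=float)[order])
```

Once all curves are evaluated on the same grid, the hull-area computations above apply verbatim.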

3.1 Main Properties of the ACH Depth

In this subsection, we study the theoretical properties of the population versions of the functional depths introduced above and next establish the consistency of their statistical versions. The following result reveals that, among the properties listed in the previous subsection, five are fulfilled by the (average) ACH depth function.

Proposition 3.1

For all $J \geq 1$, the depth function $D_J$ (respectively, $\bar{D}_J$) fulfills the following properties: 'non-degeneracy', 'affine invariance', 'vanishing at infinity', 'continuity in $x$' and 'uniform continuity in $P$'. In addition, the following properties are not satisfied: 'maximality at center' and 'decreasing w.r.t. the deepest point'.

Refer to the Appendix section for the technical proof. In a functional space, not satisfying maximality at center is not an issue. For instance, though the constant trajectory equal to zero is a center of symmetry for the Brownian motion, it is clearly not representative of this distribution. In contrast, scalar-affine invariance is relevant, insofar as it allows z-normalization of the functional data, and uniform continuity in $P$ is essential to derive the consistency of the statistical version of $D_J$ (respectively, of $\bar{D}_J$) in sup norm, as stated below.

Theorem 3.1

Let $J \geq 1$ and let $X_1, \ldots, X_n$ be independent copies of a generic r.v. $X$ with distribution $P$. As $n \to \infty$, we have, with probability one,

$$\sup_{x \in \mathcal{C}([0,1])} \left| \widehat{D}_{J,n}(x) - D_J(x, P) \right| \longrightarrow 0$$

and

$$\sup_{x \in \mathcal{C}([0,1])} \left| \widehat{\bar{D}}_{J,n}(x) - \bar{D}_J(x, P) \right| \longrightarrow 0.$$

3.2 On Statistical/Computational Issues

As mentioned above, only sampled curves are available in practice. Each random curve $X_i$ being observed at fixed time points $0 \leq t_1^{(i)} < \cdots < t_{p_i}^{(i)} \leq 1$ (potentially different for each $i$), we denote by $\tilde{X}_1, \ldots, \tilde{X}_n$ the continuous curves reconstructed from the sampled curves by linear interpolation. From a practical perspective, one considers the estimator of $D_J(x, P)$ given by the approximation of (2) obtained when replacing the $X_i$'s by the $\tilde{X}_i$'s. The (computationally feasible) estimator of $\bar{D}_J(x, P)$ is constructed in a similar manner. The result stated below shows that this approximation stage preserves almost-sure consistency in sup norm.

Theorem 3.2

Let $J \geq 1$. Suppose that, as $n \to \infty$, the maximal spacing of the sampling grids vanishes:

$$\max_{i \leq n} \; \max_{k < p_i} \left( t_{k+1}^{(i)} - t_k^{(i)} \right) \longrightarrow 0.$$

Then, as $n \to \infty$, we have, with probability one,

$$\sup_{x \in \mathcal{C}([0,1])} \left| \widetilde{D}_{J,n}(x) - D_J(x, P) \right| \longrightarrow 0
\quad \text{and} \quad
\sup_{x \in \mathcal{C}([0,1])} \left| \widetilde{\bar{D}}_{J,n}(x) - \bar{D}_J(x, P) \right| \longrightarrow 0,$$

where $\widetilde{D}_{J,n}$ and $\widetilde{\bar{D}}_{J,n}$ denote the versions of $\widehat{D}_{J,n}$ and $\widehat{\bar{D}}_{J,n}$ computed from the interpolated curves.

Refer to the Appendix section for the technical proof. Given the batch of continuous and piecewise linear curves $\tilde{X}_1, \ldots, \tilde{X}_n$, although the computational cost of the area of their convex hull is of order $O(N \log N)$, with $N$ the total number of sampled points, that of the $U$-statistic (2) (and a fortiori that of its empirical average version) becomes very expensive as soon as $n$ is large. As pointed out in Lopez-Pintado and Romo (2009), even if the choice $J = 2$ for statistics of this type may lead to a computationally tractable procedure while offering a reasonable representation of the distribution, varying $J$ permits to capture much more information in general. For this reason, we propose to compute an incomplete version of the $U$-statistic using a basic Monte-Carlo approximation scheme with $B$ replications: rather than averaging over all $\binom{n}{J}$ subsets of $\{X_1, \ldots, X_n\}$ with cardinality $J$ to compute (2), one averages over $B$ subsets drawn with replacement, forming an incomplete $U$-statistic, see Enqvist (1978). The same approximation procedure can be applied (in a randomized manner) to each of the $U$-statistics involved in the average version, as described in the Supplementary Material.
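The incomplete-$U$-statistic approximation can be sketched as follows (Python; the budget `B` and the function names are notation we introduce here; within each replication, one $J$-subset of distinct curves is drawn, subsets being allowed to repeat across replications):

```python
import numpy as np
from scipy.spatial import ConvexHull

def ach_depth_incomplete(x, X, grid, J=2, B=50, rng=None):
    """Incomplete-U-statistic approximation of the empirical ACH depth:
    instead of all C(n, J) subsets, average the hull-area ratio over B
    J-subsets drawn uniformly at random (subsets with replacement, indices
    within a subset without replacement), cf. Enqvist (1978)."""
    rng = np.random.default_rng(rng)
    n = len(X)

    def area(curves):
        pts = np.column_stack([np.tile(grid, len(curves)),
                               np.concatenate(curves)])
        return ConvexHull(pts).volume

    total = 0.0
    for _ in range(B):
        idx = rng.choice(n, size=J, replace=False)   # one random J-subset
        batch = [X[i] for i in idx]
        total += area(batch) / area(batch + [x])
    return total / B
```

The cost is thus $O(B)$ hull computations instead of $O(\binom{n}{J})$, at the price of an extra Monte-Carlo variance that vanishes as $B$ grows.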

4 Numerical Experiments

From a practical perspective, this section explores certain properties of the proposed functional depth using simulated data. It also describes its performance compared with state-of-the-art methods on (real) benchmark datasets. As a first go, we focus on the impact of the choice of the number of Monte-Carlo replications, which rules the trade-off between approximation accuracy and computational burden, and of the degree parameter $J$. Precisely, this impact is investigated through the stability of the ranking induced by the corresponding depths. We next investigate the robustness of the ACH depth (ACHD in its abbreviated form), together with its ability to detect abnormal observations of various types. Finally, the ACH depth is benchmarked against alternative depths standing as natural competitors in the functional setup using real datasets. A simulation-based study of the variance of the ACH depth is postponed to the Supplementary Material.

Figure 1: Boxplots of the approximated depths of the deepest curve (top) and of the most atypical curve (bottom) over different numbers of Monte-Carlo replications. The black crosses correspond to the exact depth measure of each curve.

For the sake of simplicity, the same two simulated datasets, represented in Figure 3, are used throughout the section. Dataset (a) corresponds to sample path segments of the geometric Brownian motion, a stochastic process widely used in statistical modeling. Dataset (b) consists of smooth curves whose random amplitude and phase parameters are independently and uniformly distributed, as proposed by Claeskens et al. (2014). Four curves have been incorporated into each dataset: a deep curve and three atypical curves (anomalies), with an expected depth-induced ranking.
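For reproducibility, geometric Brownian paths like those of dataset (a) can be simulated with the exact log-Euler scheme (Python/NumPy; the drift, volatility and initial value below are placeholder defaults, the paper's actual parameters not being specified in this copy):

```python
import numpy as np

def gbm_paths(n, T=100, mu=0.0, sigma=1.0, x0=1.0, rng=None):
    """Simulate n sample paths of a geometric Brownian motion on [0, 1],
    observed at T equispaced time points, via exact simulation of the
    log-process: log X(t+dt) - log X(t) = (mu - sigma^2/2) dt
    + sigma sqrt(dt) N(0, 1)."""
    rng = np.random.default_rng(rng)
    dt = 1.0 / (T - 1)
    increments = (mu - 0.5 * sigma**2) * dt \
        + sigma * np.sqrt(dt) * rng.standard_normal((n, T - 1))
    log_paths = np.concatenate(
        [np.zeros((n, 1)), np.cumsum(increments, axis=1)], axis=1)
    return x0 * np.exp(log_paths)
```

Each returned row is one path, starting at `x0` and remaining strictly positive.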

4.1 Choosing the Tuning Parameters

Figure 2: Rank-rank plots for different values of $J$ (2, 3 and 4). The first row corresponds to dataset (a), the second row to dataset (b).

The number of Monte-Carlo replications reflects the trade-off between statistical performance and computational time. In order to investigate its impact on the stability of the method, we compute the depths of the deepest and of the most atypical curves for dataset (b), for several replication budgets. Figure 1 presents boxplots of the approximated ACHD (together with the exact values of ACHD) over repetitions. Note that, as expected, the depth values grow with the degree $J$. The variance of the depth estimates decreases as the number of replications increases, almost disappearing for the largest budgets considered, while the decreasing pattern remains the same for different values of $J$. For these reasons, we keep this setting in what follows.

The choice of $J$ is less obvious; clearly, when describing an observation in a functional space, a substantial part of the information is lost anyway. Nevertheless, one observes that the computational burden increases exponentially with $J$, so that smaller values are preferable. Figure 2 shows the rank-rank plots of datasets (a) and (b) for small values of $J$ and indicates that the depth-induced ranking does not change much with $J$. Thus, to save computational time, we use the value $J = 2$ in all subsequent experiments.

Figure 3: Datasets (a) (left) and (b) (right), each containing 100 paths with four selected observations. The colors of the four selected observations are the same for both datasets (a) and (b).

4.2 Robustness

                 Contamination (%)
                 0     5     10    15    25    30
ACHD   Location  0     0.6   1.3   2.2   4.3   5.2
       Isolated  0     0.3   1.3   0.9   1.6   2.4
       Shape     0     0.9   2     2.6   4.2   4.7
FSDO   Location  0     3.6   7.3   10    16    20
       Isolated  0     0.8   3.6   3.2   7.2   9.4
       Shape     0     1.6   2.9   4.2   6.6   7.4
FT     Location  0     5.1   9.5   13    20    23
       Isolated  0     0.7   2.7   2.7   5.9   7.2
       Shape     0     1.7   2.9   4.3   6.6   7.7
FIF    Location  0     7     8.2   7.3   7.3   8.9
       Isolated  0     9.3   12    11    10    12
       Shape     0     7.4   7.9   10    14    14
Table 1: Kendall's tau distances between the ranking returned on normal data and on data contaminated with location, isolated and shape anomalies, over different proportions of contamination, for ACHD and three state-of-the-art methods. For each anomaly type and contamination level, the smallest value indicates the best rank stability over the contaminated datasets.

By robustness of a statistical estimator, one understands its ability not to be "disturbed" by atypical observations. We explore the robustness of ACHD in the following simulation study: between the original dataset and the same dataset contaminated with anomalies, we measure (averaged over random repetitions) the Kendall's tau distance of the two depth-induced rankings $r$ and $\tilde{r}$, respectively, restricted to the original data:

$$d_{\tau}(r, \tilde{r}) = \binom{n}{2}^{-1} \# \left\{ (i, j) : i < j, \; \big(r(i) - r(j)\big)\big(\tilde{r}(i) - \tilde{r}(j)\big) < 0 \right\}.$$
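The Kendall's tau distance between two depth-induced rankings can be computed directly from the two score vectors (Python/SciPy; the normalization maps Kendall's tau from $[-1, 1]$ to a $[0, 1]$ distance, assuming no ties):

```python
from scipy.stats import kendalltau

def kendall_distance(scores_a, scores_b):
    """Normalized Kendall tau distance between the orderings induced by two
    depth-score vectors on the same observations: the fraction of discordant
    pairs, equal to 0 for identical rankings and 1 for reversed ones."""
    tau, _ = kendalltau(scores_a, scores_b)
    return (1.0 - tau) / 2.0
```

A small distance thus means that contamination barely perturbed the center-outward ordering of the original curves.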

In their overview work, Hubert et al. (2015) introduce a taxonomy of atypical observations, focusing on location, isolated and shape anomalies. Here, we add location anomalies to dataset (a) and isolated and shape anomalies to dataset (b); other types of anomalies for both datasets can be found in the Supplementary Material. The abnormal functions are constructed as follows. Location anomalies for dataset (a) are shifted versions of normal curves, the shift being drawn uniformly at random. Isolated anomalies for dataset (b) are constructed by adding a peak at a time point drawn uniformly at random, with an amplitude itself drawn uniformly at random, the curve being left unchanged elsewhere. Shape anomalies for dataset (b) are obtained by perturbing the oscillation parameter, drawn uniformly at random. By varying the percentage of abnormal observations, we compare ACHD to several of the best-known depth approaches in the literature, the functional Stahel-Donoho depth (FSDO) (Hubert et al., 2015) and the functional Tukey depth (FT) (Claeskens et al., 2014), and also to the functional isolation forest (FIF) algorithm (Staerman et al., 2019), which achieves satisfactory anomaly detection; see Table 1. One can observe that ACHD consistently preserves the depth-induced ranking despite the inserted abnormal observations, even if their fraction is large. FSDO behaves competitively, giving slightly better results than ACHD for shape anomalies.

4.3 Applications to Anomaly Detection

Figure 4: Number of anomalies detected over a grid of abnormality parameters for three types of anomalies (location, isolated and shape), for ACHD and three further state-of-the-art methods.

Further, we explore the ability of ACHD to detect atypical observations. To this end, we conduct an experiment in settings similar to those of Section 4.2, while gradually changing the degree of abnormality of a small number of contaminated curves in dataset (a). Thus, we alter the shift parameter for location anomalies, the peak amplitude for isolated anomalies, and the oscillation parameter for shape anomalies, so as to amplify the "spikes" of the oscillations. (For an illustration of the abnormal curves, the reader is referred to the Supplementary Material.) Figure 4 illustrates the number of anomalies detected by ACHD, FSDO, FT and FIF for different abnormality parameters. While it is difficult to single out a general winner, ACHD behaves favorably in all the considered cases and clearly outperforms the two other depths when the data are contaminated with isolated anomalies.

We conclude this section with a real-world data benchmark based on three datasets: Octane (Esbensen, 2001), Wine (Larsen et al., 2006) and EOG (Chen et al., 2015). The Wine dataset consists of 397 measurements of proton nuclear magnetic resonance (NMR) spectra of 40 different wine samples, the Octane dataset contains 39 near-infrared (NIR) spectra of gasoline samples with 226 measurements each, while the EOG dataset represents the electrical potential between electrodes placed at points close to the eyes, with 1250 measurements. (Graphs of the three datasets can be found in the Supplementary Material.) As pointed out by Hubert et al. (2015), it is difficult to detect anomalies in the first two datasets automatically, while they are easily spotted upon visual inspection. For the EOG dataset, we assign the smaller of the two classes to be abnormal. To the existing state-of-the-art methods, we add here Isolation Forest (IF) (Liu et al., 2008) and One-Class SVM (OC) (Schölkopf et al., 2001), multivariate methods applied after a proper dimension reduction (to dimension 10) using Functional Principal Component Analysis (FPCA) (Ramsay and Silverman, 2002). The proportions of detected anomalies (for all the considered methods), indicated in Table 2, hint at the very competitive performance of ACHD in the addressed benchmark.

        ACHD  FSDO  FT    FIF   IF    OC
Octane  1     0.5   0.33  1     0.5   0.5
Wine    1     0     0     1     0     1
EOG     0.73  0.55  0.48  0.43  0.63  0.6
Table 2: Proportion of detected anomalies for each benchmarked method on the Octane, Wine and EOG datasets.

5 Conclusion

In this paper, we have introduced a novel depth function on the space of real-valued continuous curves on $[0,1]$ that presents various advantages. Regarding interpretability first, the depth computed at a query curve $x$ takes the form of an expected ratio, quantifying the relative increase of the area of the convex hull of i.i.d. random curves when adding $x$ to the batch. We have shown that this depth satisfies several desirable properties and have explained how to solve approximation issues, namely those concerning the sampled character of the observations in practice and scalability. Numerical experiments on both synthetic and real data have highlighted a number of crucial benefits: reduced variance of its statistical versions, robustness with respect to the choice of tuning parameters and to the presence of outliers in the training sample, and the capacity to detect (possibly slight) anomalies of various types, surpassing competitors such as depths of integral type, for isolated anomalies in particular. The open-source implementation of the method, along with all reproducing scripts, can be accessed at https://github.com/Gstaerman/ACHD.

References

  • M. Arcones (2003). On the asymptotic accuracy of the bootstrap under arbitrary resampling size. Ann. Inst. Statist. Math. 55, pp. 563–583.
  • Chakraborty and Chaudhuri (2014). The spatial distribution in infinite dimensional spaces and related quantiles and depths. The Annals of Statistics.
  • Y. Chen, E. Keogh, B. Hu, N. Begum, A. Bagnall, A. Mueen, and G. Batista (2015). The UCR time series classification archive.
  • G. Claeskens, M. Hubert, L. Slaets, and K. Vakili (2014). Multivariate functional halfspace depth. Journal of the American Statistical Association 109(505), pp. 411–423.
  • A. Cuevas, M. Febrero, and R. Fraiman (2007). Robust estimation and classification for functional data via projection-based depth notions. Computational Statistics 22(3), pp. 481–496.
  • R. M. Dudley (2002). Real Analysis and Probability. 2nd edition, Cambridge Studies in Advanced Mathematics, Cambridge University Press.
  • R. M. Dudley (1984). A course on empirical processes. In École d'Été de Probabilités de Saint-Flour XII, Lecture Notes in Mathematics, Springer.
  • Dutta, Ghosh, and Chaudhuri (2011). Some intriguing properties of Tukey's half-space depth. Bernoulli.
  • E. Enqvist (1978). On sampling from sets of random variables with application to incomplete U-statistics. Ph.D. thesis, Lund University.
  • K. Esbensen (2001). Multivariate Data Analysis in Practice. Camo Software.
  • F. Ferraty and P. Vieu (2006). Nonparametric Functional Data Analysis. Springer-Verlag, New York.
  • R. Fraiman and G. Muniz (2001). Trimmed means for functional data. Test 10(2), pp. 419–440.
  • I. Gijbels and S. Nagy (2018). On a general definition of depth for functional data. Statistical Science.
  • M. Hubert, P. J. Rousseeuw, and P. Segaert (2015). Multivariate functional outlier detection. Statistical Methods & Applications 24(2), pp. 177–202.
  • R. Jörnsten (2004). Clustering and classification based on the L1 data depth. Journal of Multivariate Analysis 90(1), pp. 67–89.
  • T. Lange, K. Mosler, and P. Mozharovskyi (2014). Fast nonparametric classification based on data depth. Statistical Papers 55(1), pp. 49–69.
  • F. Larsen, F. Berg, and S. Engelsen (2006). An exploratory chemometric study of 1H NMR spectra of table wine. Journal of Chemometrics 20, pp. 198–208.
  • A. J. Lee (1990). U-Statistics: Theory and Practice. Marcel Dekker, Inc., New York.
  • F. T. Liu, K. M. Ting, and Z. Zhou (2008). Isolation forest. In 2008 Eighth IEEE International Conference on Data Mining, pp. 413–422.
  • Long and Huang (2016). A study of functional depths. Preprint.
  • S. Lopez-Pintado and J. Romo (2009). On the concept of depth for functional data. Journal of the American Statistical Association.
  • S. Lopez-Pintado and J. Romo (2011). A half-region depth for functional data. Computational Statistics and Data Analysis.
  • K. Mosler and P. Mozharovskyi (2017). Fast DD-classification of functional data. Statistical Papers 58(4), pp. 1055–1089.
  • S. Nagy, I. Gijbels, and D. Hlubinka (2016). Weak convergence of discretely observed functional data with applications. Journal of Multivariate Analysis 146, pp. 46–62.
  • S. Nagy, I. Gijbels, and D. Hlubinka (2017). Depth-based recognition of shape outlying functions. Journal of Computational and Graphical Statistics 26(4), pp. 883–893.
  • A. Nieto-Reyes and H. Battey (2016). A topologically valid definition of depth for functional data. Statistical Science.
  • Oja (1983). Descriptive statistics for multivariate distributions. Statistics and Probability Letters.
  • J. O. Ramsay and B. W. Silverman (2002). Applied Functional Data Analysis: Methods and Case Studies. Springer-Verlag, New York.
  • J. O. Ramsay and B. W. Silverman (2005). Functional Data Analysis. Springer-Verlag, New York.
  • F. Rossi and N. Villa (2006). Support vector machine for functional data classification. Neurocomputing 69(7), pp. 730–742.
  • R. Schneider and W. Weil (2008). Stochastic and Integral Geometry. Springer-Verlag, Berlin Heidelberg.
  • R. Schneider (2013). Convex Bodies: The Brunn–Minkowski Theory. Cambridge University Press, Cambridge.
  • B. Schölkopf, J. C. Platt, J. Shawe-Taylor, A. Smola, and R. Williamson (2001). Estimating the support of a high-dimensional distribution. Neural Computation 13(7), pp. 1443–1471.
  • R. Serfling (2006). Depth functions in nonparametric multivariate inference. DIMACS Series in Discrete Mathematics and Theoretical Computer Science 72.
  • G. Staerman, P. Mozharovskyi, S. Clémençon, and F. d'Alché-Buc (2019). Functional isolation forest. In Proceedings of the 11th Asian Conference on Machine Learning.
  • J. W. Tukey (1975a). Mathematics and the picturing of data. In Proceedings of the International Congress of Mathematicians, R. D. James (Ed.), Vol. 2, pp. 523–531.
  • J. W. Tukey (1975b). Mathematics and the picturing of data. In Proceedings of the International Congress of Mathematicians.
  • Y. Zuo and R. Serfling (2000). General notions of statistical depth function. The Annals of Statistics 28(2), pp. 461–482.

Supplementary Material to the Article
The Area of the Convex Hull of Sampled Curves:
a Robust Functional Statistical Depth Measure

First, Section A collects technical proofs omitted in the body of the article. Then, Section B provides the exact algorithm for approximate computation of the proposed depth notion. Finally, Section C collects further experimental results mentioned in the article.

A Technical proofs

This part presents the proofs of Proposition 3.1 and Theorems 3.1 and 3.2, as well as the counterexamples for the non-satisfied properties. Most of the proofs are carried out for both and .
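For the reader's convenience, we recall the expected convex-hull-area ratio on which the proofs below rely; up to the notational conventions fixed in the body of the article, with $\lambda$ the Lebesgue measure on $\mathbb{R}^2$ and $\mathrm{graph}(x)=\{(t,x(t)) : t\in[0,1]\}$, the depth of a curve $x$ w.r.t. a law $P$ reads:

```latex
D_J(x, P) \;=\; \mathbb{E}\left[
  \frac{\lambda\big(\mathrm{conv}\big(\mathrm{graph}(X_1)\cup\cdots\cup\mathrm{graph}(X_J)\big)\big)}
       {\lambda\big(\mathrm{conv}\big(\mathrm{graph}(x)\cup\mathrm{graph}(X_1)\cup\cdots\cup\mathrm{graph}(X_J)\big)\big)}
\right],
```

where $X_1,\dots,X_J$ are i.i.d. random curves with law $P$.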

a.1 Proof of Proposition 3.1

a.1.1 Affine-invariance

Let , it is clear that

where . Following this, and by properties of Lebesgue measure, we have

The case of :
Now we provide a counterexample showing that the property fails when belongs to ; the case where is trivial. For the sake of simplicity, let and . Take and a random variable with distribution such that and . Let be samples from and be the continuous function . It is easy to see that since

Note that even if we set in order to avoid the convex hull of a constant function having null Lebesgue measure, and remain different, see Fig. 5.

Figure 5: Plots of the functions used in the case of . The three red lines come from . The cyan curves correspond to and , and the blue curve to .

a.1.2 Vanishing at infinity

Let be fixed and let be a sequence of functions such that tends to infinity as grows; for every we define:

As continuous functions on a compact set, are bounded, so that . The result then follows from the dominated convergence theorem, since is bounded by 1.

a.1.3 Continuity in

Let , be fixed curves in , at least two of which are different, i.e., there exist and such that . If , we require that is not a constant function. From Lemma A.6.1, we know that the function

is continuous.

Let be the set of all compact sets in and the set of all convex bodies (compact, convex sets with non-empty interior). We equip both spaces with the Hausdorff distance. We know that:

and

are continuous with respect to the Hausdorff distance; see, for example, Theorems 12.3.5 and 12.3.6 in Schneider and Weil (2008) for and Theorem 1.8.16 in Schneider (2013) for . Hence is continuous.
It is straightforward to show that

is continuous. It now remains to prove that

is continuous, which follows from the dominated convergence theorem. We conclude the proof by the continuity of sums of continuous functions.
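The continuity of the composed map (convex hull followed by area) with respect to the Hausdorff distance can also be observed numerically. The sketch below is purely illustrative (pure Python, names ours): perturbing every point of a cloud by at most eps per coordinate keeps the Hausdorff distance between the clouds of order eps, and the convex-hull area then changes by an amount of order eps as well.

```python
import random

def hull_area(points):
    """Shoelace area of the convex hull (Andrew's monotone chain)
    of a finite 2-d point set."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return 0.0
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    hull = lower[:-1] + upper[:-1]
    n = len(hull)
    if n < 3:
        return 0.0
    return 0.5 * abs(sum(hull[i][0] * hull[(i + 1) % n][1]
                         - hull[(i + 1) % n][0] * hull[i][1] for i in range(n)))

def area_change(points, eps, rng):
    """Absolute change of the convex-hull area when each point moves by at
    most eps per coordinate (Hausdorff distance between the two clouds is
    then at most eps * sqrt(2))."""
    moved = [(x + rng.uniform(-eps, eps), y + rng.uniform(-eps, eps))
             for (x, y) in points]
    return abs(hull_area(moved) - hull_area(points))
```

On the unit square, for instance, the area change is bounded by 4·eps + 4·eps², consistent with the Hausdorff continuity invoked in the proof.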

a.2 (Uniform-) continuity in

is a Polish space, which implies that the set of all probability measures on this space, equipped with the Lévy-Prohorov metric, is Polish as well. By the portmanteau theorem (see, e.g., Theorem 11.3.3 in Dudley (2002)), it follows that is equivalent to , for a measure and a sequence of measures on , respectively. This implies that

for every bounded continuous real-valued function on .

Let be a fixed natural number and define the following function:

If we equip with the supremum norm defined by

then, following the same argument as in the proof of Section A.1.3, is bounded and continuous.
Now, let be fixed and let be a sequence of measures on such that . We have:

The inequality follows from the subadditivity of the supremum and from the triangle inequality. Therefore, it is straightforward to see that the set is equicontinuous and uniformly bounded (by one). Thus, the convergence of the right-hand term of the inequality follows from Corollary 11.3.4 in Dudley (2002).

a.3 Proof of Theorem 3.1

Following Proposition 6.1.7 in Dudley (1984), if a class of functions is the unit ball of the dual space of a separable Banach space and is a measure on such that , then . Here, is a separable Banach space and is defined by:

is included in the unit ball of the dual space of this space. We then have

Now, from Corollary 3.5 in Arcones (2003), we have:

which implies that:

and we have (by positivity):

which is equivalent to:

Furthermore, we have

by the triangle inequality and the subadditivity of the supremum. Since the right-hand side converges to almost surely, so does the left-hand side. Hence,

a.4 Proof of Theorem 3.2

The result follows from the uniform continuity in and Theorem 3 in Nagy et al. (2016).

a.5 Counterexamples for the non-satisfied properties

a.5.1 Maximality at the center

We restrict ourselves for simplicity to and . Let be a distribution such that with

The distribution is clearly centrally and halfspace symmetric around but we have

Since

and

Figure 6: Plot of the functions used in the counterexample to the maximality-at-the-center property. (blue curve) and (cyan curve) correspond to the distribution , and corresponds to the red curve.

a.5.2 Decreasing w.r.t. the deepest point

We restrict ourselves to and for the sake of clarity, but the example carries over to every . Let be the distribution such that

It is clear from this distribution that and ; if we write , . We define and . We have , and . Computing the depths of and , we have:

and

The result follows. Notice that the result remains true if is replaced by the function.

Figure 7: Plots of the functions used in the counterexample to the decreasing property. The three red lines come from the distribution ; the thicker red curve corresponds to the maximal depth. The cyan curve corresponds to and the blue curve to .

a.6 Technical requirements

Lemma A.6.1

Let , be fixed curves in . The function

is continuous if we equip with the Hausdorff distance .

Proof. Let and be fixed in