Intrinsic dimension and its application to association rules
The curse of dimensionality in the realm of association rules is twofold. Firstly, there is the well-known exponential increase in computational complexity with increasing item set size. Secondly, there is a related curse concerned with the distribution of (sparse) data itself in high dimension. The former problem is often coped with by projection, i.e., feature selection, whereas the best-known strategy for the latter is avoidance. This work summarizes the first attempt to provide a computationally feasible method for measuring the extent of the dimension curse present in a data set with respect to a particular class of machine learning procedures. This recent development enables various other methods from geometric analysis to be investigated and applied in machine learning procedures in the presence of high dimension.
Keywords: Association Rules, Geometric Analysis, Curse of Dimensionality
The curse of dimensionality is a well-known commonplace in machine learning, used to mark the frontier of difficulty. In fact, it has at least two distinct facets: the combinatorial explosion in high dimension and the (often) complicated data distribution in high dimension. Even though both are connected to some extent, the latter is the object of investigation of this work, and we call it the dimension curse from now on. This effect is closely related to the mathematical phenomenon of concentration of measure, which was discovered by V. Milman [Milman1983] and is also known as the Lévy property. There are various works linking both worlds, the most comprehensive being [Pestov1, Pestov2] by V. Pestov. His axiomatic approach led to a potent definition of intrinsic dimension, which is, however, computationally infeasible. Building on his ideas, we presented in [sbomb] an applicable setup, which we summarize in the following. To this end we recall crucial notions from [sbomb] and show how data sets may be analyzed for the dimension curse. We conclude with an exemplary application to association rules.
2 Observable Diameters and Dimension Function
Our approach for measuring the dimension curse for association rules is based on methods from geometric analysis. Hence, we need a mathematical structure that is accessible from both sides, geometrical methods as well as data representation. For this we developed Definition 2.1 in [sbomb]. But first, let us briefly recall some necessary basic mathematical notions. We call a topological space $X$ Polish if $X$ is separable and there is a complete metric generating the topology of $X$. A set $F$ of real-valued functions on $X$ is called (pointwise) equicontinuous if for every $x \in X$ and every $\varepsilon > 0$ there exists a neighborhood $U$ of $x$ in $X$ with $\operatorname{diam} f(U) \leq \varepsilon$ for all $f \in F$. We frequently use the push-forward of a measure. Let $S$ and $T$ be measurable spaces, where $\mu$ is a measure on $S$ and $f \colon S \to T$ is a measurable map. The push-forward measure $f_*(\mu)$ of $\mu$ is defined by $f_*(\mu)(B) = \mu(f^{-1}(B))$ for every measurable set $B \subseteq T$. On a more technical note, for a measurable space $S$ with some measure $\mu$ and some measurable $T \subseteq S$, we denote by $\mu|_T$ the measure on the induced measure space $T$ defined by $\mu|_T(B) = \mu(B)$ for every measurable $B \subseteq T$. Finally, for a finite non-empty set $X$ we denote by $\nu_X$ the normalized counting measure on $X$, i.e., $\nu_X(A) = |A| / |X|$ for every $A \subseteq X$.
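To make the push-forward and the normalized counting measure recalled above concrete, the following minimal Python sketch computes both for a finite set. The function names `normalized_counting_measure` and `push_forward` and the toy feature (string length) are illustrative assumptions and do not stem from [sbomb].

```python
from fractions import Fraction


def normalized_counting_measure(points):
    """Return the normalized counting measure on a finite non-empty set.

    The measure is represented as a dict that maps every point to the
    mass 1/|X|. Duplicate entries in `points` are ignored.
    """
    distinct = list(dict.fromkeys(points))  # drop duplicates, keep order
    if not distinct:
        raise ValueError("the underlying set must be non-empty")
    weight = Fraction(1, len(distinct))
    return {x: weight for x in distinct}


def push_forward(measure, f):
    """Return the push-forward of a finitely supported measure under f.

    The mass of a value y is the total mass of all points x with f(x) == y,
    i.e. the measure of the preimage of {y}.
    """
    image = {}
    for x, mass in measure.items():
        y = f(x)
        image[y] = image.get(y, Fraction(0)) + mass
    return image


# Toy example (assumption): four strings, feature = string length.
nu = normalized_counting_measure(["a", "bb", "cc", "ddd"])
image = push_forward(nu, len)
```

Here `image` assigns mass 1/4 to the value 1, mass 1/2 to the value 2 and mass 1/4 to the value 3. The push-forward is again a probability measure, now living on the real line, which is the situation in which features of a data structure are evaluated.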
Definition 2.1 (Data Structure [sbomb]).
A data structure is a triple consisting of a Polish space together with a Borel probability measure on and an equicontinuous set of real-valued functions on , where the elements of will be referred to as the features of . We call the trivial data structure. Given a data structure , we let
We call two data structures isomorphic and write if there exists a homeomorphism such that and .
In [sbomb] we elaborated on this definition. In particular, we introduced a pseudometric on the collection of all data structures, a variant of Gromov’s observable distance [Gromov99, Chapter 3.H]. We utilize this as a tool for analyzing high-dimensional data, as proposed by Pestov [Pestov1, Pestov2]. However, we refrain from introducing the specifics of this pseudometric and refer to it informally for the rest of this work.
The goal now is to find a mathematically sound dimension function for the collection of all data structures. We skip the necessary propositions and preliminary mathematical notions and informally state the two most important properties to expect from a dimension function. First, a dimension function should reflect the presence of the concentration phenomenon in a data structure. More precisely, a dimension function shall diverge on a sequence of data structures if and only if this sequence has the Lévy property. Second, if a sequence of data structures concentrates (w.r.t. the pseudometric) to a particular data structure, the values of the dimension function shall converge to its value on this data structure as well. For the complete axiomatization we refer the reader to [sbomb].
For reasons of space, we cannot address the various technical mathematical challenges, preparations and connections to geometric analysis. Instead, we jump to the main result of [sbomb], a new quantity for expressing the extent to which a data structure is prone to the dimension curse: the intrinsic dimension. For this we further adapt the ideas from [Gromov99, Chapter 3] about observable diameters.
Definition 2.2 (Observable Diameter [sbomb]).
We showed in [sbomb] that the observable diameter is invariant under isomorphisms of data structures and that it fulfills a continuity property with respect to the pseudometric mentioned earlier. Building on this definition we can state: The map defined by Equation 3 is a dimension function.
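As an illustration of how an observable diameter can be evaluated on a finite data set, consider the following Python sketch. It rests on an assumption: it uses the Gromov-style form in which the partial diameter of a measure on the real line at level $1-\alpha$ is the smallest diameter of a set of mass at least $1-\alpha$, and the observable diameter is the supremum of the partial diameters of the push-forward measures over all features. The precise definition and normalization in [sbomb] may differ. The names `partial_diameter` and `observable_diameter` as well as the toy data are hypothetical.

```python
import math


def partial_diameter(values, alpha):
    """Smallest diameter of a set carrying mass >= 1 - alpha.

    `values` are the feature values of the data points; each point carries
    mass 1/n (normalized counting measure). On the real line an optimal set
    can be chosen as a window of consecutive sorted values, so it suffices
    to slide a window over the sorted list.
    """
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one value")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Number of points needed to reach mass 1 - alpha; the small tolerance
    # guards against floating-point round-off in (1 - alpha) * n.
    k = max(1, math.ceil((1.0 - alpha) * n - 1e-12))
    return min(xs[i + k - 1] - xs[i] for i in range(n - k + 1))


def observable_diameter(points, features, alpha):
    """Largest partial diameter over a finite set of feature functions."""
    return max(
        partial_diameter([f(x) for x in points], alpha) for f in features
    )


# Toy example (assumption): four planar points, coordinate projections
# as features.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.5), (10.0, 1.0)]
features = [lambda p: p[0], lambda p: p[1]]
diam_quarter = observable_diameter(points, features, 0.25)
diam_zero = observable_diameter(points, features, 0.0)
```

In this toy example, discarding a quarter of the mass removes the outlier at 10.0 in the first coordinate, so `diam_quarter` is 2.0, while `diam_zero` is 10.0. A small observable diameter for most values of $\alpha$ indicates that all features are nearly constant on most of the data, which is the concentration effect the dimension function is meant to capture.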
3 Example Experiment and Applications
Distance functions, as often used in machine learning procedures, are a natural candidate for feature functions. Hence, we might not need to motivate the applicability of the intrinsic dimension for those. However, the idea of a dimension function on mathematical data structures is able to cope with any kind of proper feature function set. Therefore, we decided on an exemplary application in association rule mining. A possible adaptation of data structures and observable diameter could be done as follows: We consider a set (we restrict our example to non-repeating transactions) of transactions with transactions where is called itemset. An association rule on then is an element such that and . We denote by the set of all association rules for , and by the subset such that . To convert this data into a mathematical data structure as introduced in Definition 2.1 we take the following approach: with . Hence, we consider the transactions as data points, and the feature functions are mappings from those data points to the support of a head of a rule, i.e., the relative amount of items covered by this particular rule. Using this setup the if and otherwise, for all rules . Hence, this yields . We plotted in Figure 1 multiple example calculations for well-known association rule mining data sets, i.e., accident [accidentdata], mushroom and chess [UCI]. We observe an increase in dimension with the increase of support.
This is expected due to the antitone character of feature sets. However, the slope differs among the different data sets and confidence values, revealing the ability of the particular feature sets to cover the data.
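The role of the support and confidence thresholds can be illustrated with a small Python sketch. It uses the textbook definitions of support (fraction of transactions containing an itemset) and confidence (support of body and head together divided by the support of the body) and a brute-force enumeration. These definitions, the function names and the toy transactions are assumptions for illustration; they are not the feature construction or the experimental setup behind Figure 1.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(body, head, transactions):
    """Support of body and head together, relative to the body's support."""
    body_support = support(body, transactions)
    if body_support == 0:
        return 0.0
    return support(set(body) | set(head), transactions) / body_support


def rules(transactions, min_support, min_confidence):
    """Enumerate all rules (body, head) meeting both thresholds.

    Brute force over all itemsets; only suitable for tiny examples.
    """
    items = sorted(set().union(*transactions))
    found = []
    for size in range(2, len(items) + 1):
        for itemset in combinations(items, size):
            if support(itemset, transactions) < min_support:
                continue
            # Split the itemset into a non-empty body and a non-empty head.
            for r in range(1, size):
                for body in combinations(itemset, r):
                    head = tuple(i for i in itemset if i not in body)
                    if confidence(body, head, transactions) >= min_confidence:
                        found.append((body, head))
    return found


# Toy example (assumption): four distinct transactions over items a, b, c.
transactions = [
    frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("b"),
]
all_rules = rules(transactions, 0.25, 0.0)
frequent_rules = rules(transactions, 0.5, 0.0)
strong_rules = rules(transactions, 0.5, 0.9)
```

Raising either threshold can only shrink the rule set: every rule in `frequent_rules` also occurs in `all_rules`, and `strong_rules` contains only the rule with body `c` and head `a`. Since a feature set built from rules shrinks with it, this is the antitone behavior referred to above.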
We presented in this work a mathematical approach for measuring the dimension curse in machine learning. The novelty here is its computationally feasible character. Besides the indicated application to association rules, various other applications are possible and, from the standpoint of understanding the dimension curse, necessary. One particularly crucial application could be assessing the results of dimension reduction procedures like Principal Component Analysis.
F.M.S. acknowledges funding of the Excellence Initiative by the German Federal and State Governments, as well as the Brazilian CNPq, processo 150929/2017-0.