FIBS: A Generic Framework for Classifying Interval-based Temporal Sequences


Abstract

We study the problem of classification of interval-based temporal sequences (IBTSs). Since common classification algorithms cannot be directly applied to IBTSs, the main challenge is to define a set of features that represents the data effectively enough for classifiers to learn from. Most prior work utilizes frequent pattern mining to define a feature set based on discovered patterns. However, frequent pattern mining is computationally expensive and often discovers many irrelevant patterns. To address this shortcoming, we propose the FIBS framework for classifying IBTSs. FIBS extracts features relevant to classification from IBTSs based on relative frequency and temporal relations. To avoid selecting irrelevant features, a filter-based selection strategy is incorporated into FIBS. Our empirical evaluation on five real-world datasets demonstrates the effectiveness of our methods in practice. The results provide evidence that the FIBS framework effectively represents IBTSs for classification algorithms and that it can achieve even better performance when the selection strategy is applied.


1 Introduction

Interval-based temporal sequence (IBTS) data are collected from application domains in which events persist over intervals of time of varying lengths. Such domains include medicine [14, 13, 10, 5], sensor networks [8], sign languages [12], and motion capture [7]. Applications that need to deal with this type of data are common in industrial, commercial, government, and health sectors. For example, some companies offer multiple service packages to customers that persist over varying periods of time and may be held concurrently. The sequences of packages that a customer holds can be represented as an IBTS.

IBTSs can be obtained either directly from the applications or indirectly by data transformation. In particular, temporal abstraction of multivariate (or univariate) time series may yield such data. Segmentation or aggregation of a time series into a succinct symbolic representation is called temporal abstraction (TA) [10]. TA transforms a numerical time series into a symbolic time series. This high-level qualitative form of data provides a description of the raw time series that is suitable for a human decision-maker, because it makes the data easier to understand, or for data mining. TA may be based on knowledge-based abstraction performed by a domain expert; an alternative is data-driven abstraction utilizing temporal discretization. Common unsupervised discretization methods are Equal Width, Symbolic Aggregate Approximation (SAX) [6], and Persist [9]. Depending on the application scenario, symbolic time series may be categorized as point-based or interval-based. Point-based data reflect scenarios in which events happen instantaneously or are considered to have equal time intervals; duration has no impact on extracting patterns of this type. Interval-based data, which are the focus of this study, reflect scenarios where events have unequal time intervals; here, duration plays an important role. Figure 1 depicts the process of obtaining interval-based temporal sequences.

Figure 1: Process to obtain interval-based temporal sequences
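To make the abstraction step concrete, the following sketch shows one way a numeric time series could be turned into event intervals using unsupervised equal-width discretization. It is an illustration in R (the language used for our implementation in Section 4), not the procedure of any particular TA method, and the function name is ours.

```r
# Illustrative sketch (not the paper's code): equal-width temporal abstraction.
# A numeric series is discretized into symbols, and consecutive runs of the
# same symbol are merged into event intervals (label, begin, finish).
abstract_series <- function(values, times, n_bins = 3,
                            symbols = LETTERS[seq_len(n_bins)]) {
  breaks <- seq(min(values), max(values), length.out = n_bins + 1)
  labels <- symbols[cut(values, breaks, labels = FALSE, include.lowest = TRUE)]
  runs   <- rle(labels)                  # consecutive runs of one symbol
  ends   <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  data.frame(label  = runs$values,
             begin  = times[starts],
             finish = times[ends])
}

# Example: a short series sampled at integer time points
abstract_series(c(1, 2, 2, 9, 8, 3), times = 1:6)
```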

Classifying IBTSs is a relatively new research area. Although data classification is an important machine learning task that has achieved great success in a wide range of application fields, the classification of IBTSs has not received much attention. A dataset of IBTSs contains longitudinal data in which instances are described by a series of event intervals over time rather than by features with single values. Such a dataset does not follow the format required by standard classification algorithms to build predictive models for IBTSs.

Previous work in the area of IBTS classification utilized frequent pattern mining techniques to perform classification. Patel et al. [13] mined IBTS data in an unsupervised setting. They first mined all frequent temporal patterns and then used a selection of those patterns as features for classification, with Information Gain, a measure of discriminative power, as the selection criterion. After features were extracted, common classification techniques such as decision trees or support vector machines (SVMs) were used to predict the classes of unseen data.

Several attempts have been made to perform classification in a supervised setting by employing frequent patterns as features [2, 11]. However, extracting features from frequent temporal patterns presents some challenges. Firstly, frequent pattern mining extracts too many frequent patterns, many of which are redundant or uninformative. Batal et al. [2] proposed a classification framework that addresses this problem by filtering out non-predictive and spurious temporal patterns. Secondly, discovering frequent patterns is computationally expensive. Lastly, classification based on features extracted from frequent patterns does not always guarantee better performance than other methods. Bornemann et al. [3] proposed a feature-based classification framework, called STIFE, which extracts features using a combination of basic statistical metrics, shapelet [16] discovery and selection, and distance-based approaches. A random forest is then constructed from the extracted features to perform classification. This framework outperformed their previously proposed method.

In this paper, we formalize the problem of classification of IBTSs based on feature-based classifiers and propose a new framework to solve this problem. The major contributions of this work are as follows:

  • We propose a generic framework named FIBS for classifying IBTSs. It represents IBTSs by extracting features relevant to classification, based on the relative frequency of event labels and the temporal relations among event intervals.

  • To avoid selecting irrelevant features, we propose a heuristic filter-based feature selection strategy. FIBS utilizes this strategy to reduce the feature space and improve the classification accuracy.

  • We report on an experimental evaluation that shows the proposed framework is able to represent IBTSs effectively and efficiently.

The rest of the paper is organized as follows. Section 2 provides preliminaries and the problem statement. Section 3 presents the details of the FIBS framework and the feature selection strategy. Experimental results on real datasets and evaluation are given in Section 4. Section 5 presents conclusions.

2 Problem Statement

We adapt definitions given in earlier research [12] and describe the problem statement formally.

Definition 2.1.

(Event interval) Let $\Sigma$ denote a finite alphabet. A triple $e = (l, b, f)$ is called an event interval, where $l \in \Sigma$ is the event label and $b$ and $f$ ($b \le f$) are the beginning and finishing times, respectively. We also use $e.x$ to denote element $x$ of event interval $e$, e.g., $e.b$ is the beginning time of event interval $e$. The duration of an event interval $e$ is $e.f - e.b$.

Definition 2.2.

(Esequence) An event-interval sequence or Esequence $s = \langle e_1, e_2, \ldots, e_n \rangle$ is a list of event intervals placed in ascending order based on their beginning times. If event intervals have equal beginning times, then they are ordered lexicographically by their labels. Multiple occurrences of an event are allowed in an Esequence if they do not happen concurrently. The duration of an Esequence $s$ is $d(s) = f_{last} - b_{first}$, where $f_{last}$ is the finishing time of the last event interval to finish in $s$ and $b_{first}$ is the beginning time of the first event interval in $s$.

Definition 2.3.

(Esequence dataset) An Esequence dataset $D = \{s_1, s_2, \ldots, s_n\}$ is a set of Esequences, where each Esequence $s_i$ is associated with a unique identifier $i$.

Table 1 depicts an Esequence dataset consisting of four Esequences with identifiers 1 to 4.

id   Event Label   Beginning Time   Finishing Time
1    A             8                28
     B             18               21
     C             24               28
     E             25               27
2    A             1                14
     C             6                14
     E             8                11
     F             8                11
3    A             6                22
     B             6                14
     C             14               20
     E             16               18
4    A             4                24
     B             5                10
     D             5                12
     C             16               22
     E             18               20
Table 1: Example of an Esequence dataset
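For concreteness, the dataset of Table 1 can be held in a long-format R data frame with one row per event interval. This layout is our illustration (the paper does not prescribe a storage format) and is reused by the sketches that follow.

```r
# One possible in-memory layout for the Esequence dataset of Table 1.
esequences <- data.frame(
  id     = c(1, 1, 1, 1,  2, 2, 2, 2,  3, 3, 3, 3,  4, 4, 4, 4, 4),
  label  = c("A", "B", "C", "E",  "A", "C", "E", "F",
             "A", "B", "C", "E",  "A", "B", "D", "C", "E"),
  begin  = c(8, 18, 24, 25,  1, 6, 8, 8,  6, 6, 14, 16,  4, 5, 5, 16, 18),
  finish = c(28, 21, 28, 27,  14, 14, 11, 11,  22, 14, 20, 18,  24, 10, 12, 22, 20)
)
# Order each Esequence by beginning time, breaking ties by label (Definition 2.2)
esequences <- esequences[order(esequences$id, esequences$begin, esequences$label), ]
```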

Problem Statement.

Given an Esequence dataset $D$, where each Esequence is associated with a class label, the main challenge of a feature-based framework is to represent $D$ such that common feature-based classifiers are able to classify previously unseen Esequences similar to those in $D$.

3 The FIBS Framework

In this section, we introduce FIBS, a feature-based framework for classifying Esequence datasets (see Figure 2). FIBS represents an Esequence dataset based on the relative frequency or the temporal relations of event intervals. Any standard classifier can then operate on these representations.

Figure 2: The FIBS framework. An Esequence dataset undergoes feature extraction (producing a relative frequency representation or a temporal relation representation) followed by feature selection, and the result is passed to a classifier.

Classification algorithms often require data to be in a format reminiscent of a table, where rows represent instances (Esequences) and columns represent features (attributes). Since an Esequence dataset does not follow this format, we utilize FIBS to construct feature-based representations to enable standard classification algorithms to build predictive models.

A feature-based representation of a dataset has three components: a class label set, a feature set, and data instances. We first give a general definition of a feature-based representation based on these components [15].

Definition 3.1.

(Feature-based representation) A feature-based representation is defined as follows. Let $C$ be a set of class labels, $F = \{f_1, \ldots, f_m\}$ be a set of features (or attributes), $I$ be a set of instances, and let $c(i) \in C$ denote the class label of instance $i \in I$.

An Esequence dataset $D$ contains longitudinal data in which instances are described by a series of event intervals over time. In supervised settings, the class labels of the classes to which the Esequences belong are known. Therefore, in order to form the feature-based representation, FIBS extracts the feature set $F$ and the instances $I$ from dataset $D$. To define the $F$ and $I$ components, we consider two alternative formulations based on relative frequency and on temporal relations among the events. These formulations are explained in the following subsections.

3.1 Relative Frequency

Definition 3.2.

(Relative frequency) The relative frequency of an event label $l$ in an Esequence $s$, which is the duration-weighted frequency of the occurrences of $l$ in $s$, is defined as the accumulated durations of all event intervals with event label $l$ in $s$ divided by the duration of $s$. Formally:

$$rf(l, s) = \frac{\sum_{e \in s,\; e.l = l} (e.f - e.b)}{d(s)} \qquad (1)$$

Suppose that we want to specify a feature-based representation of an Esequence dataset $D$ using relative frequency. Let every unique event label found in $D$ be used as a feature, i.e., let $F = \Sigma$. Also let every Esequence $s \in D$ be used as the basis for defining an instance $i \in I$. The feature-values of instance $i$ are specified as a tuple containing the relative frequencies of every event label in $s$. Formally, $i = (rf(l_1, s), rf(l_2, s), \ldots, rf(l_{|\Sigma|}, s))$, where $l_1, l_2, \ldots, l_{|\Sigma|}$ are the event labels in $\Sigma$ and instance $i$ represents Esequence $s$.
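A minimal sketch of this construction in R, assuming the `esequences` data frame shown after Table 1; the function name is illustrative.

```r
# Build the relative-frequency representation: one row per Esequence,
# one column per event label, with values computed per Equation (1).
relative_frequency <- function(es) {
  alphabet <- sort(unique(es$label))
  t(sapply(split(es, es$id), function(s) {
    dur <- max(s$finish) - min(s$begin)   # duration d(s) of the Esequence
    sapply(alphabet, function(l)
      sum(s$finish[s$label == l] - s$begin[s$label == l]) / dur)
  }))
}

round(relative_frequency(esequences), 2)  # matches the A..F columns of Table 2
```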

Example 3.1.

Consider the feature-based representation that is constructed based on the relative frequency of the event labels in the Esequence dataset shown in Table 1. Let the class label set be $C = \{c_1, c_2\}$ and the feature set be $F$ = {A, B, C, D, E, F}. Assume that the class label of each of $s_1$, $s_3$, and $s_4$ is $c_1$ and the class label of $s_2$ is $c_2$. Table 2 shows the resulting feature-based representation.

id   A      B      C      D      E      F      Class
s1   1.00   0.15   0.20   0      0.10   0      c1
s2   1.00   0      0.62   0      0.23   0.23   c2
s3   1.00   0.50   0.38   0      0.13   0      c1
s4   1.00   0.25   0.30   0.35   0.10   0      c1
Table 2: Feature-based representation constructed based on relative frequency

3.2 Temporal Relation

Thirteen possible temporal relations between pairs of intervals were categorized by Allen [1]. Table 3 illustrates Allen's temporal relations. Ignoring the "equals" relation, six of the relations are inverses of the other six. We emphasize seven temporal relations, namely equals (q), before (b), meets (m), overlaps (o), contains (c), starts (s), and finished-by (f), which we call the primary temporal relations. Let $R$ denote the set of the thirteen temporal relation labels, where $P = \{q, b, m, o, c, s, f\} \subset R$ is the set of labels for the primary temporal relations and $R \setminus P$ is the set of labels for the inverse temporal relations.

Primary Temporal Relation   Inverse Temporal Relation
equals (q)                  equals
before (b)                  after
meets (m)                   met-by
overlaps (o)                overlapped-by
contains (c)                during
starts (s)                  started-by
finished-by (f)             finishes
Table 3: Allen’s temporal relations between two event intervals

Exactly one of these relations holds between any ordered pair of event intervals. Some event labels may not occur in an Esequence and some may occur multiple times. For simplicity, we assume the first occurrence of an event label in an Esequence is more important than its remaining occurrences. Therefore, when extracting temporal relations from an Esequence, only the first occurrence is considered and the rest are ignored. With this assumption, there are at most $\binom{|\Sigma|}{2}$ possible pairs of event labels in a dataset.

Based on Definition 3.1, we now define a second feature-based representation that relies on the temporal relations. Let the feature set $F$ be the set of all 2-combinations of event labels from $\Sigma$. The feature-values of an instance $i$ are specified as a tuple containing the labels of the temporal relations between the pairs of event labels that occur in the corresponding Esequence $s$.
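The following sketch labels the relation between two event intervals. It assumes the caller passes the pair ordered by beginning time (ties broken by label, per Definition 2.2); under that ordering, started-by is the only inverse relation that can arise, and the "s-inv" label is our notation.

```r
# Allen relation label for intervals (b1, f1) and (b2, f2), with b1 <= b2.
allen_relation <- function(b1, f1, b2, f2) {
  stopifnot(b1 <= b2)                 # caller orders the pair by beginning time
  if (b1 == b2 && f1 == f2) "q"       # equals
  else if (f1 <  b2) "b"              # before
  else if (f1 == b2) "m"              # meets
  else if (b1 == b2 && f1 < f2) "s"   # starts
  else if (b1 == b2) "s-inv"          # started-by (inverse of starts)
  else if (f1 == f2) "f"              # finished-by
  else if (f1 >  f2) "c"              # contains
  else "o"                            # overlaps
}

allen_relation(8, 28, 18, 21)         # "c": A contains B in Esequence 1
```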

Example 3.2.

Following Example 3.1, Table 4 shows the feature-based representation that is constructed based on the temporal relations between the first occurrences of the event labels in the Esequences given in Table 1. To increase readability, 0 is used to indicate that no temporal relation exists between a pair, i.e., that at least one label of the pair does not occur in the Esequence. Inverse relations are denoted by a superscript $-1$ (e.g., $b^{-1}$ for after).

id   A,B    A,C   A,D   A,E   A,F   B,C   B,D   B,E   B,F   C,D    C,E   C,F   D,E   D,F   E,F   Class
s1   c      f     0     c     0     b     0     b     0     0      c     0     0     0     0     c1
s2   0      f     0     c     c     0     0     0     0     0      c     c     0     0     q     c2
s3   s^-1   c     0     c     0     b     0     b     0     0      c     0     0     0     0     c1
s4   c      c     c     c     0     b     s     b     0     b^-1   c     0     b     0     0     c1
Table 4: Feature-based representation constructed based on temporal relations

3.3 Feature Selection

Feature selection for classification tasks aims to select a subset of features that are highly discriminative and thus contribute substantially to increasing the accuracy of the classification. Features with less discriminative power should not be selected since they either have little impact on the accuracy of the classification or may even harm it. As well, reducing the number of features improves the efficiency of many algorithms.

Based on their relevance to the targeted classes, John et al. [4] classify features into three disjoint categories, namely, strongly relevant, weakly relevant, and irrelevant features. Suppose $F$ is a full set of features, $f_i \in F$ is a feature, and $S_i = F \setminus \{f_i\}$. Let $P(C \mid S)$ denote the probability distribution of the class labels in $C$ given the values of the features in a set $S \subseteq F$. The categories of feature relevance can be formalized as follows [17].

Definition 3.3.

(Strong relevance) A feature $f_i$ is strongly relevant iff

$$P(C \mid f_i, S_i) \neq P(C \mid S_i) \qquad (2)$$
Definition 3.4.

(Weak relevance) A feature $f_i$ is weakly relevant iff

$$P(C \mid f_i, S_i) = P(C \mid S_i) \;\text{ and }\; \exists\, S_i' \subset S_i \text{ such that } P(C \mid f_i, S_i') \neq P(C \mid S_i') \qquad (3)$$
Corollary 3.1.

(Irrelevance) A feature $f_i$ is irrelevant iff

$$\forall\, S_i' \subseteq S_i,\; P(C \mid f_i, S_i') = P(C \mid S_i') \qquad (4)$$

Strong relevance indicates that the feature is indispensable and it cannot be removed without loss of prediction accuracy. Weak relevance implies that the feature can sometimes contribute to prediction accuracy. Features are relevant if they are either strongly or weakly relevant and are irrelevant otherwise. Irrelevant features are dispensable and can never contribute to prediction accuracy.

Feature selection is especially beneficial to the temporal relation representation, when there are many event labels in the dataset. Although any feature selection method can be used to eliminate irrelevant features, some methods have advantages for particular representations. Filter-based selection methods are generally efficient because they assess the relevance of features by examining intrinsic properties of the data prior to applying any classification method. We propose a simple and efficient filter-based method to avoid producing irrelevant features for the temporal relation representation.

3.4 Filter-based Feature Selection Strategy

In this section, we propose a filter-based strategy for feature reduction that can also be used in unsupervised settings. We apply this strategy to avoid producing irrelevant features for the temporal relation representation.

Theorem 3.1.

An event label $l \in \Sigma$ is an irrelevant feature of an Esequence dataset $D$ if its relative frequencies $rf(l, s)$ are equal for every Esequence $s$ in dataset $D$.

Proof.

Suppose event label $l$ occurs with equal relative frequency $v$ in every Esequence in dataset $D$. We construct a feature-based representation based on the relative frequencies of the event labels as previously described. Therefore, there exists a feature $f_i$ that has the constant value $v$ for all instances $i \in I$. Since a constant feature provides no information about the class labels, we have $P(C \mid f_i, S_i') = P(C \mid S_i')$ for every $S_i' \subseteq S_i$. According to Corollary 3.1, we conclude $f_i$ is an irrelevant feature. ∎

We provide a definition for support that is applicable to relative frequency. If we add up the relative frequencies of an event label $l$ in all Esequences of dataset $D$ and then normalize the sum, we obtain the support of $l$ in $D$. Formally:

$$supp(l, D) = \frac{\sum_{s \in D} rf(l, s)}{|D|} \qquad (5)$$

where $|D|$ is the number of Esequences in $D$.

The support of an event label can be used as the basis of dimensionality reduction during pre-processing for a classification task. One can identify and discard irrelevant features (event labels) based on their supports. We now show how the support is used to avoid extracting irrelevant features via the following corollary, which is an immediate consequence of Theorem 3.1.

Corollary 3.2.

An event label $l$ whose support in dataset $D$ is 0 or 1 is an irrelevant feature.

Proof.

As in the proof of Theorem 3.1, assume we construct a feature-based representation based on the relative frequency of the event labels. If $supp(l, D) = 0$, then there exists a corresponding feature $f_i$ that has the same relative frequency value of 0 for all instances $i \in I$. The same argument holds if $supp(l, D) = 1$, in which case the relative frequency is 1 for all instances. According to Theorem 3.1, we conclude $f_i$ is an irrelevant feature. ∎

In practice, situations where the support of a feature is exactly 0 or 1 do not often happen. Hence, we propose a heuristic strategy that discards probably irrelevant features based on a confidence interval defined with respect to an error threshold $\epsilon$.

Heuristic Strategy:

If $supp(l, D)$ is not within the confidence interval $[\epsilon,\, 1 - \epsilon]$, then event label $l$ is a probably irrelevant feature in $D$ and can be discarded.
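A sketch of the strategy in R, reusing `relative_frequency` from Subsection 3.1; `eps` plays the role of $\epsilon$.

```r
# Compute supp(l, D) per Equation (5) as the column means of the
# relative-frequency matrix, then keep only labels inside [eps, 1 - eps].
rf_matrix <- relative_frequency(esequences)
supports  <- colMeans(rf_matrix)          # supp(l, D) = mean of rf(l, s) over D
eps  <- 0.05
kept <- names(supports)[supports >= eps & supports <= 1 - eps]
kept  # label A (supp = 1) is discarded as irrelevant, per Corollary 3.2
```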

3.5 Comparison to Representation Based on Frequent Patterns

In frequent pattern mining, the support of a temporal pattern $p$ in a dataset is the number of instances that contain $p$. A pattern is frequent if its support is no less than a predefined threshold set by the user. Once frequent patterns are discovered, after computationally expensive operations, a subset of the frequent patterns is selected as features. The representation contains binary values such that if a selected pattern occurs in an Esequence, the value of the corresponding feature is 1, and 0 otherwise. Example 3.3 illustrates a limitation of classifying IBTSs based on frequent pattern mining, namely that frequent patterns may be irrelevant to the class labels.

Example 3.3.

Consider Table 1 and its feature-based representation constructed based on relative frequency, as shown in Example 3.1. In this example, the most frequent pattern is A, which has a support of 1. However, according to Corollary 3.2, A is an irrelevant feature and can be discarded for the purpose of classification. For this example, a better approach is to classify the Esequences based on the presence or absence of F, such that the occurrence of F in an Esequence means the Esequence belongs to class $c_2$ and the absence of F means it belongs to class $c_1$.

In practice, a large number of frequent patterns affects the performance of the approach in both the pattern discovery step and the feature selection step. Clearly, mining patterns that are later found to be irrelevant is both useless and computationally costly.

4 Experiments

In our experiments, we evaluate the effectiveness of the FIBS framework on the task of classifying interval-based temporal sequences using the well-known random forest classification algorithm on five real-world datasets. We evaluate the performance of FIBS using classifiers implemented in R version 3.6.1; the FIBS framework was also implemented in R. All experiments were conducted on a laptop computer with an Intel Core i5 CPU at 2.2 GHz and 8 GB of memory. We report overall classification accuracy obtained using 10-fold cross-validation. We also compare the results for FIBS against those for the STIFE [3] framework. To assess the effect of the feature selection strategy, the FIBS framework was tested in two different settings: without the feature selection strategy, it is referred to as FIBS baseline, whereas with the strategy, it is referred to as FIBS+. For FIBS+, the error threshold $\epsilon$, as defined in Subsection 3.4, was set to 0.05.
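The evaluation loop can be sketched as follows, assuming the `randomForest` package and a feature table `repr` whose `Class` column is a factor of class labels; the fold assignment and default forest parameters here are illustrative, not the exact experimental code.

```r
library(randomForest)

# Mean accuracy of a random forest over k-fold cross-validation.
cv_accuracy <- function(repr, k = 10) {
  folds <- sample(rep(seq_len(k), length.out = nrow(repr)))
  mean(sapply(seq_len(k), function(i) {
    train <- repr[folds != i, ]
    test  <- repr[folds == i, ]
    model <- randomForest(Class ~ ., data = train)
    mean(predict(model, test) == test$Class)
  }))
}
```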

Dataset    # Esequences   # Event Labels   # Event Intervals   # Classes
Blocks 210 8 1,207 8
Context 240 54 19,355 5
Hepatitis 498 63 53,692 2
Pioneer 160 92 8,949 3
Skating 530 41 23,202 6
Table 5: Statistical information about datasets
Dataset FIBS Baseline FIBS+ STIFE_RF
Blocks 100 100 100
Context 97.83 98.48 99.58
Hepatitis 84.20 84.43 82.13
Pioneer 100 100 98.12
Skating 96.73 98.51 96.98
Table 6: Mean classification accuracy for each framework
Dataset FIBS Baseline FIBS+
# Features Time_f (s) Time_c (s) # Features Time_f (s) Time_c (s)
Blocks 36 1.222 0.163 29 1.122 0.150
Context 1485 137.2 5.460 244 30.83 1.324
Hepatitis 2016 660.8 23.13 804 416.5 9.104
Pioneer 4278 96.60 15.51 2793 59.86 7.556
Skating 861 85.75 9.55 161 40.38 1.952
Table 7: Comparison of number of features, framework execution time and mean classification time for random forest using the FIBS baseline and FIBS+ frameworks

4.1 Datasets

Five real-world datasets from various application domains were used to evaluate the FIBS framework. Statistics concerning these datasets are summarized in Table 5. The datasets are as follows:

  • Blocks [8]. Each event interval corresponds to a visual primitive obtained from videos of a human hand stacking colored blocks and describes which blocks are touched as well as the actions of the hand (e.g., contacts blue, attached hand red, etc.). Each Esequence represents one of eight scenarios, such as assembling a tower.

  • Context [8]. Each event interval was derived from categorical and numeric data describing the context of a mobile device carried by a person in some situation (e.g., walking inside or outside, using an elevator). Each Esequence represents one of five different scenarios, such as being on a street or at a meeting.

  • Hepatitis [13]. Each event interval represents the result of one of 63 medical tests of patients who have either Hepatitis B or Hepatitis C over a period of 10 years. Each Esequence corresponds to a series of tests that a patient undergoes.

  • Pioneer [8]. Event intervals were derived from the Pioneer-1 dataset available in the UCI repository corresponding to the input provided by the robot sensors. Each Esequence in the dataset describes one of three scenarios: move, turn, or grip.

  • Skating [8]. Each event interval describes muscle activity and leg positions of one of six professional In-Line Speed Skaters during controlled tests at seven different speeds on a treadmill. Each Esequence represents a complete movement cycle of a skater.

4.2 Performance Evaluation

Table 6 shows the mean classification accuracy of the random forest classification algorithm using FIBS baseline, FIBS+, and STIFE on the datasets. We observe that FIBS baseline achieves high accuracy across all datasets. FIBS performs even better when the feature selection strategy is applied, as demonstrated by FIBS+. Overall, FIBS+ outperforms the other two methods on all datasets except the Context dataset, where STIFE performs slightly better.

4.3 Effect of Feature Selection

Table 7 shows the results of the experiments for classifying the five datasets using the random forest algorithm. It reports the number of features (# Features) produced by the frameworks, the time taken by the frameworks to produce the two representations (Time_f), and the mean classification time of the random forest (Time_c), in seconds. As shown in the table, FIBS+ reduces the number of features, the framework execution time, and the classification time on all datasets. It also improves the accuracy of the classifier, as shown in Table 6. In summary, these results suggest that FIBS+ is more efficient and effective than FIBS baseline.

5 Conclusion

No established standard exists for a feature-based framework for the classification of interval-based temporal sequences (IBTSs). To date, most IBTS classification tasks have been performed by frameworks based on frequent pattern mining. As a faster alternative, we proposed a simple feature-based framework, called FIBS, for this classification task. FIBS incorporates two possible representations for features extracted from IBTSs, one based on the relative frequency of the occurrences of event labels and the other based on the temporal relations among the event intervals. Because the latter representation can generate too many features, which could reduce performance to an unacceptable level, we proposed a heuristic feature selection strategy based on the idea of the support of the event labels. The experimental results demonstrated that the methods implemented in the FIBS framework can achieve high accuracy and reasonably fast performance on the task of classifying IBTSs. These results provide evidence that the FIBS framework effectively represents IBTS data for classification algorithms.

Footnotes

  1. Supported by grants 514906-17 from ISM Canada and the Natural Sciences and Engineering Research Council of Canada.

References

  1. J. F. Allen, Maintaining knowledge about temporal intervals, Communications of the ACM, 26 (1983), pp. 832–843.
  2. I. Batal, H. Valizadegan, G. F. Cooper, and M. Hauskrecht, A temporal pattern mining approach for classifying electronic health record data, ACM Transactions on Intelligent Systems and Technology (TIST), 4 (2013), p. 63.
  3. L. Bornemann, J. Lecerf, and P. Papapetrou, STIFE: A framework for feature-based classification of sequences of temporal intervals, in International Conference on Discovery Science, Springer, 2016, pp. 85–100.
  4. G. H. John, R. Kohavi, and K. Pfleger, Irrelevant features and the subset selection problem, in Machine Learning Proceedings 1994, Elsevier, 1994, pp. 121–129.
  5. R. Kosara and S. Miksch, Visualizing complex notions of time, Studies in Health Technology and Informatics, (2001), pp. 211–215.
  6. J. Lin, E. Keogh, L. Wei, and S. Lonardi, Experiencing SAX: a novel symbolic representation of time series, Data Mining and Knowledge Discovery, 15 (2007), pp. 107–144.
  7. Y. Liu, L. Nie, L. Liu, and D. S. Rosenblum, From action to activity: sensor-based activity recognition, Neurocomputing, 181 (2016), pp. 108–115.
  8. F. Mörchen and D. Fradkin, Robust mining of time intervals with semi-interval partial order patterns, in Proceedings of the 2010 SIAM International Conference on Data Mining, SIAM, 2010, pp. 315–326.
  9. F. Mörchen and A. Ultsch, Optimizing time series discretization for knowledge discovery, in Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, ACM, 2005, pp. 660–665.
  10. R. Moskovitch and Y. Shahar, Medical temporal-knowledge discovery via temporal abstraction, in AMIA Annual Symposium Proceedings, vol. 2009, American Medical Informatics Association, 2009, p. 452.
  11. R. Moskovitch and Y. Shahar, Classification-driven temporal discretization of multivariate time series, Data Mining and Knowledge Discovery, 29 (2015), pp. 871–913.
  12. P. Papapetrou, G. Kollios, S. Sclaroff, and D. Gunopulos, Mining frequent arrangements of temporal intervals, Knowledge and Information Systems, 21 (2009), p. 133.
  13. D. Patel, W. Hsu, and M. L. Lee, Mining relationships among interval-based events for classification, in Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, SIGMOD ’08, New York, NY, USA, 2008, ACM, pp. 393–404.
  14. E. Sheetrit, N. Nissim, D. Klimov, and Y. Shahar, Temporal probabilistic profiles for sepsis prediction in the ICU, in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, ACM, 2019, pp. 2961–2969.
  15. J. Tang, S. Alelyani, and H. Liu, Feature selection for classification: A review, Data classification: Algorithms and applications, (2014), p. 37.
  16. L. Ye and E. Keogh, Time series shapelets: a new primitive for data mining, in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2009, pp. 947–956.
  17. L. Yu and H. Liu, Redundancy based feature selection for microarray data, in Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2004, pp. 737–742.