On feature selection and evaluation of transportation mode prediction strategies
Transportation mode prediction is a fundamental task for decision making in smart cities and traffic management systems. Traffic policies designed based on trajectory mining can save money and time for authorities and the public. They may reduce fuel consumption and commute time and, moreover, may make daily trips more pleasant for residents and tourists. Since the number of features that may be used to predict a user's transportation mode can be substantial, finding a subset of features that maximizes a performance measure is worth investigating. In this work, we explore wrapper and information theoretical methods to find the best subset of trajectory features. After finding the best classifier and the best feature subset, we compared our results with two related papers that applied deep learning methods, and our framework achieved better performance. Furthermore, we investigated two types of cross-validation, and the results show that random cross-validation yields optimistic performance estimates.
Keywords: Feature engineering; Trajectory classification; Trajectory mining; Transportation modes prediction
Trajectory mining is an active research topic, since positioning devices are now used to track people, vehicles, vessels, natural phenomena, and animals. Its applications include, but are not limited to, transportation mode detection (Zheng et al., 2010; Endo et al., 2016; Dabiri and Heaslip, 2018; Xiao, 2017; Etemad et al., 2018), fishing detection (de Souza et al., 2016), tourism (Feng et al., 2017), and animal behaviour analysis (Fossette et al., 2010). A number of topics in this field also need further investigation, such as high-performance trajectory classification methods (Endo et al., 2016; Dabiri and Heaslip, 2018; Zheng et al., 2010; Xiao, 2017; Liu and Lee, 2017), accurate trajectory segmentation methods (Zheng et al., 2008; Soares Júnior et al., 2015, 2018), trajectory similarity and clustering (Kang et al., 2009), dealing with trajectory uncertainty (Hwang et al., 2018), active learning (Soares Júnior et al., 2017), and semantic trajectories (Parent et al., 2013). These topics are highly correlated, and solving one of them requires, to some extent, exploring more than one. For example, trajectory classification requires dealing with noise and segmentation directly, and with the other topics mentioned above indirectly.
As one of the trajectory mining applications, transportation mode prediction is a fundamental task for decision making in smart cities and traffic management systems. Traffic policies designed based on trajectory mining can save money and time for authorities and the public. They may reduce fuel consumption and commute time and, moreover, may make daily trips more pleasant for residents and tourists. Since a trajectory is a collection of geo-locations captured over time, extracting features that describe the behavior of a trajectory is of prime importance. The number of features that can be generated from trajectory data is large; however, some of these features are more important than others for the transportation mode prediction task. Selecting the best subset of features not only saves processing time but may also improve the performance of the learning algorithm. We therefore focus on the feature selection problem and the trajectory classification task. The contributions of this work are listed below.
Using two feature selection approaches, we investigated the best subset of features for transportation mode prediction.
Finally, we investigate the differences between two cross-validation methods used in the transportation mode prediction literature. The results show that random cross-validation yields optimistic results compared with user-oriented cross-validation.
2 Related works
Feature engineering is an essential part of building a learning algorithm. Some algorithms extract features using representation learning methods; others select a subset of handcrafted features. Both approaches offer advantages such as faster learning, reduced storage, improved learning performance, and more generalizable models (Li et al., 2017). The two approaches differ in two respects. First, feature extraction generates new features, whereas feature selection chooses a subset of existing features. Second, feature selection produces more readable and interpretable models than feature extraction (Li et al., 2017). This work focuses on the feature selection task.
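To make the distinction between selection and extraction concrete, here is a minimal scikit-learn sketch on synthetic data. The feature names are hypothetical, and `SelectKBest` with an ANOVA F-test is used only as a convenient stand-in selector, not as the method used in this work. The selected columns are copies of original, named columns, whereas each PCA component mixes all inputs.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
# Hypothetical toy data: 100 samples, 4 named trajectory features, binary labels.
feature_names = np.array(["speed_mean", "speed_p90", "acc_std", "bearing_median"])
X = rng.normal(size=(100, 4))
y = (X[:, 1] > 0).astype(int)  # the label depends only on the second feature

# Feature selection: keeps a subset of the original columns, unchanged and still named.
selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)
X_sel = selector.transform(X)
kept = feature_names[selector.get_support()]

# Feature extraction: builds new columns as linear combinations of all inputs.
pca = PCA(n_components=2).fit(X)
X_pca = pca.transform(X)

print("selected:", list(kept))
print("loadings of the first PCA component:", np.round(pca.components_[0], 2))
```

The selected subset can be read directly in terms of the original features, which is the interpretability advantage mentioned above.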
Feature selection methods can be categorized into three general groups: filter methods, wrapper methods, and embedded methods (Guyon and Elisseeff, 2003). Filter methods are independent of the learning algorithm; they select features based on the characteristics of the data alone (Li et al., 2017). Wrapper methods, on the other hand, use a search strategy, such as sequential search, best-first search, or branch and bound, to find the subset that gives the highest score for a selected learning algorithm (Li et al., 2017). Embedded methods, such as decision trees, combine aspects of filter and wrapper methods (Li et al., 2017). Feature selection methods can also be grouped by the type of data. Methods that assume the data are i.i.d. (independent and identically distributed) are called conventional feature selection methods (Li et al., 2017), for example He et al. (2005) and Zhao and Liu (2007). They are not designed to handle heterogeneous or auto-correlated data. Some feature selection methods have been introduced to handle heterogeneous data and streaming data, most of them working on graph structures, such as Gu and Han (2011). Conventional feature selection methods are categorized into four groups: similarity-based methods like He et al. (2005), information theoretical methods like Peng et al. (2005), sparse learning methods such as Li et al. (2012), and statistical methods like Liu and Setiono (1995). Similarity-based feature selection approaches are independent of the learning algorithm, and most of them cannot handle feature redundancy or correlation between features. Likewise, statistical methods like chi-square cannot handle feature redundancy, and they require a discretization strategy. Statistical methods are also not effective in high-dimensional spaces. Since our data are not sparse, and sparse learning methods must cope with the complexity of their optimization procedures, they were not candidates for our experiments.
On the other hand, information theoretical methods can handle both feature relevance and redundancy. Furthermore, the selected features can be generalized across learning tasks. Information gain, which is the core of information theoretical methods, assumes that samples are independently and identically distributed. Finally, the wrapper method only sees the score of the learning algorithm and tries to maximize it. Therefore, we perform two experiments: one using a wrapper method and one using an information theoretical method.
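Information gain measures how much knowing a feature reduces the entropy of the class label, $IG(Y;X) = H(Y) - H(Y \mid X)$. The sketch below is a textbook implementation for an already discretized feature; the binned features and labels are hypothetical and only illustrate the two extremes (a perfectly informative feature and an uninformative one).

```python
from collections import Counter
import math

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_bins, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for a discrete (already binned) feature X."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature_bins):
        subset = [y for x, y in zip(feature_bins, labels) if x == value]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(labels) - conditional

# Hypothetical example: a binned speed feature that separates walk from car,
# and a binned feature that carries no information about the mode.
modes = ["walk", "walk", "car", "car"]
speed_bin = ["low", "low", "high", "high"]
noise_bin = ["a", "b", "a", "b"]
print(information_gain(speed_bin, modes))  # -> 1.0
print(information_gain(noise_bin, modes))  # -> 0.0
```

Continuous trajectory features would first need a discretization step (or an estimator such as a mutual information estimator for continuous variables) before this computation applies.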
The most common evaluation metric reported in related work is accuracy. Therefore, we use accuracy to compare our work with theirs. Since the data are imbalanced, we also report the F-score. Although most related work reports accuracy, it is calculated using different methods, including random cross-validation, cross-validation with users divided between the training and test sets, cross-validation with mixed users, and a simple training/test split without cross-validation. The last is a weak method that is used only in Zhu et al. (2018). Random (conventional) cross-validation was applied in Xiao (2017), Liu and Lee (2017), and Dabiri and Heaslip (2018). Zheng et al. (2010) mixed the training and test sets by user, so that 70% of each user's trajectories go to the training set and the rest go to the test set. Only Endo et al. (2016) performed cross-validation by dividing users between the training and test sets. Because trajectory data have spatiotemporal dimensions, and users may belong to the same semantic hierarchical structure (e.g., students, workers, visitors, and teachers), conventional cross-validation can provide optimistic results, as studied in Roberts et al. (2017). Similar to previous studies, we use the GeoLife dataset and the transportation mode detection task; however, we also investigate the effects of different cross-validation techniques.
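The difference between random and user-oriented cross-validation can be seen directly in the folds they produce. In this minimal sketch, scikit-learn's `KFold` (shuffled) stands in for random cross-validation and `GroupKFold`, with the user ID as the group, stands in for user-oriented cross-validation; the toy data and user IDs are hypothetical.

```python
import numpy as np
from sklearn.model_selection import GroupKFold, KFold

# Hypothetical toy data: 12 segments from 4 users (3 segments each).
X = np.arange(24).reshape(12, 2)
users = np.repeat(["u1", "u2", "u3", "u4"], 3)

def folds_with_shared_users(splitter, X, groups):
    """Count folds in which at least one user appears in both the training and the test set."""
    shared = 0
    for train_idx, test_idx in splitter.split(X, groups=groups):
        if set(groups[train_idx]) & set(groups[test_idx]):
            shared += 1
    return shared

random_cv = KFold(n_splits=4, shuffle=True, random_state=0)  # conventional (random) CV
user_cv = GroupKFold(n_splits=4)                              # user-oriented CV
print(folds_with_shared_users(random_cv, X, users), folds_with_shared_users(user_cv, X, users))
```

With random folds, segments of the same user almost always land on both sides of a split, so the test set is not independent of the training set; the grouped splitter rules this out by construction.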
3.1 Notations and definitions
A trajectory point, $p_i \in P$, is a tuple $p_i = (x_i, y_i, t_i)$, where $x_i$ is the longitude, varying from $0^\circ$ to $\pm 180^\circ$, $y_i$ is the latitude, varying from $0^\circ$ to $\pm 90^\circ$, $t_i$ ($t_i < t_{i+1}$) is the capture time of the moving object, and $P$ is the set of all trajectory points. A trajectory point can be assigned features that describe different attributes of the moving object at a specific timestamp and location. Timestamp and location are the two dimensions that make trajectory points spatio-temporal data, with two important properties: (i) auto-correlation and (ii) heterogeneity (Atluri et al., 2017). These properties can invalidate conventional cross-validation (Roberts et al., 2017).
A raw trajectory, or simply a trajectory, is a sequence of trajectory points captured over time, $T = (p_1, p_2, \ldots, p_n)$ with $p_i \in P$. A sub-trajectory is one of the consecutive subsequences obtained by splitting a raw trajectory into two or more parts. For example, if $T = (p_1, \ldots, p_n)$ is a raw trajectory and $p_k$ ($1 \le k < n$) is a single split point, then $T_1 = (p_1, \ldots, p_k)$ and $T_2 = (p_{k+1}, \ldots, p_n)$ are the two sub-trajectories generated by $p_k$. The process of generating sub-trajectories from a raw trajectory is called segmentation. We first segmented the raw trajectories by day and then used the transportation mode annotations to partition the data further. This approach is also used in Dabiri and Heaslip (2018) and Endo et al. (2016). The assumption that transportation modes are available for segmenting the test set is unrealistic, since they are exactly what the model is meant to predict; however, we need a controlled environment similar to that of Dabiri and Heaslip (2018) and Endo et al. (2016) to study feature selection.
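The definitions above map onto a small data structure. This is a minimal sketch with hypothetical names (`TrajectoryPoint`, `split_at`); the convention that the split point closes the first sub-trajectory is one possible choice, made here for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class TrajectoryPoint:
    lon: float  # longitude in degrees, in [-180, 180]
    lat: float  # latitude in degrees, in [-90, 90]
    t: float    # capture time (e.g., seconds); strictly increasing along a trajectory

Trajectory = List[TrajectoryPoint]

def split_at(traj: Trajectory, k: int) -> Tuple[Trajectory, Trajectory]:
    """Split a raw trajectory at the split point p_k (1-based position k).

    Returns T1 = (p_1, ..., p_k) and T2 = (p_{k+1}, ..., p_n), two consecutive sub-trajectories.
    """
    if not 1 <= k < len(traj):
        raise ValueError("k must satisfy 1 <= k < len(traj)")
    return traj[:k], traj[k:]

# Usage with five hypothetical points captured one second apart.
points = [TrajectoryPoint(116.30, 39.98, float(s)) for s in range(5)]
first, second = split_at(points, 3)
```

Segmenting with several split points amounts to applying the same slicing repeatedly, so concatenating the sub-trajectories always recovers the raw trajectory.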
A point feature is a measured value $f_p(p_i)$ assigned to each trajectory point $p_i$ of a sub-trajectory $S$; $f_p(S)$ denotes the sequence of values of point feature $f_p$ for sub-trajectory $S$. For example, speed is a point feature, since the speed of a moving object can be calculated at each trajectory point. Since two trajectory points are needed to calculate speed, we assume that the speed of the first trajectory point equals the speed of the second trajectory point.
A trajectory feature is a measured value $f_T(S)$ assigned to a sub-trajectory $S$. For example, mean speed is a trajectory feature, since the mean speed of a moving object can be calculated for a sub-trajectory.
$F_{f}$ denotes all trajectory features generated from point feature $f$. For example, $F_{speed}$ represents all trajectory features derived from the speed point feature, and $F_{speed}^{mean}$ denotes the mean trajectory feature derived from the speed point feature.
3.2 The framework
This section explains the eight steps of the framework (Figure 1).
The first step groups the trajectory points by user ID, day, and transportation mode to create sub-trajectories (segmentation). Sub-trajectories with fewer than ten trajectory points were discarded to avoid generating low-quality trajectories.
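Step one can be sketched with a pandas `groupby`. The column names (`user_id`, `time`, `mode`, `lat`, `lon`) are assumptions about how the GeoLife points and labels have been loaded, and this is not the TrajLib implementation.

```python
import pandas as pd

def segment(points: pd.DataFrame, min_points: int = 10) -> dict:
    """Group GPS points into sub-trajectories by user, calendar day, and labelled mode.

    Assumes hypothetical columns 'user_id', 'time' (datetime64), 'mode', 'lat', 'lon'.
    Sub-trajectories with fewer than `min_points` points are discarded.
    """
    df = points.sort_values(["user_id", "time"])
    day = df["time"].dt.date.rename("day")
    segments = {}
    for key, seg in df.groupby(["user_id", day, "mode"], sort=False):
        if len(seg) >= min_points:
            segments[key] = seg.reset_index(drop=True)  # key = (user_id, day, mode)
    return segments

# Usage: one user, one day, 12 walking points followed by 5 bus points.
times = pd.date_range("2008-10-23 08:00", periods=17, freq="min")
demo = pd.DataFrame({"user_id": 1, "time": times,
                     "mode": ["walk"] * 12 + ["bus"] * 5, "lat": 39.9, "lon": 116.3})
segs = segment(demo)
```

Grouping by mode within a day merges separate runs of the same mode (e.g., walk, bus, walk) into one group, which matches the grouping described above but may differ from splitting at every mode change.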
In step two, we generated point features including speed, acceleration, bearing, jerk, bearing rate, and the rate of the bearing rate. Speed, acceleration, and bearing were first introduced in Zheng et al. (2008), and jerk was proposed in Dabiri and Heaslip (2018). The very first point feature that we generated is duration, the time difference between two consecutive trajectory points, $\Delta t_{i+1} = t_{i+1} - t_i$. This feature provides essential information, such as some segmentation points and signal-loss points, and is needed to calculate point features such as speed and acceleration. The distance $d_{i+1}$ between $p_i$ and $p_{i+1}$ was calculated using the haversine formula. With duration and distance as two point features, we calculate speed, acceleration, and jerk as $s_{i+1} = \frac{d_{i+1}}{\Delta t_{i+1}}$, $a_{i+1} = \frac{s_{i+1} - s_i}{\Delta t_{i+1}}$, and $j_{i+1} = \frac{a_{i+1} - a_i}{\Delta t_{i+1}}$, respectively. A function to calculate the bearing $B_{i+1}$ between two consecutive points $p_i$ and $p_{i+1}$ was also implemented. Two further features, the bearing rate and the rate of the bearing rate, were introduced in Etemad et al. (2018). We computed the bearing rate as $Br_{i+1} = \frac{B_{i+1} - B_i}{\Delta t_{i+1}}$, where $B_i$ and $B_{i+1}$ are the bearing values at points $p_i$ and $p_{i+1}$, and $\Delta t_{i+1}$ is the time difference. The rate of the bearing rate is computed as $Brr_{i+1} = \frac{Br_{i+1} - Br_i}{\Delta t_{i+1}}$. Since these calculations are performed for every trajectory point, an efficient implementation was needed. Therefore, the code was written in a vectorized manner in Python, which is faster than other implementations available online.
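The equations above vectorize naturally with NumPy. The sketch below is our own minimal illustration, not the TrajLib code: the Earth radius, the standard initial great-circle bearing formula, and the dictionary keys are assumptions. As in the text, the first point takes the value computed for the second point. Zero time gaps would produce infinities and would need separate handling.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (assumed value)

def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between points given in degrees (vectorized)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))

def initial_bearing(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing in degrees, in [0, 360), from point 1 to point 2."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    x = np.sin(lon2 - lon1) * np.cos(lat2)
    y = np.cos(lat1) * np.sin(lat2) - np.sin(lat1) * np.cos(lat2) * np.cos(lon2 - lon1)
    return np.degrees(np.arctan2(x, y)) % 360.0

def _pad_first(values):
    """Give the first point the value computed for the second point."""
    return np.concatenate((values[:1], values))

def point_features(lat, lon, t):
    """Point features of one sub-trajectory with n >= 2 points and strictly increasing t (seconds)."""
    lat, lon, t = (np.asarray(v, dtype=float) for v in (lat, lon, t))
    dt = np.diff(t)                                          # Δt_{i+1} = t_{i+1} - t_i
    dist = haversine(lat[:-1], lon[:-1], lat[1:], lon[1:])   # d_{i+1}
    speed = _pad_first(dist / dt)                            # s_{i+1} = d_{i+1} / Δt_{i+1}
    acc = _pad_first(np.diff(speed) / dt)                    # a_{i+1} = (s_{i+1} - s_i) / Δt_{i+1}
    jerk = _pad_first(np.diff(acc) / dt)                     # j_{i+1} = (a_{i+1} - a_i) / Δt_{i+1}
    brg = _pad_first(initial_bearing(lat[:-1], lon[:-1], lat[1:], lon[1:]))
    brg_rate = _pad_first(np.diff(brg) / dt)                 # plain difference; 359° to 1° wrap not handled
    brg_rate_rate = _pad_first(np.diff(brg_rate) / dt)
    return {"duration": _pad_first(dt), "distance": _pad_first(dist), "speed": speed,
            "acceleration": acc, "jerk": jerk, "bearing": brg,
            "bearing_rate": brg_rate, "bearing_rate_rate": brg_rate_rate}

# Usage: three points moving due north along a meridian, 10 s apart.
f = point_features([0.0, 0.001, 0.002], [0.0, 0.0, 0.0], [0.0, 10.0, 20.0])
```

Because every step operates on whole arrays, there is no Python loop over trajectory points, which is what makes the vectorized approach fast on large datasets.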
After the point features were calculated for each trajectory, the trajectory features were extracted in step three. Trajectory features are of two types: global and local. Global trajectory features, such as the minimum, maximum, mean, median, and standard deviation, summarize the whole trajectory, whereas local trajectory features, such as percentiles (e.g., the 10th, 25th, 50th, 75th, and 90th), describe the behavior of part of a trajectory. The local trajectory features extracted in this work were the percentiles of every point feature. Five global trajectory features were used in the models tested in this work. In summary, we compute 70 trajectory features (i.e., 10 statistical measures, five global and five local, calculated for each of 7 point features) for each transportation mode sample. In step 4, two feature selection approaches were applied: wrapper search and information theoretical feature importance. Based on the best cross-validation accuracy, a subset of the top 20 features was selected in step 5. The code implementing all these steps is available at https://github.com/metemaad/TrajLib.
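Step three can be sketched as computing ten statistics per point feature. The five global statistics below are the ones listed in the text; the use of the population standard deviation, NumPy's default linear percentile interpolation, and the `name_stat` naming scheme are assumptions of this sketch.

```python
import numpy as np

PERCENTILES = (10, 25, 50, 75, 90)  # local trajectory features

def trajectory_features(point_feats):
    """Five global and five local (percentile) statistics for every point feature.

    point_feats: mapping from point-feature name to a 1-D array of per-point values.
    Returns a flat dict such as {'speed_mean': ..., 'speed_p90': ...}.
    """
    out = {}
    for name, values in point_feats.items():
        v = np.asarray(values, dtype=float)
        out[f"{name}_min"] = v.min()
        out[f"{name}_max"] = v.max()
        out[f"{name}_mean"] = v.mean()
        out[f"{name}_median"] = np.median(v)
        out[f"{name}_std"] = v.std()  # population standard deviation (ddof=0)
        for p in PERCENTILES:
            out[f"{name}_p{p}"] = np.percentile(v, p)
    return out

# Usage: seven hypothetical point features, each with values 0, 1, ..., 10.
pf = {f"pf{i}": np.arange(11.0) for i in range(7)}
feats = trajectory_features(pf)
```

With seven point features this yields 10 × 7 = 70 trajectory features per sub-trajectory, matching the count given above.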
Step 6 optionally removes noise from the data; that is, we ran the experiments both with and without this step. Finally, we normalized the features (step 7) using Min-Max normalization, which transforms all features to the same range while preserving the relationships among the values, and thereby improves the quality of the classification process (Han et al., 2011).
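Min-Max normalization maps each feature to $x' = \frac{x - \min(x)}{\max(x) - \min(x)}$. This minimal NumPy sketch fits the minimum and maximum on the training data only and reuses them for the test data; the text does not specify this detail, so it is an assumption (scikit-learn's `MinMaxScaler` behaves the same way when fitted on the training set).

```python
import numpy as np

def min_max_normalize(X_train, X_test):
    """Scale each column to [0, 1] using the minimum and maximum of the training data.

    Test data are transformed with the training statistics, so their values may fall
    outside [0, 1]. Constant columns are mapped to 0 to avoid division by zero.
    """
    lo = X_train.min(axis=0)
    span = X_train.max(axis=0) - lo
    span = np.where(span == 0, 1.0, span)
    return (X_train - lo) / span, (X_test - lo) / span

# Usage: the second column is constant in the training data.
Xtr, Xte = min_max_normalize(np.array([[0.0, 5.0], [10.0, 5.0]]), np.array([[5.0, 5.0]]))
```

Because the transformation is linear and monotonic per feature, the ordering and relative spacing of the values are preserved.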
In this section, we detail the four experiments performed to investigate different aspects of our framework. We used the GeoLife dataset (Zheng et al., 2008), which has 5,504,363 GPS records collected by 69 users and is labeled with eleven transportation modes: taxi (4.41%); car (9.40%); train (10.19%); subway (5.68%); walk (29.35%); airplane (0.16%); boat (0.06%); bike (17.34%); run (0.03%); motorcycle (0.006%); and bus (23.33%). The two primary sources of uncertainty in the GeoLife dataset are device error and human error. Device inaccuracy can be categorized into two major groups, systematic errors and random errors (Jun et al., 2006). Systematic errors occur when the recording device cannot find enough satellites to provide precise data, whereas random errors can result from atmospheric and ionospheric effects. Furthermore, as Zheng et al. (2008) explain in the GeoLife dataset documentation, the data were annotated after each tracking session. Humans are prone to imprecision: some users may, for example, forget to annotate a trajectory when they switch from one transportation mode to another. Changes in the speed pattern (shown as changes in marker size) might be one indication of such human error.
Following Ng (2016), we assume that the Bayes error is the minimum possible error and that human error is close to the Bayes error. Avoidable bias is defined as the difference between the training error and the human error. Achieving near-human performance on a task is the primary objective of this line of research. Recent advances in deep learning have led to performance that exceeds human performance on some tasks, owing to large training samples and careful data cleaning. However, “we cannot do better than bayes error unless we are overfitting” (Ng, 2016). The noise in GPS data and human annotation errors suggest that the avoidable bias is not zero. We used this premise as our basis for including or excluding published results in our related work.
User-oriented cross-validation and the random forest classifier were used to evaluate the transportation modes used in Endo et al. (2016). The wrapper method was implemented to search for the best subset of our 70 features, and information theoretical feature importance methods were used to select the best subset of the same 70 features for the transportation mode prediction task. The third experiment is a comparison between Endo et al. (2016) and our implementation, in which user-oriented cross-validation, the top 20 features, and a random forest were applied. Finally, random cross-validation on the top 20 features with a random forest classifier was applied to classify the transportation modes used in Dabiri and Heaslip (2018).
4.1 Classifier selection
In this experiment, we investigated which of six classifiers performs best. The experimental setting uses conventional cross-validation and the transportation mode prediction task described in Dabiri and Heaslip (2018). XGBoost, SVM, decision tree, random forest, neural network, and AdaBoost are six classifiers that have been applied in the reviewed literature (Zheng et al., 2010; Xiao, 2017; Zhu et al., 2018; Etemad et al., 2018). The dataset was filtered to the labels used in Dabiri and Heaslip (2018) (i.e., walking, train, bus, bike, and driving), and no noise removal method was applied. The classifiers were trained, and accuracy was calculated using random cross-validation, as in Liu and Lee (2017), Xiao (2017), and Dabiri and Heaslip (2018). The cross-validation results, presented in Figure 2, show that the random forest performs better than the other models (). The second-best model was XGBoost (). Wilcoxon signed-rank tests indicated that the random forest results were statistically significantly higher than those of the SVM, neural network, and AdaBoost classifiers, but not statistically significantly higher than those of the XGBoost and decision tree classifiers.
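A paired comparison of two classifiers evaluated on the same folds can be run with `scipy.stats.wilcoxon`. The per-fold accuracies below are invented for illustration and are not results from this work; the one-sided alternative is also an assumption, since the text does not state which alternative was tested.

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical per-fold accuracies of two classifiers evaluated on the same 10 folds.
rf_acc = np.array([0.905, 0.893, 0.911, 0.887, 0.902, 0.898, 0.909, 0.891, 0.900, 0.896])
svm_acc = np.array([0.812, 0.820, 0.805, 0.817, 0.809, 0.822, 0.811, 0.815, 0.808, 0.819])

# Paired, one-sided signed-rank test: are the random forest scores systematically higher?
stat, p = wilcoxon(rf_acc, svm_acc, alternative="greater")
print(f"W = {stat}, p = {p:.4f}")
```

The test only uses the signs and ranks of the per-fold differences, so it does not assume that accuracies are normally distributed; it does require that both classifiers are scored on identical folds.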
4.2 Feature selection using wrapper and information theoretical methods
The second experiment aims to select the best features for the transportation mode prediction task. We selected the wrapper feature selection method because it can be used with any classifier. Using this approach, we first defined an empty set of selected features. Then, we evaluated the trajectory features one by one to find the best feature to append to the selected feature set, using the maximum accuracy score as the selection criterion. We then removed the selected feature from the candidate set and repeated the search, each time evaluating the union of the selected features and the next candidate feature. We used the labels applied in Endo et al. (2016) and the same cross-validation technique. The results are shown in Figure 3 (a). They suggest that the top 20 features achieve the highest accuracy. Therefore, we selected this subset as the best subset for classification with the Random Forest algorithm.
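The greedy search described above is sequential forward selection. The sketch below is our own minimal version built on `cross_val_score`; the function name and the synthetic usage data are hypothetical, and passing a `GroupKFold` splitter with user IDs as `groups` would give the user-oriented variant.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def forward_selection(estimator, X, y, cv, groups=None, max_features=None):
    """Greedy sequential forward selection driven by mean cross-validated accuracy.

    Starts from an empty set; in each round, adds the remaining feature whose union
    with the already selected features gives the highest mean accuracy.
    Returns the selection order and the mean accuracy after each addition.
    """
    remaining = list(range(X.shape[1]))
    selected, history = [], []
    max_features = max_features or X.shape[1]
    while remaining and len(selected) < max_features:
        scores = [
            cross_val_score(estimator, X[:, selected + [j]], y, cv=cv,
                            groups=groups, scoring="accuracy").mean()
            for j in remaining
        ]
        best = int(np.argmax(scores))
        selected.append(remaining.pop(best))
        history.append(scores[best])
    return selected, history

# Usage on synthetic data (illustration only).
X, y = make_classification(n_samples=200, n_features=6, n_informative=2,
                           n_redundant=0, random_state=0)
order, acc = forward_selection(RandomForestClassifier(n_estimators=20, random_state=0),
                               X, y, cv=3, max_features=3)
best_k = int(np.argmax(acc)) + 1  # size of the best-scoring prefix of the selection order
```

The cost grows roughly quadratically with the number of features, since each round re-evaluates every remaining candidate; scikit-learn's `SequentialFeatureSelector` (with `direction="forward"`) offers a packaged alternative.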
Information theoretical feature selection is widely used to select essential features. Random Forest is a classifier with embedded feature selection based on information theoretical metrics. We calculated the feature importance using Random Forest. Then, features were appended one at a time to the selected feature set, in order of importance, and the accuracy of the random forest classifier was calculated after each addition. User-oriented cross-validation was used here, and the target labels are the same as in Endo et al. (2016). Figure 3 shows the cross-validation results for appending features according to the importance ranking suggested by the Random Forest.
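The importance-ranked procedure can be sketched as follows. Setting `criterion="entropy"` makes the forest's impurity importances information-based; whether this matches the original configuration is an assumption. For simplicity the ranking is computed once on all data, whereas a stricter variant would rank features inside each training fold.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def importance_curve(X, y, cv, groups=None, n_estimators=50, random_state=0):
    """Rank features by random-forest importance, then score nested top-k subsets."""
    rf = RandomForestClassifier(n_estimators=n_estimators, criterion="entropy",
                                random_state=random_state)
    ranking = np.argsort(rf.fit(X, y).feature_importances_)[::-1]  # most important first
    curve = [
        cross_val_score(rf, X[:, ranking[:k]], y, cv=cv, groups=groups,
                        scoring="accuracy").mean()
        for k in range(1, X.shape[1] + 1)
    ]
    return ranking, curve

# Usage on synthetic data (illustration only).
X, y = make_classification(n_samples=200, n_features=5, n_informative=2,
                           n_redundant=0, random_state=0)
ranking, curve = importance_curve(X, y, cv=3, n_estimators=20)
```

Unlike the wrapper search, this needs only one model fit to obtain the ranking and one cross-validation per subset size, so it is much cheaper on 70 features.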
In the third experiment, we kept the transportation modes used by Endo et al. (2016) for evaluation. We divided the data into training and test sets so that each user appears in only one of them, with approximately 80% of the data in the training set and 20% in the test set. We used the top 20 features, which form the best feature subset identified in Section 4.2. We then compared our per-segment accuracy against the mean accuracy reported by Endo et al. (2016), 67.9%. A one-sample Wilcoxon signed-rank test indicated that our accuracy (69.50%) is higher than that of Endo et al. (2016) (67.9%), p=0.0431.
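A one-sample Wilcoxon signed-rank test against a published value can be run by shifting the observations by that value. In this sketch, the reference value is the 67.9% reported by Endo et al. (2016), but the vector of accuracies is hypothetical; the text does not state what the individual samples were.

```python
import numpy as np
from scipy.stats import wilcoxon

reported_mean = 0.679  # mean accuracy reported by Endo et al. (2016)
# Hypothetical accuracies of our model over repeated runs (illustrative values only).
our_acc = np.array([0.701, 0.688, 0.712, 0.690, 0.695, 0.684, 0.706])

# One-sample signed-rank test: is the median of (our_acc - reported_mean) greater than 0?
stat, p = wilcoxon(our_acc - reported_mean, alternative="greater")
```

Since the published result enters only as a constant, the test accounts for the variability of our own runs but not for any uncertainty in the reported value.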
The label set in Dabiri and Heaslip (2018) is walking, train, bus, bike, taxi, subway, and car, where taxi and car are merged into a class called driving, and subway and train are merged into the train class. Accordingly, we filtered the GeoLife data to obtain the same subsets as reported by Dabiri and Heaslip (2018). We then applied five-fold random cross-validation, so that in each fold a random 80% of the data was used for training and the rest for testing. The best feature subset was the same as in the previous experiment. The random forest classifier with 50 estimators, using the scikit-learn implementation (Pedregosa et al., 2011), gives a mean accuracy of 88.5% over the five folds. This is higher than the result of Dabiri and Heaslip (2018) (84.8%); a one-sample Wilcoxon signed-rank test gave p=0.0796, so the difference is not significant at the 5% level.
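The label filtering and merging described above can be sketched with a pandas mapping. The column name `mode` and the GeoLife label strings are assumptions about how the data were loaded.

```python
import pandas as pd

# Merging described for Dabiri and Heaslip (2018): taxi + car -> driving, subway + train -> train.
LABEL_MAP = {"walk": "walk", "bike": "bike", "bus": "bus",
             "car": "driving", "taxi": "driving",
             "train": "train", "subway": "train"}

def to_dabiri_labels(segments: pd.DataFrame, label_col: str = "mode") -> pd.DataFrame:
    """Keep only the seven source modes and merge them into five target classes."""
    kept = segments[segments[label_col].isin(list(LABEL_MAP))].copy()
    kept[label_col] = kept[label_col].map(LABEL_MAP)
    return kept

# Usage: 'airplane' is dropped; taxi and car become 'driving'; subway becomes 'train'.
demo = pd.DataFrame({"mode": ["walk", "taxi", "subway", "airplane", "car"]})
out = to_dabiri_labels(demo)
```

Filtering before mapping ensures that modes outside the label set are dropped rather than turned into missing values.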
We did not apply the noise removal step in the experiments above, because the labels of the test set are assumed to be unavailable, and using this step would only increase our accuracy unrealistically.
4.4 Effects of types of cross-validation
To visualize the effect of the type of cross-validation on the transportation mode prediction task, we set up a controlled experiment. We used the same classifiers and the same features to calculate the cross-validation accuracy; only the type of cross-validation differs: one is random, and the other is user-oriented. Figure 4 shows a considerable difference between the results of user-oriented and random cross-validation, indicating that random cross-validation provides optimistic accuracy and F-score estimates. Since the correlation among the user-oriented cross-validation results is lower than among the random cross-validation results, designing a cross-validation method specifically for evaluating transportation mode prediction deserves attention.
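The controlled comparison can be sketched by holding the classifier and features fixed and swapping only the splitter. The synthetic data are a deliberately exaggerated, hypothetical case: each user has a distinctive feature signature and always uses the same mode, so random folds let the model recognize users it has already seen, while user-oriented folds do not.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, KFold, cross_val_score

def compare_cv(X, y, users, n_splits=5, seed=0):
    """Same classifier and features; only the cross-validation scheme changes."""
    clf = RandomForestClassifier(n_estimators=50, random_state=seed)
    random_scores = cross_val_score(clf, X, y, cv=KFold(n_splits, shuffle=True, random_state=seed))
    user_scores = cross_val_score(clf, X, y, cv=GroupKFold(n_splits), groups=users)
    return random_scores.mean(), user_scores.mean()

# Hypothetical data: 40 users with 8 segments each and 6 features per segment.
rng = np.random.default_rng(0)
n_users, segs_per_user = 40, 8
users = np.repeat(np.arange(n_users), segs_per_user)
signature = rng.normal(scale=5.0, size=(n_users, 6))   # user-specific behaviour
X = signature[users] + rng.normal(size=(users.size, 6))
y = (np.arange(n_users) % 2)[users]                     # each user always uses one mode
random_acc, user_acc = compare_cv(X, y, users)
```

In this setting the random scheme rewards memorizing users rather than learning mode-specific behaviour, which is one mechanism by which random cross-validation can overestimate performance on unseen users.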
In this work, we reviewed recent transportation mode prediction methods and feature selection methods. We extended the framework proposed in Etemad et al. (2018) for transportation mode prediction and conducted four experiments to cover different aspects of the task.
First, we evaluated the performance of six classifiers recently used for transportation mode prediction. The results show that the random forest classifier performs best among the evaluated classifiers; SVM was the worst, and the accuracy of XGBoost was competitive with that of the random forest. In the second experiment, we evaluated feature subsets using two approaches, the wrapper method and the information theoretical method. The wrapper method shows that the highest accuracy is achieved with the top 20 features. Both approaches suggest that the 90th percentile of speed (a trajectory feature as defined in Section 3) is the most essential of all 70 features. This feature is robust to noise, since a small number of extreme outliers has little effect on the 90th percentile. In the third experiment, the best model was compared with the results reported in Endo et al. (2016) and Dabiri and Heaslip (2018), and our model achieved higher accuracy. Compared with Endo et al. (2016), our features are more readable and interpretable, and our model has a lower computational cost. Finally, in the fourth experiment, we investigated the effects of user-oriented and random cross-validation. The results showed that random cross-validation provides optimistic results in terms of the analyzed performance measures.
We intend to extend this work in several directions. The spatiotemporal characteristics of trajectory data are not taken into account in most works in the literature, so we intend to investigate in depth the effects of cross-validation and other strategies, such as holdout, on trajectory data. Finally, space and time dependencies can also be explored to tailor features for transportation mode prediction.
- Atluri et al. (2017) Atluri, G., Karpatne, A., Kumar, V., 2017. Spatio-temporal data mining: A survey of problems and methods. arXiv preprint arXiv:1711.04710.
- Dabiri and Heaslip (2018) Dabiri, S., Heaslip, K., 2018. Inferring transportation modes from gps trajectories using a convolutional neural network. Transportation Research Part C: Emerging Technologies 86, 360–371.
- Endo et al. (2016) Endo, Y., Toda, H., Nishida, K., Kawanobe, A., 2016. Deep feature extraction from trajectories for transportation mode estimation, in: Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer. pp. 54–66.
- Etemad et al. (2018) Etemad, M., Soares Júnior, A., Matwin, S., 2018. Predicting transportation modes of gps trajectories using feature engineering and noise removal, in: Advances in AI: 31st Canadian Conf. on AI, Canadian AI 2018, Toronto, ON, CA, Proc. 31, Springer. pp. 259–264.
- Feng et al. (2017) Feng, S., Cong, G., An, B., Chee, Y.M., 2017. Poi2vec: Geographical latent representation for predicting future visitors, in: AAAI.
- Fossette et al. (2010) Fossette, S., Hobson, V.J., Girard, C., Calmettes, B., Gaspar, P., Georges, J.Y., Hays, G.C., 2010. Spatio-temporal foraging patterns of a giant zooplanktivore, the leatherback turtle. Journal of Marine systems 81, 225–234.
- Gu and Han (2011) Gu, Q., Han, J., 2011. Towards feature selection in network, in: Proceedings of the 20th ACM International Conference on Information and Knowledge Management (CIKM), ACM. pp. 1175–1184.
- Guyon and Elisseeff (2003) Guyon, I., Elisseeff, A., 2003. An introduction to variable and feature selection. Journal of Machine Learning Research 3, 1157–1182.
- Han et al. (2011) Han, J., Pei, J., Kamber, M., 2011. Data mining: concepts and techniques. Elsevier.
- He et al. (2005) He, X., Cai, D., Niyogi, P., 2005. Laplacian score for feature selection, in: Advances in Neural Information Processing Systems.
- Hwang et al. (2018) Hwang, S., VanDeMark, C., Dhatt, N., Yalla, S.V., Crews, R.T., 2018. Segmenting human trajectory data by movement states while addressing signal loss and signal noise. International Journal of Geographical Information Science, 1–22.
- Jun et al. (2006) Jun, J., Guensler, R., Ogle, J., 2006. Smoothing methods to minimize impact of global positioning system random error on travel distance, speed, and acceleration profile estimates. Transportation Research Record: Journal of the TRB 1, 141–150.
- Kang et al. (2009) Kang, H.Y., Kim, J.S., Li, K.J., 2009. Similarity measures for trajectory of moving objects in cellular space, in: SIGAPP09, pp. 1325–1330.
- Li et al. (2017) Li, J., Cheng, K., Wang, S., Morstatter, F., Trevino, R.P., Tang, J., Liu, H., 2017. Feature selection: A data perspective. ACM Computing Surveys (CSUR) 50, 94.
- Li et al. (2012) Li, Z., Yang, Y., Liu, J., Zhou, X., Lu, H., et al., 2012. Unsupervised feature selection using nonnegative spectral analysis, in: AAAI.
- Liu and Lee (2017) Liu, H., Lee, I., 2017. End-to-end trajectory transportation mode classification using bi-lstm recurrent neural network, in: Intelligent Systems and Knowledge Engineering (ISKE), 2017 12th International Conference on, IEEE. pp. 1–5.
- Liu and Setiono (1995) Liu, H., Setiono, R., 1995. Chi2: Feature selection and discretization of numeric attributes, in: Tools with artificial intelligence, 1995. proceedings., seventh international conference on, IEEE. pp. 388–391.
- Ng (2016) Ng, A., 2016. Nuts and bolts of building ai applications using deep learning, NIPS.
- Parent et al. (2013) Parent, C., Spaccapietra, S., Renso, C., Andrienko, G., Andrienko, N., Bogorny, V., Damiani, M.L., Gkoulalas-Divanis, A., Macedo, J., Pelekis, N., Theodoridis, Y., Yan, Z., 2013. Semantic trajectories modeling and analysis. ACM Comput. Surv. 45, 42:1–42:32.
- Pedregosa et al. (2011) Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E., 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12, 2825–2830.
- Peng et al. (2005) Peng, H., Long, F., Ding, C., 2005. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on pattern analysis and machine intelligence 27, 1226–1238.
- Roberts et al. (2017) Roberts, D.R., Bahn, V., Ciuti, S., Boyce, M.S., Elith, J., Guillera-Arroita, G., Hauenstein, S., Lahoz-Monfort, J.J., Schröder, B., Thuiller, W., et al., 2017. Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography 40, 913–929.
- Soares Júnior et al. (2015) Soares Júnior, A., Moreno, B.N., Times, V.C., Matwin, S., Cabral, L.d.A.F., 2015. Grasp-uts: an algorithm for unsupervised trajectory segmentation. International Journal of Geographical Information Science 29, 46–68.
- Soares Júnior et al. (2017) Soares Júnior, A., Renso, C., Matwin, S., 2017. Analytic: An active learning system for trajectory classification. IEEE Computer Graphics and Applications 37, 28–39. doi:10.1109/MCG.2017.3621221.
- Soares Júnior et al. (2018) Soares Júnior, A., Times, V.C., Renso, C., Matwin, S., Cabral, L.A.F., 2018. A semi-supervised approach for the semantic segmentation of trajectories, in: 2018 19th IEEE International Conference on Mobile Data Management (MDM), pp. 145–154.
- de Souza et al. (2016) de Souza, E.N., Boerder, K., Matwin, S., Worm, B., 2016. Improving fishing pattern detection from satellite ais using data mining and machine learning. PloS one 11, e0158248.
- Xiao (2017) Xiao, 2017. Identifying different transportation modes from trajectory data using tree-based ensemble classifiers. ISPRS International Journal of Geo-Information 6, 57.
- Zhao and Liu (2007) Zhao, Z., Liu, H., 2007. Spectral feature selection for supervised and unsupervised learning, in: Proceedings of the 24th international conference on Machine learning, ACM. pp. 1151–1157.
- Zheng et al. (2010) Zheng, Y., Chen, Y., Li, Q., Xie, X., Ma, W.Y., 2010. Understanding transportation modes based on gps data for web applications. ACM Transactions on the Web (TWEB) 4, 1.
- Zheng et al. (2008) Zheng, Y., Li, Q., Chen, Y., Xie, X., Ma, W.Y., 2008. Understanding mobility based on gps data, in: Proceedings of the 10th International Conference on Ubiquitous Computing (UbiComp), ACM. pp. 312–321.
- Zhu et al. (2018) Zhu, Q., Zhu, M., Li, M., Fu, M., Huang, Z., Gan, Q., Zhou, Z., 2018. Transportation modes behaviour analysis based on raw gps dataset. International Journal of Embedded Systems 10, 126–136.