Discovery of Driving Patterns by Trajectory Segmentation
Abstract
Telematics data is becoming increasingly available due to the ubiquity of devices that collect data during drives for different purposes, such as usage-based insurance (UBI), fleet management, and navigation of connected vehicles. Consequently, a variety of data-analytic applications have become feasible that extract valuable insights from the data. In this paper, we address the especially challenging problem of discovering behavior-based driving patterns from only externally observable phenomena (e.g., a vehicle’s speed). We present a trajectory segmentation approach capable of discovering driving patterns as separate segments, based on the behavior of drivers. This segmentation approach includes a novel transformation of trajectories along with a dynamic programming approach for segmentation. We apply the segmentation approach to a real-world, rich dataset of personal car trajectories provided by a major insurance company based in Columbus, Ohio. Analysis and preliminary results show the applicability of the approach for finding significant driving patterns.
3rd ACM SIGSPATIAL PhD Workshop ’16, October 31-November 03, 2016, Burlingame, CA, USA
Categories: F.2.2 [Theory of computation]: Mathematical optimization; H.2.8 [Information systems]: Spatial-temporal systems
1 Introduction
The amount of telematics data has drastically increased thanks to the ubiquity of various types of devices and mobile apps that collect data during drives. Some instances of such transportation data are the New York taxi cab
Example 1
Consider the trajectory in Figure 1. Red dots show the location of the car for every second of the trip. The trajectory begins at the bottom center and continues to the left after a clockwise turn. Different parts of the trip exhibit different behavior-based driving patterns, marked out by ovals. For instance, the green oval shows slow movement, where the captured locations are close to each other. Another pattern occurs when the car enters the ramp and merges onto a highway (blue oval).
Example 1 is intended to illustrate that driving patterns are portions of a trajectory where there is homogeneity of driving behavior. The problem of finding significant driving patterns, as described by Example 1, is a challenging one for the following reasons. First, unlike studies such as [7, 11], which collected data in a fully monitored environment (for example, with cameras placed inside the car monitoring the driver’s every move and expression) and with a small set of drivers and routes, our dataset is the result of observing only externally visible phenomena (e.g., a vehicle’s speed) with no additional intrusive monitoring. In addition, because of the size of the dataset of trajectories and the potentially wide range of identifiable and useful driving patterns, a supervised approach is not viable. Thus, finding a significant set of driving patterns is a challenging problem, worthy of our study.
Discovery of behavior-based driving patterns is part of a more generic framework for analyzing the behavior of drivers to reveal how risky or safe their driving habits are. The results of such studies can be used for usage-based insurance, driver coaching, risk management, and other related purposes. The main contribution of this paper is a novel trajectory segmentation approach to find driving patterns, based on the behavior of drivers. The rest of this paper is structured as follows: Section 2 provides the formal problem statement and required definitions. The details of the trajectory segmentation approach are given in Section 3. Next, the evaluation protocol and preliminary results are presented in Section 4. We provide a summary of related work in Section 5. Section 6 concludes our study and describes potential future work.
2 Problem Statement
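Assume we are given a transportation database of the form $\mathcal{D} = \{V, \mathcal{T}\}$, where $V$ and $\mathcal{T}$ are the sets of vehicles and trajectories, respectively. Each trajectory $T \in \mathcal{T}$ is a sequence of data points $\langle p_1, p_2, \dots, p_N \rangle$. Each data point $p_i$ is a tuple of the form $(t, \mathit{lat}, \mathit{lng}, \mathit{speed}, \mathit{acceleration}, \mathit{heading})$, which captures a vehicle’s status at time $t$: its latitude and longitude are $(\mathit{lat}, \mathit{lng})$, with speed $\mathit{speed}$ (km/h), acceleration $\mathit{acceleration}$ (m/s$^2$), and heading $\mathit{heading}$ (degrees). All time is assumed to be in seconds. The heading is the direction of the moving vehicle, described by a degree value between 0 and 359, where 0 means north.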
A segmentation of a trajectory $T$ into $k$ segments, denoted as $C_T^k = \{c_1, c_2, \dots, c_k\}$, is a set of cutting indexes that mark the beginning points of the segments within the trajectory. Thus, we can define the set of cutting data points for the segmented trajectory as $\{p_{c_1}, p_{c_2}, \dots, p_{c_k}\}$. Note that $c_1 = 1$. All data points between indexes $c_i$ and $c_{i+1}$, including point $p_{c_i}$ and excluding point $p_{c_{i+1}}$, belong to the $i$-th segment. We denote the $i$-th segment of $T$ as $s_i$ and its size as $|s_i|$. Note that segments are non-overlapping. Each segment represents a driving pattern, and each cutting point $p_{c_i}$ represents a transition between patterns. Figure 1 demonstrates segments (by ovals) and cutting points (by arrows) for a given trajectory. We define the optimization objectives for the segmentation task as i) maximizing homogeneity within segments, ii) minimizing homogeneity between neighboring segments, and iii) minimizing the number of created segments.
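As a small illustration of how a set of cutting indexes partitions a trajectory, the following Python sketch splits a list of data points into non-overlapping segments. The function name `split_by_cuts` is hypothetical, and the code uses 0-based indexes (so the first cutting index is 0), which is an assumption of this sketch rather than the paper's convention.

```python
from typing import List, Sequence, TypeVar

P = TypeVar("P")


def split_by_cuts(points: Sequence[P], cuts: Sequence[int]) -> List[List[P]]:
    """Split `points` into segments that start at each index in `cuts`.

    Assumption: `cuts` is sorted, 0-based, and starts with 0. Each segment
    includes its own cutting point and excludes the next one.
    """
    if not cuts or cuts[0] != 0:
        raise ValueError("the first cutting index must be 0")
    bounds = list(cuts) + [len(points)]
    return [list(points[a:b]) for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    # Six data points cut at indexes 0, 2 and 5 give three segments.
    print(split_by_cuts(list(range(6)), [0, 2, 5]))
```

Every data point lands in exactly one segment, which mirrors the non-overlapping property stated above.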
3 Segmentation Approach
We propose a novel approach to intelligently partition a trajectory, such that each resulting homogeneous segment corresponds to a specific driving pattern. Our trajectory segmentation approach includes the following steps:

Preprocessing of the trajectory dataset.

Creating a memoryless Markov Model based on the behavior of the population of drivers in the trajectory dataset.

Using the Markov Model to transform a trajectory to a signal in Probabilistic Movement Dissimilarity (PMD) space.

Segmenting the signal using a Dynamic Programming Segmentation approach and finding the best number of segments by Minimum Description Length (MDL).
We next describe each step in more detail.
3.1 Preprocessing the Dataset
Following the data model described in Section 2, the dataset is a collection of trajectories, where each trajectory is a sequence of data points. The main steps for preprocessing the dataset are as follows:

Remove data points with missing or noisy (out of range) GPS records.

Normalize the values of acceleration and heading to be divisible by 0.25 and 5, respectively. This step helps to simplify the Markov Model by reducing the number of possible states.

Create training and test sets: We use the training set for creating the Markov Model and the test set for experiments.
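The normalization step can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the quantized fields are taken to be acceleration (step 0.25) and heading (step 5), values are rounded to the nearest multiple (the text does not say whether rounding or truncation is used), and a heading that rounds up to 360 is wrapped back to 0 so that it stays within 0 to 359. The names `quantize` and `normalize_point` are hypothetical.

```python
from typing import Tuple


def quantize(value: float, step: float) -> float:
    """Round `value` to the nearest multiple of `step`.

    Note: Python's round() resolves exact ties to the even multiple.
    """
    return round(value / step) * step


def normalize_point(speed: float, accel: float, heading: float) -> Tuple[float, float, float]:
    """Quantize acceleration to 0.25 and heading to 5 degrees (assumed steps)."""
    q_accel = quantize(accel, 0.25)
    q_heading = quantize(heading, 5) % 360  # wrap 360 back to 0 (assumption)
    return (speed, q_accel, q_heading)


if __name__ == "__main__":
    print(normalize_point(60.0, 0.3, 92.0))
    print(normalize_point(60.0, -0.3, 358.0))
```

Coarser steps give fewer distinct states and therefore more observations per transition, at the cost of resolution.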
3.2 Creating the Markov Model
We create a memoryless Markov model $M = (S, E, P)$, where $S$ is the set of states, $E$ is the set of transitions between states (along with the frequency of each transition), and $P$ is the set of probabilities of transition between the states. We use the following guidelines to create $M$:

State: We define a state as the tuple $\langle \mathit{speed}, \mathit{acceleration}, \mathit{heading} \rangle$ of a data point.

Transition: Given a trajectory $T$, for each pair of consecutive data points $p_i$ and $p_{i+1}$ of $T$, where $1 \le i < |T|$, we create two states $s_i$ and $s_{i+1}$ for $p_i$ and $p_{i+1}$, respectively. We denote a transition from state $s_i$ to $s_{i+1}$ as $s_i \to s_{i+1}$. If the transition set $E$ does not contain $s_i \to s_{i+1}$, we insert it into $E$. Otherwise, we increase the frequency of the transition by 1.

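Probability of Transition: For a specific state $s_i$, let us assume there is a set of observed transitions $s_i \to s_j$, where $1 \le j \le m$, and where $n_{ij}$ is the number of observed transitions from $s_i$ to $s_j$ in the dataset. We update the probability set $P$ by inserting the probability of each transition $s_i \to s_j$, $1 \le j \le m$, using Equation 1:
$$P(s_i \to s_j) = \frac{n_{ij}}{\sum_{l=1}^{m} n_{il}} \qquad (1)$$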
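The counting and normalization steps above can be sketched in Python as follows. The function name `build_markov_model`, the nested-dictionary layout, and the representation of a state as a hashable tuple are assumptions made for illustration, not the authors' implementation. Each transition probability is the observed count of that transition divided by the total count of transitions leaving the same source state.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Sequence, Tuple

State = Hashable


def build_markov_model(
    trajectories: Iterable[Sequence[State]],
) -> Tuple[Dict[State, Counter], Dict[State, Dict[State, float]]]:
    """Count first-order transitions and turn the counts into probabilities.

    Each trajectory is assumed to be already mapped to a sequence of states.
    """
    counts: Dict[State, Counter] = defaultdict(Counter)
    for states in trajectories:
        # Consecutive pairs (s_i, s_{i+1}) are the observed transitions.
        for src, dst in zip(states, states[1:]):
            counts[src][dst] += 1

    probs: Dict[State, Dict[State, float]] = {}
    for src, outgoing in counts.items():
        total = sum(outgoing.values())
        probs[src] = {dst: c / total for dst, c in outgoing.items()}
    return dict(counts), probs


if __name__ == "__main__":
    a = (50, 0.0, 90)   # hypothetical (speed, acceleration, heading) state
    b = (50, 0.25, 90)
    counts, probs = build_markov_model([[a, a, b], [a, b]])
    print(counts[a], probs[a])
```

Because the model is memoryless, only pairs of consecutive states are needed; no longer history is stored.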
3.3 Transforming Trajectories
The aim of our segmentation approach is to provide a segmentation of trajectories based on the behavior of drivers. Hence, an important step is to transform an input trajectory to a signal in Probabilistic Movement Dissimilarity (PMD) space. Suppose we have a trajectory $T$ and a Markov Model $M$; we propose Algorithm 1 to map $T$ to a signal in PMD space. Given consecutive data points $p_i$ and $p_{i+1}$, Algorithm 1 first maps them to states $s_i$ and $s_{i+1}$, respectively. Then, it calculates how unlikely the transition $s_i \to s_{i+1}$ is, based on $M$.
In Algorithm 1, one helper function returns the state corresponding to an input data point $p$, and a second returns the transition probability from a state $s_i$ to a state $s_j$. A third returns, for an input state $s$, the list of all states $s'$ such that the transition $s \to s'$ is in $E$. Also, note that if $s_i$ and $s_{i+1}$ represent the same state, then the transition is quite likely. Based on this algorithm, we map a test trajectory to a signal in PMD space. The signal of a trajectory demonstrates the unlikelihood of the driver’s behavior during the trip. An unlikelihood score is calculated based on the transition probabilities in the Markov Model $M$. Lines 7 to 14 in Algorithm 1 measure how far the observed transition is from our expectation regarding the Markov Model $M$.
Figure 2 depicts a part of a sample trajectory and its corresponding signal in PMD space. The numbers in rectangular callouts in Figure 2.A show time stamps, which can be matched with the Time axis in Figure 2.B. The more unlikely the driver’s behavior is, the larger the PMD value. For instance, a large PMD value is observable at time stamp 991 in Figure 2.B, where the actual trip in Figure 2.A shows an unexpected reduction in speed and also a lane change.
The main takeaway from this step is that we use a signal in PMD space as a representation of the behavior of a driver for a given trip, in comparison with the rest of the population of drivers and trajectories.
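To make the idea of a per-transition unlikelihood signal concrete, here is a minimal Python sketch. It does not reproduce Algorithm 1. As a deliberately simple stand-in, it scores each observed transition as one minus its transition probability, so frequent transitions score near 0 and rare or unseen ones score near 1. The function name `pmd_signal` and the nested-dictionary model layout `probs[src][dst]` are assumptions of this sketch.

```python
from typing import Dict, Hashable, List, Sequence

State = Hashable


def pmd_signal(
    states: Sequence[State],
    probs: Dict[State, Dict[State, float]],
) -> List[float]:
    """Map a state sequence to one unlikelihood score per transition.

    Stand-in score (assumption): 1 - P(src -> dst). A transition that was
    never observed in the training data gets the maximum score of 1.0.
    """
    signal = []
    for src, dst in zip(states, states[1:]):
        p_observed = probs.get(src, {}).get(dst, 0.0)
        signal.append(1.0 - p_observed)
    return signal


if __name__ == "__main__":
    a = (50, 0.0, 90)   # hypothetical (speed, acceleration, heading) states
    b = (50, 0.25, 90)
    model = {a: {a: 0.75, b: 0.25}}
    print(pmd_signal([a, a, b, a], model))
```

A trajectory with N data points yields a signal of length N - 1, one value per pair of consecutive points.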
3.4 Dynamic Programming Trajectory Segmentation
Once the signal for a trajectory has been created, the trajectory segmentation problem reduces to a Signal Segmentation problem. For segmenting a signal, we use an existing approach which has been successfully applied for segmenting electrical signals [6]. This approach is a dynamic programming algorithm that uses the Maximum Likelihood principle for segmenting one-dimensional signals. Given an input signal $X = \langle x_1, x_2, \dots, x_N \rangle$, the Maximum Likelihood for $X$ can be defined by Equation 2.
$$L(\hat{\theta}) = \prod_{i=1}^{N} f(x_i \mid \hat{\theta}) \qquad (2)$$
In this formula, $\hat{\theta}$ is the set of parameters for a probability density function (PDF) $f$, which can be estimated based on the data points of signal $X$. As in [6], we leverage the Gaussian distribution to find the parameters of the PDF $f$; thus, $\hat{\theta} = (\hat{\mu}, \hat{\sigma})$, where $\hat{\mu}$ and $\hat{\sigma}$ are the sample mean and standard deviation, respectively.
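For a Gaussian PDF, the maximized likelihood has a closed form, which the following Python sketch computes in log form. Substituting the estimated mean and variance into the sum of log densities collapses it to -(n/2)(ln(2πσ²) + 1). The function name `gaussian_max_loglik` and the variance floor (which guards against a constant signal) are assumptions of this sketch, and the variance is the maximum-likelihood estimate that divides by n.

```python
import math
from typing import Sequence


def gaussian_max_loglik(x: Sequence[float], var_floor: float = 1e-6) -> float:
    """Log-likelihood of `x` under a Gaussian fitted to `x` itself.

    Uses the maximum-likelihood estimates (mean, variance with divisor n).
    `var_floor` is an assumed safeguard so that log(0) never occurs.
    """
    n = len(x)
    mean = sum(x) / n
    var = max(sum((v - mean) ** 2 for v in x) / n, var_floor)
    # With fitted parameters, the squared-error term sums to exactly n / 2.
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


if __name__ == "__main__":
    print(gaussian_max_loglik([1.0, 2.0, 4.0]))
```

Working with the logarithm avoids numerical underflow, since a product of many densities quickly becomes too small to represent.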
Note that the goal of segmenting a trajectory and its signal (see Section 2) is to find a set of cutting indexes $\{c_1, c_2, \dots, c_k\}$, where $k$ is the best number of segments (i.e., with the greatest maximum likelihood). The recurrence relation for segmenting the signal is defined below:
$$SS(i, k) = \max_{j} \left[ L(i, j-1) \cdot SS(j, k-1) \right], \qquad SS(i, 1) = L(i, N) \qquad (3)$$
In Equation 3, $SS(i, k)$ gives the best Segmentation Score for the subsequence of signal $X$ which starts at index $i$, with the goal being to find $k$ segments. Also, $L(i, j)$ gives the maximum likelihood score for the subsequence $\langle x_i, \dots, x_j \rangle$ of $X$, and $j$ ranges over the candidate starting indexes of the second segment. Note that we assume the minimum length of a segment to be 2. More details of this algorithm may be found in [6].
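The dynamic program can be sketched in Python as follows. This is an illustrative sketch, not the implementation of [6]: it scores each segment with a Gaussian log-likelihood (so segment scores add instead of multiply), uses prefix sums to evaluate a segment in constant time, and returns 0-based cutting indexes. The names `segment_signal`, `MIN_LEN`, and `VAR_FLOOR` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

MIN_LEN = 2        # minimum segment length, as assumed in the text
VAR_FLOOR = 1e-6   # assumed guard against zero variance


def segment_signal(x: Sequence[float], k: int) -> Tuple[List[int], float]:
    """Split `x` into `k` segments maximizing the total Gaussian log-likelihood.

    Returns (0-based cutting indexes, best total log-likelihood).
    """
    n = len(x)
    if k < 1 or n < k * MIN_LEN:
        raise ValueError("signal too short for the requested number of segments")

    # Prefix sums of values and squares give O(1) mean/variance per segment.
    ps, ps2 = [0.0], [0.0]
    for v in x:
        ps.append(ps[-1] + v)
        ps2.append(ps2[-1] + v * v)

    def loglik(i: int, j: int) -> float:
        """Maximized Gaussian log-likelihood of x[i:j] (j exclusive)."""
        m = j - i
        mean = (ps[j] - ps[i]) / m
        var = max((ps2[j] - ps2[i]) / m - mean * mean, VAR_FLOOR)
        return -0.5 * m * (math.log(2.0 * math.pi * var) + 1.0)

    neg_inf = float("-inf")
    # score[m][i]: best log-likelihood of splitting x[i:] into m segments.
    score = [[neg_inf] * (n + 1) for _ in range(k + 1)]
    nxt = [[-1] * (n + 1) for _ in range(k + 1)]
    for i in range(n - MIN_LEN + 1):
        score[1][i] = loglik(i, n)
    for m in range(2, k + 1):
        for i in range(n - m * MIN_LEN + 1):
            # j is where the second segment starts; leave room for m - 1 more.
            for j in range(i + MIN_LEN, n - (m - 1) * MIN_LEN + 1):
                cand = loglik(i, j) + score[m - 1][j]
                if cand > score[m][i]:
                    score[m][i], nxt[m][i] = cand, j

    # Follow the stored choices to recover the cutting indexes.
    cuts, i = [0], 0
    for m in range(k, 1, -1):
        i = nxt[m][i]
        cuts.append(i)
    return cuts, score[k][0]


if __name__ == "__main__":
    signal = [0.0, 0.1, -0.1, 0.0, 5.0, 5.1, 4.9, 5.0]
    print(segment_signal(signal, 2)[0])
```

The triple loop costs on the order of k·N² segment evaluations, which is practical for trips of a few hundred data points.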
The last question in this subsection is how to find the best number of segments within a signal. We use Minimum Description Length (MDL) [10] for this purpose, which has been applied in [6] as well. MDL tries to minimize Equation 4 for $1 \le n \le K_{max}$, where $n$ is the number of segments and $K_{max}$ is the maximum possible number of segments (chosen by the user):
$$MDL(n) = -\ln L(\hat{\theta}_n) + \frac{r}{2} \ln N \qquad (4)$$
In Equation 4, $\hat{\theta}_n$ is the parameter set of the corresponding PDF, $r$ is the number of estimated parameters (which depends on $n$, the number of segments), and $N$ is the length of the signal. Figure 3 shows a part of a segmented signal which is related to the sample trajectory in Figure 2.A. The blue lines in Figure 3 show the starting points of segments (i.e., the cutting points). The best number of segments found by our MDL algorithm is 5. Note that we can observe the homogeneity of driving behavior patterns within segments and the heterogeneity of the driving patterns between segments.
As an example of a driving behavior pattern captured by our trajectory segmentation approach, consider the segment which starts at time stamp 986 in Figure 3. In the actual trip in Figure 2.A, this segment corresponds to the part of the drive where the driver reduces speed and changes lanes.
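Choosing the number of segments with an MDL-style criterion can be sketched in Python as follows. The score trades the negative log-likelihood of the best segmentation against a penalty that grows with the number of estimated parameters. The parameter count used here, two Gaussian parameters per segment plus one parameter per interior cutting point, is an assumption of this sketch, as are the names `mdl_score` and `best_num_segments`.

```python
import math
from typing import Dict


def mdl_score(loglik: float, n_segments: int, signal_length: int) -> float:
    """Two-part description length: -log-likelihood + (r / 2) * ln(N).

    Assumption: r = 2 * n (mean and standard deviation per segment)
    plus n - 1 (interior cutting points).
    """
    r = 2 * n_segments + (n_segments - 1)
    return -loglik + 0.5 * r * math.log(signal_length)


def best_num_segments(loglik_by_n: Dict[int, float], signal_length: int) -> int:
    """Return the segment count n with the smallest MDL score.

    `loglik_by_n[n]` is the best total log-likelihood found with n segments.
    """
    return min(loglik_by_n, key=lambda n: mdl_score(loglik_by_n[n], n, signal_length))


if __name__ == "__main__":
    # Hypothetical log-likelihoods: a large gain from 1 to 2 segments,
    # then almost none from 2 to 3.
    print(best_num_segments({1: -100.0, 2: -40.0, 3: -39.0}, signal_length=100))
```

Adding a segment never lowers the likelihood, so without the penalty term the largest allowed number of segments would always win.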
4 Evaluation
We first describe the dataset used in this study. We then provide the experimental settings and some statistics as early results of applying the trajectory segmentation approach to our real-world dataset.
4.1 Trajectory Dataset
We used a real-world dataset of 100,000 personal car trajectories provided by a major insurance company based in Columbus, Ohio. These trajectories were collected between 2011 and 2015. We used approximately 95% of the trajectories for training (i.e., creating the Markov model) and 5% as the test set (for evaluation). The test dataset contains about 4,500 trajectories of 92 drivers for 5 different, popular routes in the city. The routes and the number of trajectories for each are summarized in Table 1.
Table 1: Routes in the test set and segmentation results.
Route     #Trajectories   Avg. Length   Avg. #Segments   Std. #Segments
315 Fwy   426             705           8                7
I270      701             389           4.9              3.8
I670      443             392           7.4              6.4
I70       1,572           324           5.4              4.9
I71       1,320           549           7.5              6.8
4.2 Segmentation Results
We used the process described in Section 3 to segment the trajectories in the test set. To find the upper bound on the number of segments (Section 3), we used a heuristic as follows: for a given trajectory of length , we set . Based on the segmentation results reported in Table 1, this is a reasonable upper bound. Note that the best number of segments likely depends on the length of the trips in the test set. Table 1 summarizes the segmentation results by providing the average and standard deviation of the number of segments for trajectories on the different routes of the test set.
5 Related Work
Trajectory segmentation, as described in Section 2, has been addressed in several studies in the literature, such as [4, 1, 5, 3]. In [4], a greedy segmentation algorithm exploits a set of monotonic spatiotemporal criteria (e.g., relative thresholds for some feature values) on features like speed and heading. Alewijnse et al. extended the previous work to both monotonic and non-monotonic criteria [1]. However, criteria-based methods need human input for tuning parameters. Moreover, they are context-agnostic in that they only consider the input trajectory and not the whole dataset. Therefore, the optimization process is a local one, whereas we propose a global optimization for segmentation.
Our segmentation approach is context-aware because it builds a Markov Model for the whole dataset prior to segmentation. Similar context-aware approaches have been proposed in the literature, including [8, 2]. Alewijnse et al. [2] present a context-aware approach which builds a Brownian Bridge model and uses a dynamic programming algorithm to capture the best set of segments of animal movements. While our solution bears some similarities with [2], it exploits a normal distribution model instead, which we find more suitable for car transportation data.
In [9], a trajectory-to-signal transformation is performed prior to segmentation using similarity values between each line segment of the input trajectory and the rest of the line segments in the dataset, using global voting. Then, segmentation discovery is done using a sliding-window approach. Our approach, in contrast, performs a behavior likelihood-based transformation to provide a behavior-based segmentation and to find the segments that are representative of driving behavior patterns. Essentially, our solution is a global optimization-based segmentation approach that builds a model on the entire dataset. Note also that, unlike [4, 1], our solution needs no human intervention.
6 Conclusion and Future Work
In this paper, we proposed a trajectory segmentation approach to detect behavior-based driving patterns for a given trajectory, based on externally observable phenomena. Our approach is a context-aware solution which considers the behavior of the entire population of drivers to detect driving patterns. Our preliminary analysis based on existing use cases demonstrates the interpretability of the segmentation results, as illustrated by the example described in Section 3 (Figures 2 and 3).
We use the current study as part of a more generic framework for analyzing the behavior of drivers to reveal how risky or safe their driving habits are. The other parts of this framework are outlined below and will be considered as extensions of the current study. To gain more insight into the patterns extracted by the segmentation approach, we will design a supervised learning approach to learn and then predict true labels for patterns. Potential labels may be making a turn, changing lanes, merging onto a highway, etc. Moreover, with true labels for the extracted patterns, we will apply sequential pattern mining techniques to extract significant sequences of driving patterns for a single driver or a population of drivers. Finally, with human experts in the loop, we will identify the safe or risky sequences of driving patterns. In this way, we can formulate the problem of finding safe or risky drivers, based on their driving habits, as an end-to-end solution.
Footnotes
 http://toddwschneider.com/posts/analyzing-1-1-billion-nyc-taxi-and-uber-trips-with-a-vengeance/
 Code and sample data can be found in this GitHub repository: https://github.com/sobhanmoosavi/Trajectory_Segmentation
References
[1] (2014) A framework for trajectory segmentation by stable criteria. In Proceedings of the 22nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 351–360.
[2] (2014) Model-based segmentation and classification of trajectories. In Proceedings of the 30th European Workshop on Computational Geometry, Dead Sea, Israel, March, pp. 3–5.
[3] (2006) Global distance-based segmentation of trajectories. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 34–43.
[4] (2010) An algorithmic framework for segmenting trajectories based on spatiotemporal criteria. In Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 202–211.
[5] (2013) Pathlet learning for compressing and planning trajectories. In Proceedings of the 21st ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 392–395.
[6] (2004) Optimal segmentation of signals and its application to image denoising and boundary feature extraction. In Image Processing, 2004. ICIP’04. 2004 International Conference on, Vol. 4, pp. 2693–2696.
[7] (2001) Modeling and prediction of human driver behavior. In Intl. Conference on HCI.
[8] (2002) Trajectory segmentation using dynamic programming. In Pattern Recognition, 2002. Proceedings. 16th International Conference on, Vol. 1, pp. 331–334.
[9] (2012) Segmentation and sampling of moving object trajectories based on representativeness. IEEE Transactions on Knowledge and Data Engineering 24 (7), pp. 1328–1343.
[10] (1978) Modeling by shortest data description. Automatica 14 (5), pp. 465–471.
[11] (2008) Driver behavior analysis and route recognition by hidden Markov models. In Vehicular Electronics and Safety, 2008. ICVES 2008. IEEE International Conference on, pp. 276–281.
[12] (2010) T-drive: driving directions based on taxi trajectories. In Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 99–108.