Characterizing Driving Styles with Deep Learning
Abstract
Characterizing the driving styles of human drivers using vehicle sensor data, e.g., GPS, is an interesting research problem and an important real-world requirement from the automotive industry. A good representation of driving features can be highly valuable for autonomous driving, auto insurance, and many other application scenarios. However, traditional methods mainly rely on handcrafted features, which limit the performance that machine learning algorithms can achieve. In this paper, we propose a novel deep learning solution to this problem, which could be the first attempt at extending deep learning to driving behavior analysis based on GPS data. The proposed approach can effectively extract high-level and interpretable features describing complex driving patterns, and it requires significantly less human experience and work. The power of the learned driving style representations is validated on the driver identification problem using a large real dataset.
Weishan Dong†, Jian Li‡, Renjie Yao†, Changsheng Li†, Ting Yuan†, and Lanjun Wang§
†IBM Research – China, Beijing, China. {dongweis, rjyaobj, lcsheng, ytyuanyt}@cn.ibm.com
‡Nanjing University, China. lionellijian@hotmail.com
§University of Waterloo, Canada. lanjun.wang@uwaterloo.ca
1 Introduction
Deep neural networks have been intensively studied in recent years, and much record-breaking progress has been made on computer vision, speech recognition, and natural language processing problems [13, 1, 4]. However, so far few attempts have been made at applying deep learning to trajectory data analysis, a key research topic in spatiotemporal data analytics, urban computing, intelligent transportation, and Internet-of-Things (IoT) areas. In this work, we study an important real-world problem of trajectory analysis and propose a deep learning based solution. The problem comes from the automotive industry, especially the auto insurance and telematics domains: to characterize the driving styles of car drivers from vehicle sensor data, e.g., GPS (Global Positioning System). Because of individual differences, each driver has a signature driving style, a complex combination of fine-grained driving behaviors and habits. Ideally, it should cover the ways of accelerating, braking, turning, etc., and their (temporal) combinations given specific driving contexts such as road levels, road shapes, traffic conditions, and even weather.
A good driving style representation is useful in many ways. For instance, it can be particularly useful in autonomous driving, which has become a hot topic in both industry and academia: a better understanding of how humans drive a car is helpful for teaching machines to drive like a human [14]. Other examples include assessing drivers' driving risks when correlated with external labels such as claims, accidents, and traffic violations [15]; in auto insurance businesses (e.g., pay-as-you-drive and pay-how-you-drive), this is a key pricing reference. Another common and interesting application is driver identification, i.e., identifying the true driver of anonymized trips [12], which is useful in scenarios including claim fraud detection, estimating how many drivers share a car for insurance pricing, and the design of intelligent driver assistance systems [20]. When the number of candidate drivers is large (e.g., 1000), it becomes a much harder classification problem than just differentiating safe from unsafe driving behaviors. If a good driving style representation can solve the driver identification problem, we have reason to believe that other problems based on driving behavior characterization can also be solved well. Therefore, in this paper we take the driver identification problem as a sample task to evaluate the effectiveness of driving style characterization.
The state-of-the-art methods of modeling driving styles are mainly based on handcrafted features [15, 20]. However, manually defining the driving style by traditional feature engineering is challenging: (1) It heavily relies on domain knowledge and human experience. (2) The discriminative power of the features is often unknown before feeding them into machine learning algorithms; a common practice is therefore to enumerate as many features as possible and then apply feature selection, which requires considerable effort. (3) The best descriptors of driving patterns may change given different data and contexts, e.g., drivers in China may have different driving patterns from those in the US, so a generic model is often hard to obtain. (4) The feature design is usually separated from the learning algorithms, which cannot guarantee the best synergy between features and algorithms. (5) Driving behaviors are typically sequences of operations, so the number of possible feature combinations defining such sequences can be huge. It is hardly possible to find an optimal driving style representation just by enumeration.
On the other hand, recent advances in deep learning reveal that deep neural networks are a promising method for extracting features directly from sensor signal data (e.g., speech). Inspired by this, we propose a novel deep learning approach for characterizing driving styles from automotive sensor data, which consists of: (1) a special design for transforming raw sensor data (typically, GPS) into a form of low-level feature matrices, and (2) deep neural network architectures based on convolutional neural networks (CNN) and recurrent neural networks (RNN) to learn high-level driving style features from the low-level feature matrices. In this way, complex, higher-level discriminative features characterizing how a human driver drives a car can be effectively obtained.
Compared with existing methods, the proposed deep learning approach requires significantly less human work and achieves better accuracy. To the best of our knowledge, this is the first work extending deep learning to directly learning driving style representations from automotive sensor data such as GPS. Experiments on a large real dataset will show that, in terms of identifying the true driver of a trip, the proposed deep learning approach outperforms the state-of-the-art methods by a large margin. When additional trip-level features (such as trip length, trip shape, etc.) are considered beyond just the driving behavior features, the performance of traditional machine learning methods can be improved, but it is still worse than that of the proposed deep learning approach, which only considers driving style features. This indicates that deep learning can be a powerful tool for characterizing driving styles.
The remainder of this paper is organized as follows. Section 2 details the proposed deep learning approach. Section 3 presents experimental studies on a large real dataset. Section 4 discusses a few application-related problems. Section 5 reviews related work in the literature. Finally, Section 6 concludes the paper.
2 Proposed Approach
The proposed approach consists of two components: data transformation and feature learning by deep networks. For simplicity, we consider GPS data as the only input. Nonetheless, as will be discussed later in Section 4, our approach can be easily generalized to work with other types of sensor data and rich driving contexts.
2.1 Data Transformation: from Geospatial Domain to Movement Statistics Domain
Deep neural networks have proved powerful at learning from speech data [7, 8, 1]. GPS sensor data are also a kind of time series with characteristics similar to speech signals, which is our primary motivation for developing deep learning methods for driving feature representation learning. However, a huge gap is that GPS data in its raw format, a sequence of point geolocations defined by 2D coordinates $(x, y)$, each with a timestamp $t$, encodes spatiotemporal information in an implicit way. Our empirical studies showed that simply treating raw GPS data as three-dimensional signal inputs to either traditional machine learning or deep learning algorithms just does not work. A practical way of transforming GPS data sequences (or, trajectories) into an easily consumable form for deep neural networks needs to be developed.
We define a GPS trajectory as a sequence of tuples $(x_i, y_i, t_i)$, whose length can vary. Inspired by the N-gram probabilistic model proposed by Brown et al. [2], where each word depends only on the last $N-1$ words, we utilize the context-window concept to focus on fixed-length portions of the trajectory over periods of time. Specifically, within each window we may discover potential patterns by observing the behaviors of the driver over different situations. For example, some drivers may go through a sharp corner quickly while others slow down, and heavy accelerations often occur for one group of drivers while others may never show such aggressive driving behaviors. In fact, the behaviors are interdependent in time within a period of length $L_s$: we can roughly say that the current driving behavior depends on what happened in the last $L_s$ time points. Because of this, there is more possibility of discovering driving patterns if we focus on the trajectory from a windowed perspective, letting the machine "understand" or "define" the behaviors within each period of time. To avoid too much information loss, the original trajectory is segmented with a shift smaller than $L_s$ so that there is overlap between neighboring segments.
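The windowed segmentation above can be sketched as follows (a minimal illustration; the function name `segment_trajectory` and the parameter names `seg_len` and `shift` are ours, standing in for the segment length and shift just described):

```python
def segment_trajectory(points, seg_len, shift):
    """Cut a trajectory (a list of points) into fixed-length segments.

    A window of seg_len points slides along the trajectory with the given
    shift; when shift < seg_len, neighboring segments overlap, which avoids
    information loss at segment boundaries."""
    return [points[start:start + seg_len]
            for start in range(0, len(points) - seg_len + 1, shift)]
```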
Compared to the frequency feature maps of speech used by deep neural networks [1, 8], the next step after obtaining segments from the raw trajectory is to generate feature maps analogous to the spectrogram of speech, which has both frequency and time axes. Unlike for speech, however, there is no well-developed method to construct feature maps from trajectories. We propose the following five features to replace the frequency axis of the speech feature map: (1) speed norm, (2) difference of speed norm, (3) acceleration norm, (4) difference of acceleration norm, and (5) angular speed. We call these the basic features, derived from the GPS data at every time point. As a result, each segment has a basic feature matrix of size $5 \times L_s$. Notably, these are all point-wise instantaneous movement features, which are easy to calculate.
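The five basic features can be derived from the coordinate sequence with simple differencing, as in the following sketch (assuming a fixed sampling interval `dt`; the boundary handling and function name are our own choices, not from the original implementation):

```python
import numpy as np

def basic_features(xy, dt=1.0):
    """xy: (T, 2) array of anonymized coordinates sampled every dt seconds.
    Returns a (5, T-3) matrix of point-wise movement features."""
    v = np.diff(xy, axis=0) / dt                  # velocity vectors
    speed = np.linalg.norm(v, axis=1)             # (1) speed norm
    dspeed = np.diff(speed) / dt                  # (2) difference of speed norm
    a = np.diff(v, axis=0) / dt                   # acceleration vectors
    acc = np.linalg.norm(a, axis=1)               # (3) acceleration norm
    dacc = np.diff(acc) / dt                      # (4) difference of acceleration norm
    heading = np.arctan2(v[:, 1], v[:, 0])
    dh = np.diff(heading)                         # heading change, wrapped to [-pi, pi]
    ang = np.abs(np.arctan2(np.sin(dh), np.cos(dh))) / dt  # (5) angular speed
    n = len(dacc)                                 # shortest series after differencing
    return np.vstack([speed[:n], dspeed[:n], acc[:n], dacc[:n], ang[:n]])
```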
To reduce the possible impact of outliers (which may be generated by small sensor data errors) in such point-wise features, we further derive statistical information by "framing" the segments. In each segment, we put every $L_f$ ($L_f < L_s$) neighboring points into a frame with a shift smaller than $L_f$, and then calculate the mean, minimum, maximum, 25%, 50%, and 75% quartiles, and standard deviation of the basic features in each frame, seven statistics in total. Such frame-level statistical features can be regarded as a more stable representation of the basic features over every time period of length $L_f$. The resulting statistical feature matrix (35 rows representing the driving feature axis, and one column per frame representing the time axis) serves as the input to the deep neural networks.
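The framing step can be sketched as follows (the names `frame_len` and `shift` correspond to the small-window parameters above; the row ordering matches the description in this section, i.e., seven statistics per basic feature, feature by feature):

```python
import numpy as np

def frame_statistics(basic, frame_len, shift):
    """basic: (5, T) matrix of basic features for one segment.
    Returns a (35, n_frames) statistical feature matrix: for each of the
    5 basic features, 7 statistics per frame (mean, min, max, 25%/50%/75%
    quartiles, std), ordered feature-major."""
    cols = []
    T = basic.shape[1]
    for start in range(0, T - frame_len + 1, shift):
        f = basic[:, start:start + frame_len]          # (5, frame_len)
        stats = np.vstack([
            f.mean(axis=1), f.min(axis=1), f.max(axis=1),
            np.percentile(f, 25, axis=1),
            np.percentile(f, 50, axis=1),
            np.percentile(f, 75, axis=1),
            f.std(axis=1),
        ])                                             # (7, 5): stat x feature
        cols.append(stats.T.reshape(-1))               # rows 1-7 belong to feature 1
    return np.array(cols).T                            # (35, n_frames)
```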
In summary, we use a "large window" to segment a GPS sequence into fixed-length "patches" to model the interdependency among the instantaneous features. Meanwhile, we employ a "small window" to further enframe each segment, describing the driving features from a statistical perspective over a short time period. Such a double-windowed feature matrix design not only encodes the instantaneous driving behaviors, but also conveys how the patterns change over time. Importantly, only low-level movement statistics reflecting the driving behaviors are calculated here; no explicit temporal combination is modeled yet. We expect deep neural networks to learn and extract higher levels of driving style features from such transformed inputs.
Figure 1 illustrates the proposed data transformation of a GPS data sequence from the original geospatial domain to the movement statistics domain. A long GPS trip is thus transformed into a number of statistical feature matrices, each corresponding to a segment of the trip. The label of the original trip (e.g., the driver ID) is assigned to all of its segments for supervised learning. An example of the data transformation is shown in Figure 2. In the example, the GPS data sampling rate is 1 Hz, and the location coordinates (in meters) are anonymized with the trip origin set to $(0, 0)$. The generated statistical feature matrix, sized $35 \times 128$, corresponds to one segment of the trip. In the feature matrix, the order of rows follows our previous introduction: the speed norm related statistics are in the first seven rows, followed by the difference of speed norm related statistics, and so on. For each basic feature, the seven statistics appear in the same order in which they were listed above. We can see that the feature matrix in Figure 2(c) clearly shows the sharp turn at the beginning of the trip (see the last seven rows 29–35, indicating large values of the angular speed statistics) and that the speed increases over time (see the first seven rows 1–7).
2.2 Learning with Convolutional Neural Network: Using 1D Convolution and Pooling
Convolutional neural networks (CNN) [17, 13] have become popular for image recognition. A CNN typically consists of alternating convolution and pooling layers. Its two main characteristics, locality and weight sharing, are also beneficial for learning features from time series data such as audio and speech [18, 1]. We first employ a CNN for learning driving styles from the transformed feature matrix defined in Section 2.1.
Given the statistical feature matrices as inputs, we propose to apply 1D convolutions only over the time axis (columns), because convolution over the driving features has no practical significance. Unlike the frequency axis in speech feature maps, where there is local structure from low to high frequencies in a continuous domain, driving features carry no ordering information and are discrete: exchanging the order of features (i.e., rows) in the statistical feature matrix yields a semantically identical input. In other words, there is no meaningful local structure along the feature axis (rows), so convolution over it is not meaningful either. Similar ideas have been proposed for audio classification [18], where time-axis convolution helps learn more effective acoustic features such as phoneme and gender. In our problem, it maps to learning complex driving behaviors. Intuitively, the lower convolution layer in the CNN can detect fine-grained driving behaviors such as aggressive accelerations, while the higher layers can find more abstract, semantic-level driving patterns. The pooling layer is another significant component of a CNN. As with the convolution layers, we propose to apply 1D max-pooling only over the time axis: the feature values computed at different time points are pooled and represented by their maximum, which helps realize translation invariance for driving patterns over time. An illustration of the 1D convolution and pooling is shown in Figure 3.
The CNN architecture we build for the driver identification problem is as follows. The net has six layers in total: the first three are convolution-pooling layers and the remaining three are fully connected. Specifically, assuming the number of frames in each segment is 128, the first layer filters the input data with 32 one-dimensional kernels (each spanning the full feature axis) with a stride of 1 frame. The second convolutional layer takes as input the max-pooled output of the first convolutional layer and filters it with 64 kernels. The third convolutional layer also has 64 kernels, connected to the pooled outputs of the second convolutional layer. The fourth and fifth layers are fully connected, with 128 neurons each. Sigmoid activations are applied to the output of every layer. The last layer is a Softmax, which outputs a distribution over driver IDs, i.e., the class labels of the trip segments.
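Because the kernels slide only along the time axis while spanning the full feature axis, the convolution is effectively one-dimensional. The mechanics can be illustrated in plain NumPy (a didactic sketch, not the actual training code; function names are ours):

```python
import numpy as np

def conv1d_time(x, kernels, stride=1):
    """x: (rows, T) feature matrix; kernels: (K, rows, width).
    Each kernel covers all rows and slides only along the time axis."""
    K, rows, width = kernels.shape
    out_T = (x.shape[1] - width) // stride + 1
    out = np.empty((K, out_T))
    for k in range(K):
        for i in range(out_T):
            out[k, i] = np.sum(x[:, i * stride:i * stride + width] * kernels[k])
    return out

def maxpool1d_time(x, pool):
    """1D max-pooling over the time axis only."""
    T = (x.shape[1] // pool) * pool               # drop any trailing remainder
    return x[:, :T].reshape(x.shape[0], -1, pool).max(axis=2)
```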
2.3 Learning with Recurrent Neural Network
Recurrent neural networks (RNN) are another popular family of deep neural networks, with many variations such as Elman's RNN [5], LSTM [9], and bidirectional RNN [26]. An RNN is a kind of feedforward neural network augmented by the inclusion of edges that span adjacent time steps, introducing a notion of time to the model [21]. Given an input sequence $(x_1, \ldots, x_T)$, each neuron in the recurrent hidden layer receives input from the current data point and also from the hidden node values of the previous time step:

$$h_t = \mathcal{H}(W_{xh} x_t + W_{hh} h_{t-1} + b_h),$$

where $W_{xh}$ is the input-hidden weight matrix, $W_{hh}$ is the matrix of weights between the hidden layer and itself at adjacent time steps, $b_h$ denotes the bias vector, and $\mathcal{H}$ is the hidden layer function. An RNN can be interpreted as a network unfolded across time steps and is therefore inherently deep in time. Since RNNs are very successful on sequence learning tasks such as speech recognition and machine translation, it is natural to extend them to driving style feature learning from GPS data sequences. In our case, we regard the transformed statistical feature matrix as a sequence of 35-dimensional frame vectors and feed it into the RNN. Figure 4 illustrates how the RNN runs on the transformed feature matrix.
As analyzed in [23], training RNNs is difficult due to vanishing and exploding gradients, and researchers have been developing optimization techniques and network architectures to solve this problem. Instead of using more sophisticated methods, Le, Jaitly, and Hinton [16] proposed an RNN architecture composed of rectified linear units (ReLU) that initializes the recurrent weight matrix to the identity matrix or a scaled version of it. This simple solution is even comparable to a standard implementation of LSTM on certain tasks such as language modeling and speech recognition. For our driver identification task, we choose this simple yet powerful network, which we denote as IRNN. The last layer of IRNN, again, is a Softmax appended to the recurrent layer. We can also construct an IRNN with two stacked recurrent layers, denoted StackedIRNN: the output of the first recurrent layer is itself a sequence, which feeds into the second recurrent layer; the second recurrent layer outputs a hidden-layer feature vector, to which a Softmax is appended as the last layer. In the next section, we will see that this allows higher-level driving feature extraction and leads to better classification performance.
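A forward pass of the IRNN recurrence can be sketched as follows (a simplified illustration of the identity initialization and ReLU activation from [16]; the Softmax output layer and all training logic are omitted, and the function name is ours):

```python
import numpy as np

def irnn_forward(frames, W_xh, b_h, hidden=100, scale=1.0):
    """frames: (T, input_dim) sequence of frame vectors for one segment.
    The recurrent weight matrix W_hh starts as a (scaled) identity matrix
    and the hidden activation is ReLU, following the IRNN recipe."""
    W_hh = scale * np.eye(hidden)                       # identity initialization
    h = np.zeros(hidden)
    for x in frames:
        h = np.maximum(0.0, W_xh @ x + W_hh @ h + b_h)  # ReLU recurrence
    return h  # final hidden state, normally fed into a Softmax layer
```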
2.4 Non-deep-learning Baselines
For comparison with the proposed deep learning approach, we also propose two non-deep-learning methods as baselines representing traditional machine learning approaches to driving style feature learning. The Gradient Boosting Decision Tree (GBDT) [6] has been recognized as one of the most powerful machine learning algorithms. We use it as the first baseline, feeding it the same transformed GPS data as input. Unlike CNN and RNN, however, GBDT does not explicitly model locality or time steps in a sequence; given the same feature matrix as input, GBDT treats it as an unfolded (flattened) vector of features.
As the second baseline, representing the traditional feature engineering methodology for characterizing driving styles, we also train GBDT on a set of 57 manually defined driving behavior features; we denote it as TripGBDT. These features were used in an entry to the Kaggle competition on Driver Telematics Analysis [12] (the same dataset is used in the experimental studies in the next section) and achieved a 0.92 AUC score in detecting false trips in the competition, indicating their effectiveness.
The 57 features consist of global and local features. The global features are trip-level statistics, including the mean, min, max, std, and quartiles (25%, 50%, and 75%) of the speed norm, difference of speed norm, acceleration norm, difference of acceleration norm, and angular speed over the whole trip. In addition, the following are also defined as global features: time duration of the whole trip, trip length, average speed (trip length divided by time duration), area of the minimal rectangle containing the trip shape, and the lengths of the two edges of that rectangle. The local features are defined as follows. We first extract the moving angle (0 to 180 degrees) at each point (based on a window of three consecutive points) and divide the angles into eight bins: [0,10), [10,20), [20,30), [30,45), [45,60), [60,90), [90,120), and [120,180]. In each bin, we again calculate the mean, min, max, std, and quartiles (25%, 50%, and 75%) of the speed norm, difference of speed norm, acceleration norm, difference of acceleration norm, and angular speed. These features model the correlations between driving behaviors and the local shape of the road. We first downsample each trip to one record per 1, 2, 3, 4, and 5 seconds, and then extract the features on the downsampled trips, which can be seen as applying a smoothing procedure. In total, 57 features are defined. Note that not only driving-related features are included, but also trip geometry and global statistics. In contrast, the proposed deep learning approach only computes segment-level statistics about short-time driving behaviors; the global information of a trip is completely invisible to the neural networks. If, in such a case, the deep learning methods can still outperform TripGBDT, it is convincing to conclude that deep neural networks are more powerful at characterizing driving styles.
3 Experiments
A major requirement on data quality for characterizing fine-grained driving behaviors is that the GPS sampling rate must not be too low; in addition, a regular sampling interval is preferred. A low sampling rate results in too much information loss: in particular, instantaneous car movement cannot be estimated accurately. Our empirical studies revealed that, generally, when the sampling rate is lower than 0.1 Hz (one record per ten seconds), the performance of any approach becomes poor. We adopt a large public dataset from the 2015 Kaggle competition on Driver Telematics Analysis [12] for the experimental studies. The dataset contains 547,200 anonymized real trips of 2,736 drivers. Each driver has 200 driving trips of varying lengths that record the car's position (in meters) every second. (The original problem in the competition was to detect trips that were not driven by a specific driver. Such "false" trips exist in every driver's data, but there is only a small and random number of them, and the ground truth of the trips is not available. We therefore regard the driver labels as true labels and the false trips as noise, which does not affect the evaluation much.) We apply the transformation described in Section 2.1 to generate the statistical feature matrices from the trip data. To the best of our knowledge, this dataset is the only publicly available trip dataset having (1) a sufficiently high sampling rate, (2) a regular sampling interval, and (3) a large number of real trips and drivers. As a result, we are only able to experiment on this one dataset. We conduct two experiments: a first small-scale test using 50 drivers' data, and a second large-scale test using 1,000 drivers' data. In all tests, for each driver we randomly select 80% of the trips as training data and 20% as test data.
Note that since we segment each trip into fixed-length segments and transform them into statistical feature matrices, both training and testing are performed on segment data instead of whole trips. For the final evaluation, we care about not only the segment-level predictions but also the trip-level predictions. Once the prediction for each segment of a trip is obtained, the trip-level prediction is calculated by adding up all segments' predictions as a weighted vote. As driver identification is a classification problem, we report segment-level accuracy, trip accuracy, and trip top-5 accuracy in the experiments.
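The trip-level weighted vote can be implemented by summing the Softmax outputs of a trip's segments (a minimal sketch; `trip_prediction` is our own name):

```python
import numpy as np

def trip_prediction(segment_probs):
    """segment_probs: (n_segments, n_drivers) Softmax outputs for the
    segments of one trip. Summing them weights each driver by the total
    probability mass assigned across segments; the argmax of the sum is
    the trip-level prediction, and the five largest entries give the
    trip top-5 prediction."""
    votes = np.asarray(segment_probs).sum(axis=0)
    top5 = np.argsort(votes)[::-1][:5].tolist()
    return int(np.argmax(votes)), top5
```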
3.1 Candidate Methods and Parameter Settings
Using the same input data, we train five deep neural networks for comparisons:

CNN: see Section 2.2

NoPoolCNN: CNN without pooling layers

IRNN: see Section 2.3, with 100 neurons in the recurrent layer

PretrainIRNN: Use the features extracted at the third convolutional layer in the pretrained CNN as inputs to train an IRNN

StackedIRNN: see Section 2.3, with 100 neurons in each recurrent layer
In addition, we include GBDT and TripGBDT introduced in Section 2.4 in comparisons.
The parameters of the algorithms are tuned using standard 5-fold cross-validation. We use batch size 128 for training the neural networks. For the CNNs, we use the stochastic gradient descent optimizer with learning rate 0.05, decay 1e-6, and Nesterov momentum 0.9. For the RNNs, we use the RMSProp optimizer with learning rate 1e-6, $\rho$=0.9, and $\epsilon$=1e-6. For GBDT and TripGBDT, a max tree depth of 6 is used in the 50-driver experiments, and the stopping iterations leading to the best performance are chosen. In the 1000-driver experiments, a max tree depth of 20 is used instead for TripGBDT.
3.2 Experiment on 50 Drivers’ Data
The training dataset constructed from the first 50 drivers' trips includes over 35,000 segments, a great augmentation compared with the original 8,000 trips. The best results obtained by each algorithm are summarized in Table 1. We find that IRNN demonstrates strong advantages over the others, and the obtained accuracies are quite acceptable considering that random guessing would yield a trip accuracy of only 2% (1/50). Not surprisingly, GBDT performs the worst of all. The best result (bolded in the table) is from StackedIRNN. Although it has a simpler architecture with just one recurrent layer, IRNN still easily beats the remaining candidates. NoPoolCNN performs worse than CNN, indicating the effectiveness of the pooling layers. PretrainIRNN performs better than CNN; however, it still does not outperform IRNN or StackedIRNN, which run directly on the feature matrices. It is worth mentioning that IRNN's training time is much longer than that of CNN and GBDT. The more complex StackedIRNN exhibits only a small improvement over the single-layer IRNN, while its training time cost is nearly doubled. For TripGBDT, only trip-level accuracies are available because there is no segment-level training or testing. Although trip-level global information is provided in addition to the driving-related features, TripGBDT cannot perform as well as IRNN and StackedIRNN, though it is better than CNN on this small-scale test. In general, deep neural networks are capable of learning good driving style representations from the transformed feature matrices and perform better than traditional methods.
Method  Seg (%)  Trip (%)  Trip Top-5 (%) 

NoPoolCNN  16.9  28.3  56.7 
CNN  21.6  34.9  63.7 
PretrainIRNN  28.2  44.6  70.4 
IRNN  34.7  49.7  76.9 
StackedIRNN  34.8  52.3  77.4 
GBDT  18.3  29.1  55.9 
TripGBDT    51.2  74.3 
3.3 Experiment on 1000 Drivers’ Data
We further conduct a larger-scale test using the first 1000 drivers' trip data. We include only CNN, StackedIRNN, and TripGBDT in this comparison, since they are representative of their respective categories as shown in Table 1. The neural network parameters are kept unchanged. The results are reported in Table 2. We can see that the deep neural networks exhibit significantly better scalability: their performance does not decrease much even though the task becomes a much harder 1000-class problem (random guess accuracy 0.1%). In contrast, TripGBDT's performance becomes dramatically worse. This indicates that as the problem becomes harder, the manually defined features are no longer as powerful as those learned by the deep neural networks. Again, StackedIRNN performs the best, but it costs considerably more computational time to converge.
Method  Seg (%)  Trip (%)  Trip Top-5 (%) 

CNN  23.4  26.7  46.7 
StackedIRNN  27.5  40.5  60.4 
TripGBDT    9.2  15.8 
3.4 Interpretation of Learned Features
It is interesting to investigate what kinds of features are learned by the deep networks. Here we examine, in the second recurrent layer of the StackedIRNN trained on the 1000 drivers' data, which training samples result in the maximum activations of the 100 hidden neurons. For each neuron, we visualize the training samples (segments) that produce the top five activations. By observing the common patterns among these samples, we can analyze what kinds of features have been learned.
In Figure 5, we show the results for three selected neurons. Each column illustrates the five training samples that most strongly activated a selected neuron. Interestingly, the neurons seem to have learned driving behaviors such as slowing down at hard turns, driving at high speed along straight roads, and even GPS failures that caused sudden huge jumps, i.e., outliers. These learned features are fairly interpretable, which demonstrates the power of the proposed deep learning approach for driving style feature learning and partially explains its good performance.
4 Application Related Discussions
In application scenarios requiring real-time prediction, the proposed deep learning approach has a significant advantage over traditional methods such as TripGBDT that rely on trip-level features. Because only segment-level data is needed, a pretrained network can be used for real-time prediction, where segment data becomes available online as the car moves. In contrast, trip-level features such as trip length and trip duration are only available after the trip ends. This prevents traditional methods such as TripGBDT from being used for online prediction, whereas the deep learning approaches are far more flexible. It is possible to build an online system based on the proposed deep learning approach, e.g., to predict the driver identity from the data of a partial trip collected at runtime. The prediction can be updated dynamically over time, based on aggregating the predictions on all trip segments collected so far.
Privacy is often a key concern in analyzing telematics data. The proposed data transformation in Section 2.1 has the merit of not revealing any location- or time-specific information to the learning phase, even if the GPS trip is not anonymized, because the basic features only describe movements with relative location and time information. In a real system, if the data transformation is performed, e.g., on the vehicle side, data privacy can be well preserved even when the learning is performed in a centralized manner, such as on the cloud, which requires data to be uploaded for analysis. This makes the approach easier to deploy in the real world.
Driving contexts, e.g., road level, road shape, traffic, and weather, can also influence driving behaviors. Additional car sensor data, such as OBD (On-Board Diagnostics) data monitoring the vehicle status, are also helpful for modeling driving behaviors. Such contextual and sensor data can be plugged into our framework to enrich the statistical feature matrix: as long as the data are in the form of sequences or time series, a similar transformation can be applied so that the inputs to the deep neural networks encode richer information. The deep neural networks should then be able to discover the discriminative correlations among the feature matrix's elements and learn even better driving style representations.
5 Related Work
Most existing methods in the literature on driving style modeling rely on a human-defined driving behavior feature set, consisting of handcrafted vehicle movement features derived from sensor data [15, 20]. These features typically work with traditional machine learning methods (supervised classification, unsupervised clustering, or reinforcement learning) to solve problems such as driver identification/classification, driver performance assessment, and human driving style learning [22, 25, 24, 27, 19, 14]. However, as discussed in the introduction, designing the best driving style descriptor is often challenging even for experienced domain experts. Taking the driver identification problem as an example, the number of classes (i.e., distinct drivers) in the literature is mostly fewer than ten, indicating the difficulty of developing discriminative feature definitions. In contrast, our proposed deep learning approach works directly on raw GPS data and automatically learns driving style features, requiring little human work on feature engineering. Even when the problem grows to 1000 classes, as in our experiments, the classification performance is still far better than that of traditional methods.
There has been much recent work on deep learning for autonomous driving and Advanced Driver Assistance Systems (ADAS) using camera sensor data as inputs, e.g., [10, 3, 11]. However, the primary purpose of these studies is not to characterize human drivers' driving styles. More importantly, unlike our approach, they technically solve computer vision problems rather than learn from GPS records.
6 Conclusion
In this paper, we proposed a deep learning approach for characterizing driving styles, which could be the first attempt at extending deep learning to driving style feature learning directly from GPS data. First, we proposed a data transformation method to construct an easily consumable input form (the statistical feature matrix) from raw GPS sequences for deep learning. Second, we developed several deep neural network architectures, including CNNs using 1-D convolution and pooling as well as RNNs, and studied their performance in learning a good representation of driving styles from the transformed inputs. Taking driver identification as a sample task, experiments on a large real dataset showed that the proposed deep learning approach significantly outperforms traditional machine learning methods as well as state-of-the-art feature engineering methods that mostly rely on hand-crafted driving behavior features. Furthermore, the driving style features learned by the deep neural networks were fairly interpretable, explaining the effectiveness of the proposed approach. In short, deep learning can be a powerful tool for learning driving style features from GPS data and for many other driving behavior analysis problems.
References
 [1] Ossama Abdel-Hamid, Abdel-rahman Mohamed, Hui Jiang, Li Deng, Gerald Penn, and Dong Yu. Convolutional neural networks for speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(10):1533–1545, 2014.
 [2] Peter F Brown, Peter V Desouza, Robert L Mercer, Vincent J Della Pietra, and Jenifer C Lai. Class-based n-gram models of natural language. Computational Linguistics, 18(4):467–479, 1992.
 [3] Chenyi Chen, Ari Seff, Alain Kornhauser, and Jianxiong Xiao. DeepDriving: Learning affordance for direct perception in autonomous driving. In Proceedings of the IEEE International Conference on Computer Vision, pages 2722–2730, 2015.
 [4] Ronan Collobert and Jason Weston. A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th International Conference on Machine Learning, pages 160–167. ACM, 2008.
 [5] Jeffrey L Elman. Finding structure in time. Cognitive science, 14(2):179–211, 1990.
 [6] Jerome H Friedman. Greedy function approximation: a gradient boosting machine. Annals of statistics, pages 1189–1232, 2001.
 [7] Alex Graves, Abdel-rahman Mohamed, and Geoffrey Hinton. Speech recognition with deep recurrent neural networks. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, pages 6645–6649. IEEE, 2013.
 [8] Awni Hannun, Carl Case, Jared Casper, Bryan Catanzaro, Greg Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubho Sengupta, Adam Coates, et al. Deep Speech: Scaling up end-to-end speech recognition. arXiv preprint arXiv:1412.5567, 2014.
 [9] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
 [10] Brody Huval, Tao Wang, Sameep Tandon, Jeff Kiske, Will Song, Joel Pazhayampallil, Mykhaylo Andriluka, Pranav Rajpurkar, Toki Migimatsu, Royce Cheng-Yue, et al. An empirical evaluation of deep learning on highway driving. arXiv preprint arXiv:1504.01716, 2015.
 [11] Ashesh Jain, Hema S Koppula, Shane Soh, Bharad Raghavan, Avi Singh, and Ashutosh Saxena. Brain4Cars: Car that knows before you do via sensory-fusion deep learning architecture. arXiv preprint arXiv:1601.00740, 2016.
 [12] Kaggle. Driver Telematics Analysis. www.kaggle.com/c/axa-driver-telematics-analysis/data, 2015. [Online; accessed 14-Jan-2015].
 [13] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.
 [14] Markus Kuderer, Shilpa Gulati, and Wolfram Burgard. Learning driving styles for autonomous vehicles from demonstration. In 2015 IEEE International Conference on Robotics and Automation (ICRA), pages 2641–2646. IEEE, 2015.
 [15] Alex Laurie. Telematics: the new auto insurance. Towers Watson, 2011.
 [16] Quoc V Le, Navdeep Jaitly, and Geoffrey E Hinton. A simple way to initialize recurrent networks of rectified linear units. arXiv preprint arXiv:1504.00941, 2015.
 [17] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
 [18] Honglak Lee, Peter Pham, Yan Largman, and Andrew Y Ng. Unsupervised feature learning for audio classification using convolutional deep belief networks. In Advances in neural information processing systems, pages 1096–1104, 2009.
 [19] Z Li and Chen Cai. Unsupervised detection of drivers' behavior patterns. In Australasian Transport Research Forum (ATRF), 37th, 2015, Sydney, New South Wales, Australia, 2015.
 [20] Na Lin, Changfu Zong, Masayoshi Tomizuka, Pan Song, Zexing Zhang, and Gang Li. An overview on study of identification of driver behavior characteristics for automotive control. Mathematical Problems in Engineering, 2014, 2014.
 [21] Zachary C Lipton. A critical review of recurrent neural networks for sequence learning. arXiv preprint arXiv:1506.00019, 2015.
 [22] José Oñate López, Andrés C Cuervo Pinilla, et al. Driver behavior classification model based on an intelligent driving diagnosis system. In 2012 15th International IEEE Conference on Intelligent Transportation Systems, pages 894–899. IEEE, 2012.
 [23] Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio. On the difficulty of training recurrent neural networks. arXiv preprint arXiv:1211.5063, 2012.
 [24] Zhan Fan Quek and Eldwin Ng. Driver identification by driving style. Technical report, CS 229 Project, Stanford University, 2013.
 [25] C. G. Quintero M., José Oñate López, and Andrés C Cuervo Pinilla. Driver behavior classification model based on an intelligent driving diagnosis system. In 2012 15th International IEEE Conference on Intelligent Transportation Systems, pages 894–899, Sept 2012.
 [26] Mike Schuster and Kuldip K Paliwal. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45(11):2673–2681, 1997.
 [27] Minh Van Ly, Sujitha Martin, and Mohan M Trivedi. Driver classification and driving style recognition using inertial sensors. In Intelligent Vehicles Symposium (IV), 2013 IEEE, pages 1040–1045. IEEE, 2013.