PRESS: A Novel Framework of Trajectory Compression in Road Networks

Renchu Song   Weiwei Sun   Baihua Zheng  Yu Zheng
Fudan University, Shanghai, China, {songrenchu, wwsun}@fudan.edu.cn
Shanghai Key Laboratory of Data Science, Fudan University, Shanghai, China
Singapore Management University, Singapore, bhzheng@smu.edu.sg
Microsoft Research, Beijing, China, yuzheng@microsoft.com
Abstract

Location data is becoming increasingly important. In this paper, we focus on trajectory data and propose a new framework, namely PRESS (Paralleled Road-Network-Based Trajectory Compression), to effectively compress trajectory data under road network constraints. Different from existing work, PRESS introduces a novel representation for trajectories that separates the spatial representation of a trajectory from its temporal representation, and proposes a Hybrid Spatial Compression (HSC) algorithm and an error Bounded Temporal Compression (BTC) algorithm to compress the spatial and temporal information of trajectories, respectively. PRESS also supports common spatial-temporal queries without fully decompressing the data. Through an extensive experimental study on a real trajectory dataset, PRESS significantly outperforms existing approaches in terms of saving the storage cost of trajectory data with bounded errors.

1 Introduction

The advance in location-acquisition technologies has led to a huge volume of spatial trajectories, e.g., the GPS trajectories of vehicles, each of which is comprised of a sequence of time-ordered spatial points. As the trajectories are in huge volume and some points in a trajectory are redundant, application systems on trajectories have to bear high communication loads and expensive data storage. This calls for trajectory compression technologies that can reduce the storage cost while keeping the utility of a trajectory. In this paper, we propose a trajectory compression framework under road network constraints, namely PRESS (Paralleled Road-Network-Based Trajectory Compression). The main objective is to achieve spatially lossless and temporally error-bounded compression, and meanwhile provide support to popular LBS applications.

The PRESS framework has five components, namely the map matcher, the trajectory re-formatter, the spatial compressor, the temporal compressor, and the query processor, as shown in Fig. 1. Taking raw GPS trajectories as input, the map matcher maps each trajectory into a sequence of edges in the road network, which is then reformatted into a spatial path and a temporal sequence via the trajectory re-formatter. Thereafter, the compression takes place in parallel: the spatial path is compressed by the spatial compressor based on the Hybrid Spatial Compression (HSC) algorithm, and the temporal sequence is compressed by the temporal compressor based on the Bounded Temporal Compression (BTC) algorithm. The compressed spatial path and compressed temporal sequence are then passed to the query processor to support different application needs.

Figure 1: PRESS framework

Different from existing works, we consider both the compression ratio and the utility of the compressed trajectories. In general, the higher the compression ratio, the lower the quality of the compressed trajectory, which directly affects data utility. Consequently, it is challenging to propose a novel approach to achieve a high compression ratio with high quality compressed trajectories, especially under road network constraints.

PRESS tackles this issue from three different angles. First, it observes that the spatial path and the temporal information of a trajectory have different features, and hence it strategically separates the spatial path from the temporal information when representing a trajectory. This clear separation allows us to compress the spatial path and the temporal information independently. Second, a lossless spatial compression algorithm, HSC, is proposed to effectively compress the spatial path using significantly less space without losing any spatial information. It has two stages. The first stage is based on shortest paths: given a sub-trajectory from edge e_i to edge e_j, if it is exactly the same as the shortest path from e_i to e_j, it is replaced by the pair (e_i, e_j). As we tend to take shortest paths in real life, this compression can effectively reduce the number of edges we have to maintain for each trajectory. The second stage is based on frequent sub-trajectory (FST) coding. The main idea is to decompose a trajectory into a sequence of FSTs, each of which is represented by a unique code (e.g., a Huffman code). The more popular the FST, the shorter the corresponding code and the larger the space savings. Meanwhile, PRESS designs a temporal compression algorithm, BTC, to compress the temporal information with bounded errors. BTC is very flexible and can compress the temporal information based on the error bounds specified by different applications. In summary, the lossless nature of the spatial compression and the error-bounded nature of the temporal compression guarantee the high quality of the compressed trajectories. Last but not least, PRESS also supports many popular spatial-temporal queries commonly used in location-based services (LBSs), such as whereat, whenat, and range queries, without fully recovering the compressed trajectories.

An extensive experimental study has been conducted on a real trajectory dataset to validate the effectiveness and efficiency of PRESS. According to the results, PRESS saves a large fraction of the original storage cost. Let n be the length of a trajectory T. Both HSC and BTC have a compression time complexity of O(n), and hence the compression time complexity of PRESS is O(n). As compressed temporal sequences share the same format as original ones, BTC does not require any decompression process; in other words, the decompression time complexity of PRESS equals that of HSC, i.e., O(n). In addition, PRESS can significantly accelerate spatial-temporal queries. In brief, PRESS outperforms the state-of-the-art approaches in terms of compression ratio, time consumption, and the acceleration of spatial-temporal queries.

The rest of the paper is organized as follows. Section 2 presents our new approach to represent a trajectory. Section 3 and Section 4 introduce the detailed spatial compression and temporal compression respectively. Section 5 explains how to support some common queries via compressed trajectories. Section 6 presents our experimental studies. Section 7 reviews related work. Finally, Section 8 concludes this paper with some directions for future work.

2 Trajectory Representation

In our work, a road network is defined as a directed graph G = (V, E), where V is the vertex set and E is the edge set. The weight on an edge e in E, denoted as w(e), can be physical distance, travel time, or other costs according to the application context. A trajectory is the path that a moving object follows through space as a function of time; consequently, it contains both spatial information and temporal information. Traditional approaches represent a trajectory via a sequence of triples in the form of ⟨x_1, y_1, t_1⟩, ⟨x_2, y_2, t_2⟩, ..., ⟨x_n, y_n, t_n⟩, where (x_i, y_i) is the position in the 2D Euclidean space at time stamp t_i.

We propose a different representation of trajectories in the road network. Instead of combining positions and time stamps together like existing approaches do, we separate the locations from time stamps. In other words, a trajectory is represented by a spatial path and a temporal sequence. This clear separation enables us to design different compression approaches for spatial information and temporal information respectively, so that both spatial compression and temporal compression can achieve high compression effectiveness without constraining each other. In the following, we will explain how to represent the spatial information and temporal information via spatial path and temporal sequence, respectively.

The spatial path of a trajectory in a road network is a sequence of consecutive edges. As shown in Fig. 2, the sample trajectory sequentially passes five edges, denoted as e_1, e_2, e_3, e_4, and e_5. Consequently, it can be represented by a spatial path in the format of e_1 → e_2 → e_3 → e_4 → e_5. Note that trajectories can start from and/or end at any point of an edge, not necessarily an endpoint. For example, the sample trajectory ends at a point along edge e_5. We will tackle this issue via the temporal sequence presented in the following.

Figure 2: Sample trajectory in a road network

The temporal information of a trajectory defines when the object locates at a specific location. For example, the triple ⟨x_i, y_i, t_i⟩ used in the traditional representation tells that the object is located at position (x_i, y_i) at time stamp t_i. However, this representation does not facilitate spatial queries in road networks. Consider a common query that asks for the average moving speed of an object during a period [t_i, t_j] with t_i < t_j. Positions (x_i, y_i) and (x_j, y_j) do not capture any distance information, and we have to explore the road network to calculate the distance traveled by the object from t_i to t_j. Consequently, we propose to use the tuple ⟨d_i, t_i⟩ to capture the temporal information. To simplify the discussion, in this paper d_i represents the network distance the object has traveled at time stamp t_i since the start of the trajectory. More generally, d_i can represent other weight information of the edges, e.g., travel time or other costs, based on application needs.

Back to the example trajectory shown in Fig. 2, there are five time stamps, denoted as t_1, t_2, t_3, t_4, and t_5, respectively. Based on our newly proposed temporal sequence representation, the temporal information of the example trajectory is represented by five tuples, i.e., ⟨d_1, t_1⟩, ⟨d_2, t_2⟩, ⟨d_3, t_3⟩, ⟨d_4, t_4⟩, and ⟨d_5, t_5⟩, as shown in Fig. 3(a). The first tuple means the object starts the trajectory at time stamp t_1 and the corresponding distance it has traveled since the start is zero (i.e., d_1 = 0); the second tuple means the object has traveled distance d_2 at time stamp t_2, with d_2 being the length of the first edge it has passed; and so on. Note that the last tuple does not locate at any endpoint of an edge, and the same can happen to the first tuple too.
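To make the representation concrete, the following minimal Python snippet builds such a temporal sequence with made-up numbers (purely illustrative, not from the paper's dataset) and shows how per-leg speeds fall out of it directly:

# Illustrative (d, t) temporal sequence: d is the network distance travelled
# since the start (in meters), t is the time stamp (in seconds).
temporal_sequence = [(0, 0), (120, 60), (120, 300), (180, 360)]

# Average speed on each leg; a zero-speed leg exposes situations (e.g., a taxi
# stuck between two vertices) that vertex-timestamp representations miss.
for (d0, t0), (d1, t1) in zip(temporal_sequence, temporal_sequence[1:]):
    print((d1 - d0) / (t1 - t0))        # prints 2.0, 0.0, 1.0 (m/s)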

Although we are not the first to represent the spatial path of a trajectory via edges, we want to highlight that our approach of separating the spatial information from the temporal information when representing trajectories is unique. A former approach [10] uses the vertices in a road network (i.e., the endpoints of edges) to capture the spatial information of trajectories, together with the time when the object passes those vertices. However, the time stamps at those vertices do not cover the entire temporal information. For example, a taxi might stop for a long time somewhere between two vertices. By retaining only the two time stamps at the vertices, we can only assume that the taxi drives at a low uniform speed on the edge, which is not the real case. Our approach can easily tackle this issue. As illustrated in Fig. 3(b), we can tell that the taxi first moves slowly, then gets stuck for a while, and finally moves slowly again.

Figure 3: Temporal sequence

3 Spatial Compression

After presenting the formal representation of trajectories, we are ready to present Hybrid Spatial Compression (HSC), the two-stage spatial compression algorithm. It takes an initial spatial trajectory as input, performs shortest path compression in the first stage, and then frequent sub-trajectory compression in the second stage. Existing works use the original sampled positions to keep track of a trajectory, and propose to keep only a subset of those positions in order to cut down the storage cost. However, all the existing approaches based on this idea cannot fully capture the spatial path traveled by the trajectory. Consider our sample trajectory, whose spatial path can be represented by six vertices. If we reduce the vertex number from the original six to three, we can only tell that the object moves from the first vertex to the last one via the single retained intermediate vertex, but we cannot tell how the object moves between the retained vertices. For the movement between two retained vertices, the object could take any of several different paths in the road network, which is uncertain.

Although existing works propose various metrics to guarantee the similarity between compressed trajectories and original ones, none is error-free. They trade the accuracy of trajectories' spatial information for the saving of storage cost. Alternatively, our two-stage HSC approach is error-free. The compressed trajectory returned by HSC, although taking less space, captures the spatial path of the original trajectory as it is. In HSC, we make two assumptions: i) objects tend to take the shortest path instead of longer ones in most, if not all, cases; and ii) the trajectories are not uniformly distributed in the road network and there are certain edge sequences that are passed through more frequently.

3.1 Shortest path compression

Given a source s and a destination d, most of the time we will take the shortest path (SP) between s and d if all the edges roughly share similar traffic conditions. Under this assumption, we can expect that most, if not all, trajectories consist of a sequence of shortest paths. Our first compression is motivated by this observation and takes full advantage of shortest paths.

We assume that all-pair shortest path information is available via a pre-processing of the road network, which can be achieved by any of the well-known shortest path algorithms. If there are several shortest paths between a pair of edges, we only record one of them to eliminate any ambiguity during compression. We use SP(e_i, e_j) to denote the recorded shortest path from edge e_i to edge e_j, and maintain a structure recording, for each pair of edges, the last edge (the edge right before e_j) of SP(e_i, e_j). Take the partial road network shown in Fig. 4 as an example, where the number in the middle of each edge indicates the network distance of the edge; the structure records the predecessor edge on the shortest path between every ordered pair of edges.

Figure 4: Example of shortest path compression

Before we present the detailed algorithm, we use a running example to illustrate the SP compression. The main idea is to skip the detailed sub-trajectory from edge e_i to edge e_j if it matches exactly the shortest path SP(e_i, e_j), i.e., replacing the sub-trajectory with e_i and e_j only. As shown in Fig. 4, the SP compression algorithm initially enrolls the first edge of the original trajectory T into the compressed trajectory T'. Thereafter, it scans the subsequent edges one by one: as long as the sub-trajectory from the last enrolled edge to the current edge still matches the recorded shortest path between them, the intermediate edges can be skipped; otherwise, the edge right before the current one is enrolled into T' and the scan continues from there. Finally, the algorithm enrolls the last edge into T' to finish the process and replaces T with T'. As it scans each edge in T once, its complexity is O(n).

The pseudo code is listed in Algorithm 1. It takes a trajectory T = e_1 → e_2 → ... → e_n as input. It enrolls the first edge e_1 into the compressed trajectory T', uses a pointer tail to record the last enrolled edge of T', and then sequentially scans the remaining edges e_2, ..., e_n. If the sub-trajectory from tail to the current edge e_i matches exactly the shortest path from tail to e_i, there is no need to record the intermediate edges in T'. After scanning up to e_n, the algorithm terminates by returning T'. The main idea of SP compression is to replace the sub-trajectory between a pair of edges with those two edges whenever it coincides with their shortest path, and there are multiple ways to implement it. The SP algorithm proposed in this work is a greedy algorithm, and it actually achieves the largest compression ratio among SP-based compressions, as stated in Theorem 1.

Input: a road network G, a trajectory T = e_1 → e_2 → ... → e_n;
Output: a compressed trajectory T';
Procedure:

1:  T' ← ⟨e_1⟩; tail ← e_1;
2:  for i ← 2 to n do
3:     if the sub-trajectory from tail to e_i differs from SP(tail, e_i) then
4:         append e_{i-1} to T'; tail ← e_{i-1};
5:  return T' appended with e_n;
Algorithm 1 Shortest Path Compression
Theorem 1

The greedy algorithm is the optimal algorithm resulting in the largest compression ratio during SP compression.

Proof.

Assume the input trajectory is T = e_1 → e_2 → ... → e_n. We prove that our greedy SP compression generates the optimal solution in terms of the number of edges by induction on m, the length of the compressed trajectory it returns.

When m = 2, the output ⟨e_1, e_n⟩ is trivially optimal since the starting edge and the ending edge of T must be preserved. When m = 3, if the output were not optimal, there would exist a compressed trajectory of length 2; as the starting edge and the ending edge of T must be preserved, it must be ⟨e_1, e_n⟩, which means T passes exactly the shortest path from e_1 to e_n. However, based on SP compression, the greedy algorithm would not output a length-3 result if T = SP(e_1, e_n). Our statement is therefore true for the base cases.

Now, assume the statement holds for every compressed length smaller than m, and let T' = a_1 → a_2 → ... → a_m be the compressed trajectory of length m returned by our SP compression. If the statement were not true, there would be another compressed trajectory T'' with |T''| < m. First, suppose T' and T'' share at least one common edge in addition to a_1 and a_m, say a_i with 1 < i < m. Then we can decompose the original trajectory T at a_i into two sub-trajectories T_1 and T_2. As the compressed forms of T_1 and T_2 returned by the greedy algorithm are both shorter than m, by the induction hypothesis they are optimal, so their concatenation is no longer than the corresponding parts of T'', and the assumption |T''| < |T'| cannot hold. Otherwise, T' and T'' share no common edge except a_1 and a_m. Without loss of generality, there is an edge b in T'' that locates after a_1 but before a_2 in the original trajectory T, and T'' can be decomposed at b into T''_1 (ending at b) and T''_2 (starting at b). On the other hand, the original trajectory T can be decomposed at edge b into T_1 (ending at b) and T_2 (starting at b). For T_1, the compressed trajectory returned by the greedy algorithm must be a_1 followed by b, since b lies on the recorded shortest path from a_1 to a_2, and hence it is no longer than T''_1. For T_2, we are certain that the compressed form returned by PRESS starts with b instead of a_2, as guaranteed by the SP-containment property, and it is guaranteed to be no longer than T''_2. That is to say, the total length of the greedy results on T_1 and T_2 does not exceed |T''_1| + |T''_2|. As T' is no longer than the combination of these two greedy results (they share edge b), and |T''| = |T''_1| + |T''_2| - 1, we obtain |T'| ≤ |T''|, which contradicts the assumption |T''| < |T'|. The assumption is therefore invalid and the proof completes.

The decompression process is straightforward. Given a compressed trajectory T', we sequentially scan each pair of consecutive edges (a_i, a_{i+1}). If they are not adjacent in the road network, we complement the trajectory with the shortest path SP(a_i, a_{i+1}). The missing edges of SP(a_i, a_{i+1}) can be recovered backwards from a_{i+1} by repeatedly looking up the recorded last edge of the corresponding shortest path. This step takes as many iterations as the length of the shortest path, and hence the time complexity of the decompression process is also O(n).
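To make the two procedures concrete, below is a minimal Python sketch of SP compression and decompression. It assumes two illustrative helpers that are not part of the paper: sp_path(s, d), returning the recorded shortest path from edge s to edge d as a list of edges (inclusive of both ends), and adjacent(a, b), telling whether edge b directly follows edge a in the road network. For clarity the sketch compares whole sub-lists, which costs O(n) per check; the predecessor structure described above makes the same check O(1), keeping the whole pass at O(n).

def sp_compress(traj, sp_path):
    """traj: list of edge ids [e_1, ..., e_n]; returns the SP-compressed list."""
    if len(traj) <= 2:
        return list(traj)
    compressed = [traj[0]]
    start = 0                                  # index of the last enrolled edge
    for i in range(1, len(traj)):
        # Can all edges between the last enrolled edge and e_i be skipped, i.e.
        # does the sub-trajectory coincide with the recorded shortest path?
        if traj[start:i + 1] != sp_path(traj[start], traj[i]):
            compressed.append(traj[i - 1])     # no: enroll e_{i-1} and restart there
            start = i - 1
    compressed.append(traj[-1])                # the last edge is always kept
    return compressed

def sp_decompress(compressed, sp_path, adjacent):
    """Re-expand every non-adjacent pair with the recorded shortest path."""
    full = [compressed[0]]
    for a, b in zip(compressed, compressed[1:]):
        full.extend([b] if adjacent(a, b) else sp_path(a, b)[1:])
    return full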

3.2 Frequent sub-trajectory compression

As assumed previously, trajectories are not evenly distributed within the road network and edges are not accessed uniformly. In other words, certain edge sequences are much more popular than others in terms of frequency. If we are able to locate those very popular sub-trajectories, named frequent sub-trajectories (FSTs), we can use a coding scheme to compress them and replace their occurrences in the trajectories with the corresponding codes.

The concept of FST only makes sense given a large set of trajectory data; consequently, compression based on FSTs is not effective if the underlying dataset is small. In addition, we assume the trajectory dataset is periodic. That means, if we collect all the trajectories of all the cars moving within a city for a duration of several months, the dataset of one day should be similar to the dataset of another day. Under this assumption, we can locate FSTs based on a subset of the complete trajectory dataset, which corresponds to the training process in data mining. Note that the input training dataset is a subset of the complete trajectory dataset after SP compression. For example, in our experiments we take the trajectories corresponding to one day as a training dataset, perform SP compression on each trajectory in the training dataset, and then pass them to the second stage for FST mining. In the following, we explain the three main steps of FST compression: how to mine FSTs, how to decompose a trajectory based on the mined FSTs, and the detailed coding procedure.

3.2.1 FST mining

As we want to compress the trajectories based on FSTs, we have to locate all the FSTs first. The problem of mining FSTs is similar to the frequent pattern mining problem [6, 14, 24] in data mining. In this work, we propose a novel approach to locate FSTs. We treat sub-trajectories as strings and use Huffman coding [8] to compress them. The more frequent a sub-trajectory is (based on its frequency in the training set), the shorter the corresponding code is and hence more savings in terms of storage cost are expected when compressing the trajectories. The basic idea is to first build a Trie [9] based on the training set to represent sub-trajectories as strings, next form an Aho-Corasick automaton [1] to enable a decomposition of a trajectory into a set of sub-trajectories, and then use Huffman coding to compress the trajectories.

To facilitate the understanding of our approach, we assume an input set returned by the first-stage SP compression with three compressed trajectories, as shown in Fig. 5. Theoretically, we could consider all the sub-trajectories with length ranging from 1 to the maximum trajectory length (e.g., from 1 to 6 in our example). However, we set a threshold θ and only consider the sub-trajectories whose length does not exceed θ. Take the real dataset used in our experiments as an example: on the training dataset, setting θ to any value from 1 to 20 allows our approach to save a significant fraction of the storage consumption, while the time complexity of our approach is proportional to θ. Although we cannot formally derive an optimal setting for θ, as it is highly dependent on the training dataset and the real trajectory dataset, a small value of θ can already achieve significant storage saving with reasonable time complexity. In the following discussion, we set θ = 3, which is also the optimal length for our trajectory dataset.

Figure 5: Example Trie

For the given input dataset, we first locate all the sub-trajectories with length not exceeding θ (i.e., 3). Note that we locate one sub-trajectory starting from each edge, so those sub-trajectories near the tail of each trajectory may be shorter than θ. As illustrated in Fig. 5, fifteen such sub-trajectories are located in our example. We then build a Trie based on all the identified sub-trajectories. For any node v in the Trie, the path from the root to v represents a sub-trajectory, with the number shown on the link from its parent node indicating the frequency of that sub-trajectory. Here, the number next to each node is the unique ID of the node. Take node 18 as an example: the string formed by the nodes along the path from the root to node 18 corresponds to a particular sub-trajectory, and the number 1 shown on the link from node 16 to node 18 represents its frequency, i.e., it appears only once in the training dataset. In addition, we want to make sure that the nodes in the first level (the level right below the root) correspond to all the edges in the original road network. This design is to facilitate the decomposition process which will be explained later. Take our sample Trie as an example. Assume the original road network consists of 10 edges; only seven of them appear as the first edge of some located sub-trajectory. Consequently, we add the remaining three edges to the first level with their frequencies set to zero, as shown in Fig. 5.
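As a concrete illustration, the following Python sketch builds such a Trie from SP-compressed training trajectories. The class and function names are illustrative only; theta is the length threshold θ, and all_edges is the edge set of the road network, used to force every edge into the first level.

class TrieNode:
    def __init__(self, depth=0):
        self.children = {}     # edge id -> TrieNode
        self.freq = 0          # how many located sub-trajectories pass through here
        self.depth = depth     # number of edges on the path from the root

def build_fst_trie(training_trajs, theta, all_edges):
    """training_trajs: SP-compressed trajectories, each a list of edge ids."""
    root = TrieNode()
    for traj in training_trajs:
        for i in range(len(traj)):
            node = root
            for e in traj[i:i + theta]:        # one sub-trajectory per starting edge
                if e not in node.children:
                    node.children[e] = TrieNode(node.depth + 1)
                node = node.children[e]
                node.freq += 1                 # frequency counted along the path
    for e in all_edges:                        # complete the first level
        if e not in root.children:
            root.children[e] = TrieNode(1)     # frequency stays zero
    return root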

3.2.2 Trajectory decomposition

Once FSTs are identified and the Trie is constructed, we need to decompose an input trajectory into a sequence of identified FSTs. We borrow the basic idea from the Aho-Corasick string matching algorithm. Informally, the algorithm constructs a finite state machine that resembles a Trie with additional links between the various internal nodes. The automaton is depicted in Fig. 6, with all the extra links represented by dashed lines. To be more specific, an extra link is issued from a node u to the node v whose corresponding string is the longest proper suffix of u's string that also exists in the Trie. For example, among the suffixes of the string corresponding to node 15, the longest one that exists in our example Trie is the string corresponding to node 16. That is why the extra link issued from node 15 points to node 16.

Figure 6: Aho-Corasick automaton

Now we explain how to decompose an input trajectory into a sequence of identified sub-trajectories. The algorithm treats the trajectory as a string and scans the characters (i.e., edges) one by one sequentially. At each step, the current node is extended by finding a matching child; if none of the children matches the character, by finding a matching child of its suffix node; if that does not work, of its suffix's suffix node, and so on, finally falling back to the root node if no match is found.

Input: an Aho-Corasick automaton A, a compressed trajectory T';
Output: a sequence of sub-trajectories;
Procedure:

1:  cur ← root of A; i ← 1; S ← ∅; R ← ∅; skip ← 0;
2:  while i ≤ |T'| do
3:     c ← Child(cur, e_i);
4:     if c ≠ NULL then
5:         push c into S; cur ← c; i ← i + 1;
6:     else if cur has a suffix node then
7:         cur ← suffix(cur);
8:     else
9:         cur ← root;
10:  while S ≠ ∅ do
11:     if skip = 0 then
12:         pop node v from S, add v's sub-trajectory to R; skip ← len(v) − 1;
13:     else
14:         pop S and decrease skip by 1;
15:  return R;
Algorithm 2 Trajectory decomposition

Algorithm 2 lists the pseudo code. First, it initializes all the parameters: cur indicates the current node of the automaton A, i indicates the position of the edge in the trajectory currently evaluated, S is an auxiliary stack holding all the matched nodes in A, and R is the result set which consists of the sequence of sub-trajectories decomposed from the input trajectory T'. It then scans the edges in T' one by one sequentially. For each edge e_i, it first checks whether a child of the current node matches e_i via the function Child(cur, e_i). If a match occurs, Child returns the matching child node; we then push it into S, set it as the current node, and proceed to the next edge by increasing i (lines 4-5). If a mismatch occurs, indicated by NULL returned by Child, we continue the checking at cur's suffix node, if any, via the extra link (lines 6-7), or at the root node (lines 8-9). Recall that for each edge in the original road network, our automaton has a corresponding node in its first level. Consequently, a match can definitely be achieved for each edge in the input trajectory and our decomposition always terminates. After the first WHILE-loop (lines 2-9), stack S holds exactly |T'| nodes, each corresponding to an edge in T'.

Next, we recover the sub-trajectories represented by the nodes in S. Note that the original Aho-Corasick string matching algorithm is a dictionary-matching algorithm that locates elements of a finite set of strings (the "dictionary") within an input text. As it matches all patterns simultaneously, the returned patterns may overlap. However, our purpose is to decompose the trajectory into a sequence of sub-trajectories, and hence each edge in T' shall appear in exactly one sub-trajectory. The reason that, when a matched node is found, we do not output the corresponding string but maintain it in the stack is to avoid overlapping among different sub-trajectories. As each node in S matches one edge in T', our basic idea is to find the longest matched sub-trajectory ending at each edge, scanning backward. In other words, given a popped node whose sub-trajectory has length L, the next (L − 1) nodes in S can be ignored. This process is performed by the second WHILE-loop (lines 10-14). Finally, the algorithm returns the sub-trajectories maintained in R to complete the decomposition. The time complexity of this decomposition process is O(n).
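The sketch below mirrors this two-phase procedure in Python. It assumes each Trie node additionally carries a precomputed Aho-Corasick suffix link fail and its depth (as in the Trie sketch earlier); building the suffix links is standard Aho-Corasick construction and is omitted here.

def decompose(root, traj):
    """Phase 1: Aho-Corasick scan; phase 2: backward non-overlapping recovery.
    Returns the matched Trie nodes (FSTs) covering traj from left to right."""
    matched, node, i = [], root, 0
    while i < len(traj):
        e = traj[i]
        if e in node.children:                 # extend the current match
            node = node.children[e]
            matched.append(node)               # one matched node per edge of traj
            i += 1
        elif node is not root:
            node = node.fail                   # fall back along the suffix link
        else:                                  # cannot happen: every edge sits in
            raise ValueError(e)                # the first Trie level
    result, skip = [], 0
    for node in reversed(matched):             # longest match ending at each edge wins
        if skip:
            skip -= 1                          # edge already covered by a later match
        else:
            result.append(node)
            skip = node.depth - 1              # this match covers depth-1 earlier edges
    result.reverse()
    return result

Each returned Trie node is then replaced by its Huffman code in the encoding step presented next.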

We use an example to illustrate the trajectory decomposition process, using the input trajectory summarized in Table 1. For the first edge of the trajectory, the automaton finds a match at node 1 and pushes node 1 into S. For the second edge, it finds a match at node 16 and pushes node 16 into S. For the third edge, it cannot find a match with any child of node 16, nor with any child of node 20 (node 16's suffix node). Consequently, we trace back to the root node, find a match at node 22, and push it into S. The process repeats until all the edges are processed. Next, we start the sub-trajectory recovery step by popping nodes out of S. First, node 24 is popped out and its sub-trajectory is added to R; as its length is one, it does not cause any other node in S to be skipped. Second, node 10 is popped out and its sub-trajectory is added to R; as its length is three, it skips the next two nodes popped out from S, i.e., nodes 2 and 1, and we then evaluate node 9, whose sub-trajectory is added to R. Again, the next two nodes (i.e., nodes 6 and 5) are skipped and we evaluate node 4, which triggers the insertion of its sub-trajectory. This process continues until S is empty. Finally, R contains the six recovered sub-trajectories. Accordingly, the trajectory is decomposed into six sub-trajectories, corresponding to nodes 16, 22, 4, 9, 10, and 24, respectively, as shown in Table 1.

3.2.3 Encoding procedure

Finally, we present the encoding procedure, which uses Huffman coding to represent the identified FSTs. The main idea is to code each node in the Trie: the more frequent a node is, the shorter its code is expected to be. Consequently, we construct a Huffman tree over all the nodes (except the root) according to the node frequencies. A Huffman tree is a binary tree, in which a node is either a leaf node or an internal node. An internal node contains a weight that is the sum of its children's weights, plus links to its two child nodes. As a common convention, bit '0' represents following the left child and bit '1' represents following the right child. Assuming the initial Trie has m nodes (excluding the root), the corresponding Huffman tree has up to m leaf nodes and m − 1 internal nodes.

Initially, all nodes are leaf nodes, and the process begins with the leaf nodes containing the frequencies of the Trie nodes they represent. Then, a new node whose children are the two nodes with the smallest frequencies is created, such that the new node's weight equals the sum of its children's weights. With the previous two nodes merged into one node, and with the new node now being considered, the procedure is repeated until only one node remains. For the Trie shown in Fig. 5, the corresponding Huffman tree is depicted in Fig. 7. Here, a rectangle represents a leaf node, which corresponds to a node in the Trie, and a circle represents an internal node, with the number inside the circle indicating its weight. With the help of the Huffman tree, each node of the Trie (i.e., each identified sub-trajectory) can be represented by a unique code. For easy understanding, we list some sample sub-trajectories and their unique codes in Fig. 7. For example, the sub-trajectory represented by node 3 in the Trie has the code 00101, and the one represented by node 16 has the code 0111. Based on Huffman coding, the code for the example trajectory is listed in Table 1.

Figure 7: Example Huffman tree
input: the SP-compressed example trajectory
decomposition: six sub-trajectories, one per Trie node below
Trie nodes: 16, 22, 4, 9, 10, 24
Huffman codes: 0111, 01010000, 1111, 01001, 00110, 0101001
Result: 011101010000111101001001100101001
Table 1: FST compression of the example trajectory
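For concreteness, the following sketch derives such a prefix code from the node frequencies with a standard textbook Huffman construction. The heapq-based pairing below is a generic implementation, not the paper's code; symbols are assumed to be hashable, non-tuple objects such as Trie nodes or their IDs.

import heapq
from itertools import count

def huffman_codes(freqs):
    """freqs: symbol -> frequency; returns symbol -> bit string (prefix-free)."""
    tiebreak = count()                          # avoids comparing symbols directly
    heap = [(f, next(tiebreak), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:                          # degenerate case: a single symbol
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)       # repeatedly merge the two nodes
        f2, _, right = heapq.heappop(heap)      # with the smallest weights
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    codes = {}
    def assign(node, prefix):
        if isinstance(node, tuple):             # internal node: 0 = left, 1 = right
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:
            codes[node] = prefix
    assign(heap[0][2], "")
    return codes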

3.2.4 Discussion

As a summary, FST compression first locates all the sub-trajectories with length not exceeding θ from the training set and constructs a Trie. It then builds an Aho-Corasick automaton and a Huffman tree based on the Trie. For a given SP-compressed trajectory T', it decomposes T' into a sequence of sub-trajectories with the help of the automaton, and then uses the Huffman codes of the corresponding sub-trajectories as the compressed format to represent T'. The decoding process is straightforward. Given a binary code, it first recovers the sequence of Trie nodes represented by the binary code with the help of the Huffman tree, and then retrieves the sub-trajectories represented by those Trie nodes to recover the trajectory. The time complexity of the first step is proportional to the length of the binary code, which is upper-bounded by O(n). The time consumption of the second step is proportional to the length of the SP compression result, which is also upper-bounded by n; consequently, the time complexity of this step is O(n) as well. Combining the two steps, the time complexity of the decoding process is O(n).
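A minimal sketch of the first decoding step, reusing the codes mapping from the Huffman sketch above: because the code is prefix-free, scanning the bits and emitting a node as soon as the accumulated prefix is a valid code is unambiguous.

def huffman_decode(bits, codes):
    """bits: the compressed trajectory as a string of '0'/'1' characters;
    codes: symbol -> bit string, as produced by huffman_codes above."""
    inverse = {code: sym for sym, code in codes.items()}
    symbols, prefix = [], ""
    for bit in bits:
        prefix += bit
        if prefix in inverse:                   # prefix-freeness makes this safe
            symbols.append(inverse[prefix])
            prefix = ""
    return symbols  # each symbol (Trie node) is then expanded into its edges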

3.3 Hybrid Spatial Compression (HSC)

HSC combines the above two spatial compression techniques and is expected to further improve the compression effectiveness. We assume the all-pair shortest path information, the Trie, the automaton, and the Huffman tree are constructed in advance. Because the compression and decompression time complexity of both SP compression and FST compression is O(n), the compression and decompression time complexity of HSC is O(n).

4 Temporal Compression

We propose to represent the temporal information of a trajectory as a sequence of tuples ⟨d_i, t_i⟩. This representation is storage consuming, as it has the same scale as the original number of GPS samples. On the other hand, each tuple describes when the object is at a specific location, so compressing this information causes the loss of certain information. Consequently, we propose two metrics to bound the inaccuracy that could be caused by the temporal compression, namely Time Synchronized Network Distance (TSND) and Network Synchronized Time Difference (NSTD), as formally defined in Definition 1 and Definition 2, respectively. To simplify the discussion, we assume the temporal sequences mentioned in the following are in the format of ⟨d_1, t_1⟩, ⟨d_2, t_2⟩, ..., ⟨d_n, t_n⟩. For a given temporal sequence T and a given time stamp t (t_1 ≤ t ≤ t_n), the corresponding distance the object has moved by time t can be approximated by linear interpolation, denoted D(T, t); for example, if t falls between t_i and t_{i+1}, D(T, t) returns d_i + (d_{i+1} − d_i)(t − t_i)/(t_{i+1} − t_i). Similarly, for a given T and a given distance d (d_1 ≤ d ≤ d_n), the corresponding time when the object has moved distance d along T can be approximated by linear interpolation.

4.1 Error metrics

Before we present our Bounded Temporal Compression (BTC) algorithm, we first introduce the error metrics TSND and NSTD in the following.

Definition 1 (Time Syn. Network Dis. (TSND))

Given a trajectory T and its compressed form T', TSND measures the maximum difference between the distance the object travels via T and that via T' at any time stamp, i.e., TSND(T, T') = max_t |D(T, t) − D(T', t)|.

Definition 2 (Network Syn. Time Dif. (NSTD))

NSTD defines the maximum time difference between a trajectory T and its compressed form T' while traveling the same distance, i.e., NSTD(T, T') = max_d |D^{-1}(T, d) − D^{-1}(T', d)|, where D^{-1}(T, d) denotes the interpolated time at which the object has traveled distance d along T.

To facilitate the understanding of these two metrics, we depict an example in Fig. 8. Given its sequence of temporal tuples, a trajectory can be plotted as a polyline on the t-d plane. Then, TSND measures the maximum difference between T and T' along the d-dimension, and NSTD measures the maximum difference between them along the t-dimension. We want to highlight that both TSND and NSTD are meaningful only when the compressed trajectory T' keeps exactly the same spatial information as the original trajectory T, which is guaranteed by our HSC algorithm.

Figure 8: TSND and NSTD
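Because both the original and the compressed temporal sequences are piecewise-linear curves on the t-d plane, the two metrics can be evaluated exactly at the breakpoints of either sequence. The Python sketch below does so for sequences stored as lists of (d, t) tuples; it assumes both sequences start and end at the same distance and time, which BTC guarantees by always keeping the first and last tuples.

def distance_at(seq, t):
    """D(T, t): piecewise-linear distance at time t; seq is [(d_1, t_1), ...]."""
    for (d0, t0), (d1, t1) in zip(seq, seq[1:]):
        if t0 <= t <= t1:
            return d0 if t1 == t0 else d0 + (d1 - d0) * (t - t0) / (t1 - t0)
    raise ValueError("t outside the trajectory's time span")

def time_at(seq, d):
    """The inverse lookup: the time at which distance d has been travelled."""
    for (d0, t0), (d1, t1) in zip(seq, seq[1:]):
        if d0 <= d <= d1:
            return t0 if d1 == d0 else t0 + (t1 - t0) * (d - d0) / (d1 - d0)
    raise ValueError("d outside the trajectory's distance span")

def tsnd(orig, comp):
    """TSND (Def. 1): the difference of two piecewise-linear curves is itself
    piecewise linear, so its maximum is attained at a breakpoint."""
    times = sorted({t for _, t in orig} | {t for _, t in comp})
    return max(abs(distance_at(orig, t) - distance_at(comp, t)) for t in times)

def nstd(orig, comp):
    """NSTD (Def. 2): the same idea with the roles of distance and time swapped."""
    dists = sorted({d for d, _ in orig} | {d for d, _ in comp})
    return max(abs(time_at(orig, d) - time_at(comp, d)) for d in dists)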

The metric TSND is a variant of the Time Synchronized Euclidean Distance (TSED) [16, 20] metric, which measures the distance between two Euclidean-space trajectories. Given a Euclidean-space trajectory T and its compressed form T', TSED returns the maximum Euclidean distance, over all time stamps t, between the (interpolated) point where T places the object at t and the point where T' places it at t.

Theorem 2

Given an original trajectory T and a compressed trajectory T', if T' is compressed via the previously introduced HSC algorithm, then TSED(T, T') ≤ TSND(T, T').

Proof. Given T and a time t, let p_t be the point where the moving object is located along trajectory T at time t. Similarly, let p'_t be the point where the object is located along T' at time t. As T' is compressed via the HSC algorithm, T' is exactly the same as T in terms of spatial information, although it takes less space to keep that information. Consequently, |D(T, t) − D(T', t)| represents the network distance between p_t and p'_t. As the Euclidean distance between two points is always a lower bound of the corresponding network distance, we have dist(p_t, p'_t) ≤ |D(T, t) − D(T', t)| ≤ TSND(T, T') for every t. Taking the maximum over t on the left-hand side yields TSED(T, T') ≤ TSND(T, T'), so our statement holds and the proof completes.

4.2 Bounded Temporal Compression

After introducing the metrics TSND and NSTD, we are ready to present the Bounded Temporal Compression (BTC) algorithm. As the tuples ⟨d_i, t_i⟩ can be plotted as a polygonal line on the t-d plane, a temporal sequence forms a Euclidean-space trajectory in the t-d space. Consequently, BTC can be transformed into Euclidean trajectory compression, which has been well studied in the literature. Among the available solutions, we adopt an algorithm similar to Before Opening Window (BOPW) [16] because of its excellent performance and its ability to address online trajectory compression. The only difference is that the original algorithm purely considers the TSED metric, while our implementation considers the TSND and NSTD metrics.

The main idea is as follows. Given a temporal sequence T, a maximal tolerated TSND τ_d, and a maximal tolerated NSTD τ_t, BTC scans the tuples in T sequentially while keeping track of the last tuple enrolled into T'. For the current tuple, it attempts to skip all the tuples between the last enrolled tuple and the current one by linking the two directly. To evaluate whether this replacement is valid, we calculate the TSND and NSTD values between the original sub-sequence and the direct link. If TSND ≤ τ_d and NSTD ≤ τ_t, the attempt is valid and we can safely skip the intermediate tuples and proceed to the next tuple. Otherwise, the attempt is invalid and the tuples cannot be skipped; the invalid attempt terminates the current round of evaluation, and a new round is initiated from the tuple at which the attempt failed, which is enrolled into T'. The process repeats until all the tuples are evaluated.

The original implementation of BOPW has a time complexity of O(n^2). We improve it to O(n) with the help of a novel concept, namely the angular range. Given two points p_i = (d_i, t_i) and p_j = (d_j, t_j) in the t-d space, assume BOPW keeps p_i in T' and evaluates whether p_j can be skipped. No matter what the compressed polyline T' looks like, it must satisfy both error bounds at p_j, i.e., its distance value at time t_j may deviate from d_j by at most τ_d, and its time value at distance d_j may deviate from t_j by at most τ_t. In other words, the difference between T and T' along the d-dimension at t_j is bounded by τ_d, and the difference along the t-dimension at d_j is bounded by τ_t. Consequently, given a vertical line segment V_j centered at p_j with length 2τ_d, T' must intersect V_j; as shown in Fig. 9(a), V_j bounds an angular range, seen from p_i, that T' shall fall within. Similarly, given a horizontal line segment H_j centered at p_j with length 2τ_t, T' must intersect H_j; as shown in Fig. 9(b), H_j bounds an angular range that T' shall fall within. Considering both V_j and H_j, the feasible angular range shrinks to the intersection of the two, i.e., the shaded angular range depicted in Fig. 9(c).

Figure 9: Illustration of angular range

In order to facilitate the presentation, we assume a function Angle(p, P) that returns the angular range centered at point p formed by all the points of a set P. Fig. 9(d) illustrates the angular ranges formed by different point sets.

Input: a temporal sequence T = ⟨d_1, t_1⟩, ..., ⟨d_n, t_n⟩;
Output: a compressed temporal sequence T';
Procedure:

1:  T' ← ⟨d_1, t_1⟩; anchor ← ⟨d_1, t_1⟩; R ← straight angle;
2:  for i ← 2 to n do
3:     if ⟨d_i, t_i⟩ falls within R then
4:         R ← R ∩ Angle(anchor, endpoints of the tolerance segments of ⟨d_i, t_i⟩);
5:     else
6:         append ⟨d_i, t_i⟩ to T'; anchor ← ⟨d_i, t_i⟩; R ← straight angle;
7:  return T';
Algorithm 3 Bounded Temporal Compression

With the help of the angular range, we can compress T based on BOPW with O(n) time complexity, and the pseudo code is listed in Algorithm 3. It maintains a pointer anchor to the last point along T that has been enrolled into T', and an angular range R centered at the anchor that bounds all the possible directions the compressed polyline can take right after the anchor. Initially, R is set to the straight angle, which captures the full half-plane after the anchor. Thereafter, R gets shrunk by the points evaluated. For each point scanned by the algorithm, we check via a boolean test whether it is located inside the current angular range R centered at the anchor, and there are two possible outputs. If the point is within R, it can be skipped: we further shrink R by the angular range formed by the endpoints of its tolerance segments and start the evaluation of the next point. Otherwise, the point cannot be skipped and we append it to T'; meanwhile, a new angular range centered at it is initiated. Take Fig. 9(e) as an example and suppose the anchor is the point currently kept in T'. If the point under evaluation is located inside the shaded angular range, it is within R; we shrink R based on its tolerance segments and then continue the process. If it is located outside the current R, the algorithm will enroll it into T', set the anchor to it, reset the angular range to the straight angle, and then continue the process.
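The Python sketch below illustrates the angular-range test on the t-d plane for temporal tuples stored as (d, t) pairs; tau_d bounds TSND (the vertical tolerance) and tau_t bounds NSTD (the horizontal tolerance). It follows the enrollment rule as described above, i.e., the tuple that falls outside the range is kept and becomes the new anchor; treat it as an illustration of the technique rather than a line-by-line transcription of Algorithm 3.

from math import atan2, pi

def angle_range(anchor, point, tau_d, tau_t):
    """Angles (seen from the anchor) subtended by the vertical and horizontal
    tolerance segments centered at point; returns their intersection."""
    (da, ta), (d, t) = anchor, point
    vert = sorted([atan2(d - tau_d - da, t - ta), atan2(d + tau_d - da, t - ta)])
    horiz = sorted([atan2(d - da, t + tau_t - ta), atan2(d - da, t - tau_t - ta)])
    return max(vert[0], horiz[0]), min(vert[1], horiz[1])

def btc(points, tau_d, tau_t):
    """points: temporal tuples (d, t) with strictly increasing t."""
    kept = [points[0]]
    da, ta = points[0]
    lo, hi = -pi / 2, pi / 2                      # R starts as the straight angle
    for d, t in points[1:]:
        direction = atan2(d - da, t - ta)         # slope of the candidate segment
        if lo <= direction <= hi:                 # (d, t) lies inside R: skip it and
            a, b = angle_range((da, ta), (d, t), tau_d, tau_t)
            lo, hi = max(lo, a), min(hi, b)       # shrink R by its tolerance segments
        else:                                     # it cannot be skipped: enroll it,
            kept.append((d, t))                   # make it the new anchor, reset R
            da, ta = d, t
            lo, hi = -pi / 2, pi / 2
    if kept[-1] != points[-1]:
        kept.append(points[-1])                   # always keep the final tuple
    return kept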

5 Applications on Compressed Trajectory

The main purpose of trajectory compression is to use less space to store the trajectories; consequently, whether the compressed trajectories can support various LBS applications is usually not the main focus of compression approaches. As PRESS compresses the trajectories in such a way that the spatial paths are captured exactly and the temporal information loss is bounded by TSND and NSTD, we can always decompress the trajectories for LBS applications. However, it is still desirable that the compressed trajectories can support certain, if not all, applications without being fully decompressed. In the following, we demonstrate in detail that the compressed trajectory can support whereat, whenat, and range, three common queries used by many LBSs, and briefly introduce some other queries PRESS can support. whereat returns the location along the trajectory where the object is located at a given time t; whenat returns the time stamp when the object is located at a given position while traveling along the trajectory; and range checks whether the trajectory passes a given region during a given time period.

5.1 whereat Query

whereat returns the location along the trajectory where the object is located at time t. Given a trajectory T, its compressed form returned by PRESS, and a time slot t, let