An Optimal Linear Time Algorithm for Quasi-Monotonic Segmentation


Daniel Lemire (corresponding author)    Martin Brooks    Yuhong Yan
University of Quebec at Montreal (UQAM)
   100 Sherbrooke West    Montréal    Qc    H2X 3P2    Canada
National Research Council of Canada
   1200 Montreal Road    Ottawa    ON    Canada    K1A 0R6

Monotonicity is a simple yet significant qualitative characteristic. We consider the problem of segmenting a sequence in up to $K$ segments. We want segments to be as monotonic as possible and to alternate signs. We propose a quality metric for this problem using the $l_\infty$ norm, and we present an optimal linear time algorithm based on novel formalism. Moreover, given a precomputation in time $O(n \log n)$ consisting of a labeling of all extrema, we compute any optimal segmentation in constant time. We compare experimentally its performance to two piecewise linear segmentation heuristics (top-down and bottom-up). We show that our algorithm is faster and more accurate. Applications include pattern recognition and qualitative modeling.

Time Series, Segmentation, Monotonicity, Design of Algorithms





This is an expanded version of a conference paper [1].

1 Introduction

Monotonicity is one of the most natural and important qualitative properties for sequences of data points. It is easy to determine where the values are strictly going up or down, but we only want to identify significant monotonicity. For example, the drop from 2 to 1.9 in the array might not be significant and might even be noise-related. The quasi-monotonic segmentation problem is to determine where the data is approximately increasing or decreasing.

In practical applications, sequences of values can be quite large: it is not uncommon to have sensors record data at 10 kHz or more, thus generating terabytes of data and billions of data points. As a dimensionality reduction step [2], segmentation divides the data into intervals having homogeneous characteristics (flatness, constant slope [3], unimodality [4], monotonicity [5, 6], step, ramp or impulse [7], and so on). The segmentation points can also be used as markers to indicate a qualitative change in the data. Other applications include frequent pattern mining [8] and time series classification [9]. For qualitative reasoning [10], piecewise monotonic segmentation is especially important as it provides a symbolic model describing system behavior in terms of increasing and decreasing relations between variables.

There is a trade-off between the number of segments and the approximation error. Some segmentation algorithms [5] give a segmentation having no more than $K$ segments while attempting to minimize the error $\epsilon$; other algorithms [6] attempt to minimize the number of segments ($K$) given an upper bound on the error $\epsilon$. We are concerned with the first type of algorithm in this paper.

Using dynamic programming or other approaches, most segmentation problems can be solved in time $O(n^2)$. Other solutions to this problem, using machine learning to classify the pairs of data points [10], are even less favorable since they have higher complexity. However, it is common for sequences of data points to be massive, and segmentation algorithms must have complexity close to $O(n)$ to be competitive. While approximate linear regression segmentation algorithms can be $O(n)$, we show that using a linear regression error to segment according to monotonicity is not an ideal solution.

We present a metric for the quasi-monotonic segmentation problem called the Optimal Monotonic Approximation Function Error (OMAFE); this metric differs from the previously introduced OPMAFE metric [5] since it applies to all segmentations and not just “extremal” segmentations. We formalize the novel concept of a maximal $*$-pair and show that it can be used to define a unique labeling of the extrema leading to an optimal segmentation algorithm. We also present an optimal linear time algorithm to solve the quasi-monotonic segmentation problem given a segment budget, together with an experimental comparison to quantify the benefits of our algorithm.

2 Monotonicity Error Metric (OMAFE)

Finding the best piecewise monotonic approximation can be viewed as a classical functional approximation problem [11], but we are concerned only with discrete sequences.

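Suppose $n$ samples noted $F : D = \{x_1, \ldots, x_n\} \to \mathbb{R}$ with $x_1 < x_2 < \cdots < x_n$. We define $F|_{[a,b]}$ as the restriction of $F$ over $D \cap [a,b]$. We seek the best monotonic (increasing or decreasing) function $f : \mathbb{R} \to \mathbb{R}$ approximating $F$. Let $\Omega_\uparrow$ (resp. $\Omega_\downarrow$) be the set of all monotonic increasing (resp. decreasing) functions. The Optimal Monotonic Approximation Function Error (OMAFE) is $\min_{f \in \Omega} \max_{x \in D} |f(x) - F(x)|$ where $\Omega$ is either $\Omega_\uparrow$ or $\Omega_\downarrow$.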

The segmentation of a set $D$ is a sequence $S = X_1, \ldots, X_K$ of intervals in $\mathbb{R}$ with $[\min D, \max D] = \bigcup_i X_i$ such that $\max X_i = \min X_{i+1} \in D$ and $X_i \cap X_j = \emptyset$ for $j \neq i+1, i, i-1$. Alternatively, we can define a segmentation from the set of points $X_i \cap X_{i+1} = \{y_{i+1}\}$, $y_1 = \min D$, and $y_{K+1} = \max D$. Given $F$ and a segmentation, the Optimal Monotonic Approximation Function Error (OMAFE) of the segmentation is $\max_i \mathrm{OMAFE}(F|_{[y_i, y_{i+1}]})$ where the monotonicity type (increasing or decreasing) of the segment $X_i$ is determined by the sign of $F(\max X_i) - F(\min X_i)$. Whenever $F(\max X_i) = F(\min X_i)$, we say the segment has no direction and the best monotonic approximation is just the flat function having value $(\max F|_{X_i} + \min F|_{X_i})/2$. The error is computed over each interval independently; optimal monotonic approximation functions are not required to agree at $\max X_i = \min X_{i+1}$. Segmentations should alternate between increasing and decreasing; otherwise some sequences could be split into two consecutive increasing segments, and we consider it natural to aggregate segments with the same monotonicity.

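We solve for the best monotonic function as follows. If we seek the best monotonic increasing function, we first define $\overline{f}_\uparrow(x) = \max\{F(y) : y \leq x\}$ (the maximum of all previous values) and $\underline{f}_\uparrow(x) = \min\{F(y) : y \geq x\}$ (the minimum of all values to come). If we seek the best monotonic decreasing function, we define $\overline{f}_\downarrow(x) = \max\{F(y) : y \geq x\}$ (the maximum of all values to come) and $\underline{f}_\downarrow(x) = \min\{F(y) : y \leq x\}$ (the minimum of all previous values). These functions, which can be computed in linear time, are all we need to solve for the best approximation function, as shown by the next theorem, which is a well-known result [12].

Theorem. Given $F$, a best monotonic increasing approximation function to $F$ is $f_\uparrow = (\overline{f}_\uparrow + \underline{f}_\uparrow)/2$ and a best monotonic decreasing approximation function is $f_\downarrow = (\overline{f}_\downarrow + \underline{f}_\downarrow)/2$. The corresponding error (OMAFE) is $\max_{x \in D} |\overline{f}_\uparrow(x) - \underline{f}_\uparrow(x)|/2$ or $\max_{x \in D} |\overline{f}_\downarrow(x) - \underline{f}_\downarrow(x)|/2$ respectively.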

The implementation of the algorithm suggested by the theorem is straightforward. Given a segmentation, we can compute the OMAFE in time $O(n)$ using at most two passes.
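The two-pass computation can be sketched as follows. This is a minimal illustration written for Python 3, not the authors' code: the function names and the convention that a segmentation is passed as a sorted list of cut indices (first cut 0, last cut n-1, consecutive segments sharing their end point) are assumptions made for this example. Each segment's error is half the largest gap between a running maximum and a reverse running minimum.

```python
from itertools import accumulate


def omafe_increasing(values):
    """Max-norm error of the best monotonic increasing approximation."""
    # Running maximum of all previous values (first pass).
    prefix_max = list(accumulate(values, max))
    # Running minimum of all values to come (second pass, from the right).
    suffix_min = list(accumulate(reversed(values), min))[::-1]
    return max(hi - lo for hi, lo in zip(prefix_max, suffix_min)) / 2.0


def omafe_decreasing(values):
    """Max-norm error of the best monotonic decreasing approximation."""
    # A decreasing fit of the data is an increasing fit of the reversed data.
    return omafe_increasing(values[::-1])


def segmentation_omafe(values, cuts):
    """OMAFE of a segmentation given as sorted cut indices (hypothetical format)."""
    worst = 0.0
    for a, b in zip(cuts, cuts[1:]):
        segment = values[a:b + 1]          # both end points belong to the segment
        change = segment[-1] - segment[0]  # its sign fixes the direction
        if change > 0:
            error = omafe_increasing(segment)
        elif change < 0:
            error = omafe_decreasing(segment)
        else:
            # No direction: the best fit is the flat mid-range function.
            error = (max(segment) - min(segment)) / 2.0
        worst = max(worst, error)
    return worst
```

For instance, with the values 1, 3, 2, 4 treated as a single increasing segment, the only violation is the drop from 3 to 2, so the error is 0.5; cutting at every point gives an error of 0.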

3 A Scale-Based Algorithm for Quasi-Monotonic Segmentation

We use the following proposition to prove that the segmentations we generate are optimal (see Theorem 3).

Proposition. A segmentation of with alternating monotonicity has a minimal OMAFE for a number of alternating segments if

  A. or for ;

  B. in all intervals for , there exists such that .


Let the original segmentation be the intervals and consider a new segmentation with intervals . Assume that the new segmentation has lower error (as given by OMAFE). Let and .

If any segment contains a segment , then the existence of in such that and implies that and have the same monotonicity.

We show that each pair of intervals , has nonempty intersection. Suppose not, and let be the smallest index such that . Since and have the same monotonicity, for each , and have opposite monotonicity. Now consider the intervals and the points . At least one interval contains two consecutive points; choose the largest such that contains . But then , contradicting at least one of the assumptions for and .

It now follows that each pair of intervals has the same monotonicity.

Since , we can choose an index such that . We show that there exists another index such that , thus contradicting . Suppose is increasing; the proof is similar for the opposite case. Then there exist such that . From it follows that at least one of or lies in , and hence or . Thus for either or .

For simplicity, we assume $F$ has no consecutive equal values, i.e. $F(x_i) \neq F(x_{i+1})$ for $i = 1, \ldots, n-1$; our algorithms assume all but one of consecutive equal values have been removed. We say $x_i$ is a maximum if $i \neq 1$ implies $F(x_i) > F(x_{i-1})$ and if $i \neq n$ implies $F(x_i) > F(x_{i+1})$. Minima are defined similarly.
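A minimal Python sketch of this preprocessing, under the assumption that the samples are held in a plain list and identified by their 0-based position; the function names are hypothetical. Note that the first and last samples are always extrema under this definition, and that removing duplicates changes the indices, so any mapping back to the original positions would have to be kept separately.

```python
def remove_consecutive_duplicates(values):
    """Keep one representative of each run of equal consecutive values."""
    cleaned = []
    for v in values:
        if not cleaned or v != cleaned[-1]:
            cleaned.append(v)
    return cleaned


def extrema(values):
    """List (index, kind) for every extremum; kind is 'max' or 'min'.

    Assumes at least two samples and no consecutive equal values.
    """
    n = len(values)
    found = []
    for i in range(n):
        # A missing neighbour (at either end) imposes no constraint.
        above_left = i == 0 or values[i] > values[i - 1]
        above_right = i == n - 1 or values[i] > values[i + 1]
        below_left = i == 0 or values[i] < values[i - 1]
        below_right = i == n - 1 or values[i] < values[i + 1]
        if above_left and above_right:
            found.append((i, "max"))
        elif below_left and below_right:
            found.append((i, "min"))
    return found
```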

Our mathematical approach is based on the concept of $\delta$-pair [13] (see Fig. 1). Definition: The tuple $x, y$ ($x < y \in D$) is a $\delta$-pair (or a pair of scale $\delta$) for $F$ if $|F(y) - F(x)| \geq \delta$ and for all $z \in D$, $x < z < y$ implies $|F(z) - F(x)| < \delta$ and $|F(y) - F(z)| < \delta$. A $\delta$-pair’s direction is increasing or decreasing according to whether $F(y) > F(x)$ or $F(y) < F(x)$. $\delta$-Pairs having opposite directions cannot overlap but they may share an end point. $\delta$-Pairs of the same direction may overlap, but may not be nested. We use the term “$*$-pair” to indicate a $\delta$-pair having an unspecified $\delta$. We say that a $*$-pair is significant at scale $\delta$ if it is of scale $\delta'$ for $\delta' \geq \delta$. From a topological viewpoint, a $\delta$-pair is the pairing of critical points used to determine each extremum’s persistence [14].

Figure 1: A $\delta$-pair.

We define $\delta$-monotonicity as follows. Definition: Let $X$ be an interval; $F$ is $\delta$-monotonic on $X$ if all $\delta$-pairs in $X$ have the same direction; $F$ is strictly $\delta$-monotonic when there exists at least one such $\delta$-pair. In this case:

  • $F$ is $\delta$-increasing on $X$ if $X$ contains an increasing $\delta$-pair.

  • $F$ is $\delta$-decreasing on $X$ if $X$ contains a decreasing $\delta$-pair.

A $\delta$-monotonic interval $X$ satisfies $\mathrm{OMAFE}(X) < \delta/2$. We say that a $*$-pair $x, y$ is maximal if whenever $z_1, z_2$ is a $*$-pair of a larger scale in the same direction containing $x, y$, then there exists a $*$-pair $w_1, w_2$ of an opposite direction contained in $z_1, z_2$ and containing $x, y$. For example, the sequence has 2 maximal $*$-pairs: and . Maximal $*$-pairs of opposite direction may share a common point, whereas maximal $*$-pairs of the same direction may not. Maximal $*$-pairs cannot overlap, meaning that it cannot be the case that exactly one end point of a maximal $*$-pair lies strictly between the end points of another maximal $*$-pair; either neither point lies strictly between or both do. In the case that both do, we say that the one maximal $*$-pair properly contains the other. All $*$-pairs must be contained in a maximal $*$-pair.


Lemma. The smallest maximal $*$-pair containing a $*$-pair must be of the same direction.


Suppose a -pair is immediately contained in a maximal -pair . Suppose is not in the same direction, then within , seek the largest -pair in the same direction as and containing , then it must be a maximal -pair in since maximal -pairs of different directions cannot overlap.

The first and second points of a maximal $*$-pair are extrema, and the reverse is true as well, as shown by the next lemma.


Lemma. Every extremum is either the first or second point of a maximal $*$-pair. Proof. The case or follows by inspection. Otherwise, is the end point of a left and a right $*$-pair. Each $*$-pair must immediately belong to a maximal $*$-pair of same direction: a $*$-pair is contained in a maximal $*$-pair of same direction and there is no maximal $*$-pair of opposite direction such that . Let and be the maximal $*$-pairs immediately containing the left and right $*$-pair of . Suppose neither and have as an end point. Suppose , then the right $*$-pair is not immediately contained in , a contradiction. The result follows by symmetry.

Our approach is to label each extremum in $F$ with a scale parameter $\delta$ saying that this extremum is “significant” at scale $\delta$ and below. Our intuition is that by picking extrema at scale $\delta$, we should have a segmentation having error less than $\delta/2$.


Definition. The scale labeling of an extremum is the maximum of the scales of the maximal $*$-pairs for which it is an end point.

For example, given the sequence with 2 maximal -pairs ( and ), we would give the following labels in order .


Definition. Given $\delta > 0$, a maximal alternating sequence of $\delta$-extrema $Y = y_1, \ldots, y_{K+1}$ is a sequence of extrema each having scale label at least $\delta$, having alternating types (maximum/minimum), and such that there exists no sequence properly containing $Y$ having these same properties. From $Y$ we define a maximal alternating $\delta$-segmentation of $D$ by segmenting at the points $\min D, y_2, \ldots, y_K, \max D$.


Theorem. Given $\delta > 0$, let $P$ be a maximal alternating $\delta$-segmentation derived from a maximal alternating sequence $y_1, \ldots, y_{K+1}$ of $\delta$-extrema. Then any alternating segmentation $Q$ having OMAFE($Q$) $<$ OMAFE($P$) has at least $K+1$ segments.


We show that conditions A and B of Proposition 3 are satisfied with OMAFE().

First we show that each segment is -monotone; from this we conclude that . Intervals and contain no maximal -pairs of scale or larger, and therefore contain no -pairs of scale or larger. Similarly, no contains an opposite-direction significant -pair.

Condition A: Follows from -monotonicity of each and maximal -pairs not overlapping.

Condition B: We show that . If , then must begin an maximal -pair, and the maximal -pair must end with since maximal -pairs cannot overlap. The case is similar. Otherwise, since maximal -pairs cannot overlap, each is either a maximal -pair of scale or larger or there exist indices and , and such that is a maximal -pair of scale at least , and is a maximal -pair of scale at least . These two maximal -pairs have the same direction, and that this is opposite to the direction of . Now suppose . Then is a -pair properly containing , and , . But neither nor can be properly contained in a -pair of opposite direction lying within , thus contradicting their maximality and proving the claim.

Sequences of extrema labeled at least are generally not maximal alternating. For example the sequence is scale labeled . However, a simple relabeling of certain extrema can make them maximal alternating. Consider two same-sense extrema such that lying between them there exists no extremum having scale at least as large as the minimum of the two extrema’s scales. We must have , since otherwise the point upon which has the lesser value could not be the endpoint of a maximal -pair. This is the only situation which causes choice when constructing a maximal alternating sequence of -extrema. To eliminate this choice, replace the scale label on with the largest scale of the opposite-sense extrema lying between them. In the next section, Algorithm 1 incorporates this re-labeling making Algorithm 2 simple and efficient.

3.1 Computing a Scale Labeling Efficiently

Algorithm 1 produces a scale labeling in linear time. Extrema from the original data are visited in order, and they alternate (maxima/minima) since we only pick one of the values when there are repeated values (such as ).

The algorithm has a main loop (lines 5 to 12) where it labels extrema as it identifies extremal $*$-pairs, and stacks the extrema it cannot immediately label. At all times, the stack (line 3) contains minima and maxima in strictly increasing and decreasing order respectively. Also at all times, the last two extrema at the bottom of the stack are the absolute maximum and absolute minimum (found so far). Observe that we can only label an extremum as we find new extremal $*$-pairs (lines 7, 10, and 14).

  • If the stack is empty or contains only one extremum, we simply add the new extremum (line 12).

  • If there are only 2 extrema in the stack and we found either a new absolute maximum or new absolute minimum (), we can pop and label the oldest one () (lines 9, 10, and 11) because the old pair () forms a maximal $*$-pair and thus must be bounded by extrema having at least the same scale, while the oldest value () does not belong to a larger maximal $*$-pair. Otherwise, if there are only 2 extrema in the stack and the new extremum satisfies , then we add it to the stack since no labeling is possible yet.

  • While the stack contains more than 2 extrema (lines 6, 7 and 8), we consider the last three points on the stack () where is the last point added. Let be the value of the new extremum. If , then it is simply added to the stack since we cannot yet label any of these points; we exit the while loop. Otherwise, we have a new maximum (resp. minimum) exceeding (resp. lower than) or matching the previous one on the stack, and hence is a maximal $*$-pair. If , then is a maximal $*$-pair and thus, cannot be the end of a maximal $*$-pair and cannot be the beginning of one, hence both and are labeled. If then we have successive maxima or minima and the same labeling as applies.

During the “unstacking” (lines 13 and following), we visit a sequence of minima and maxima forming increasingly larger maximal $*$-pairs.

The algorithm runs in time $O(n)$ (independent of $K$). Indeed, for any index of an extremum, the condition at line 6 will evaluate once to false; moreover the condition at line 6 cannot evaluate to true more than $n$ times.

1:  INPUT: an array $F$ containing the values indexed from $0$ to $n-1$, repeated consecutive values have been removed
2:  OUTPUT: a scale labeling for all extrema
3:  $S \leftarrow$ empty stack, First($S$) is the value on top, Second($S$) is the second value
4:  define $\delta(S) = |F_{\mathrm{First}(S)} - F_{\mathrm{Second}(S)}|$
5:  for $e$ index of an extremum in $F$, $e$’s are visited in increasing order do
6:     while length($S$) $> 2$ and ($e$ is a minimum such that $F_e \leq F_{\mathrm{Second}(S)}$ or $e$ is a maximum such that $F_e \geq F_{\mathrm{Second}(S)}$) do
7:        label First($S$) and Second($S$) with $\delta(S)$
8:        pop stack $S$ twice
       end while
9:     if length($S$) is 2 and ($e$ is a minimum such that $F_e \leq F_{\mathrm{Second}(S)}$ or $e$ is a maximum such that $F_e \geq F_{\mathrm{Second}(S)}$) then
10:       label Second($S$) with $\delta(S)$
11:       remove Second($S$) from stack $S$
       end if
12:    stack $e$ to $S$
    end for
13: while length of $S$ $> 2$ do
14:    label First($S$) with $\delta(S)$
15:    pop stack $S$
    end while
16: label First($S$) and Second($S$) with $\delta(S)$
Algorithm 1 Algorithm to compute the scale labeling in $O(n)$ time.
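A runnable reading of Algorithm 1 is sketched below for Python 3. It is an interpretation rather than the authors' implementation: the comparison of the new extremum against the second element of the stack, the definition of the label as the absolute difference between the two topmost stacked values, and all identifiers (`scale_labels`, `beats_second`, and so on) are assumptions of this sketch. The input is assumed to have at least two samples and no consecutive equal values.

```python
def scale_labels(values):
    """Return {index: scale} for every extremum of `values`."""
    n = len(values)
    labels = {}
    stack = []  # indices of unlabeled extrema; stack[-1] plays First(S)

    def delta():
        # Scale of the pair formed by the two topmost stacked extrema.
        return abs(values[stack[-1]] - values[stack[-2]])

    def beats_second(e, is_min):
        # Does the new extremum reach or pass the stacked extremum of its own type?
        second = values[stack[-2]]
        return values[e] <= second if is_min else values[e] >= second

    for e in range(n):
        is_min = (e == 0 or values[e] < values[e - 1]) and \
                 (e == n - 1 or values[e] < values[e + 1])
        is_max = (e == 0 or values[e] > values[e - 1]) and \
                 (e == n - 1 or values[e] > values[e + 1])
        if not (is_min or is_max):
            continue  # only extrema are visited
        while len(stack) > 2 and beats_second(e, is_min):
            d = delta()
            labels[stack.pop()] = d  # First(S)
            labels[stack.pop()] = d  # Second(S)
        if len(stack) == 2 and beats_second(e, is_min):
            labels[stack[0]] = delta()
            del stack[0]  # drop Second(S), keep First(S)
        stack.append(e)

    # Unstacking: remaining extrema form increasingly larger pairs.
    while len(stack) > 2:
        labels[stack[-1]] = delta()
        stack.pop()
    d = delta()
    labels[stack[-1]] = d
    labels[stack[-2]] = d
    return labels
```

As an illustration chosen for this sketch, the values 1, 3, 2, 4 receive the labels 3, 1, 1, 3: the inner drop from 3 to 2 has scale 1, while the overall rise from 1 to 4 has scale 3. Each index is pushed once and popped at most once, which is consistent with the linear running time claimed for Algorithm 1.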

Once the labeling is complete, we find extrema having largest scale in time using memory, then we remove all extrema having the same scale as the smallest scale in these extrema (removing at least one), we replace the first and the last extrema by and respectively (see Algorithm 2). The result is an optimal segmentation having at most segments.

  INPUT: an array containing the values indexed from to
  INPUT: a bound on the number of segments desired
  OUTPUT: unsorted segmentation points (a -segmentation)
   empty array (capacity )
  for  is index of an extremum in having scale , are visited in increasing order do
     insert in so that is sorted by scale in decreasing order (sort on ) using binary search
     if length of is  then
        pop last(L)
     end if
  end for
  remove all elements of having the scale of last(L)
  RETURN: the indexes in replacing first one by and last one by
Algorithm 2 Given the scale labeling, this algorithm will return a segmentation using at most segments. It is assumed that there are at least extrema to begin with.
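The selection step can be sketched in Python 3 as follows. This is an assumption-laden illustration, not a transcription of Algorithm 2: the function name, the dictionary format for the labels, the choice of keeping `k + 2` candidates before discarding ties, and the handling of inputs with few extrema are all choices made for this sketch. It uses `heapq.nlargest` instead of an explicitly maintained sorted array.

```python
import heapq


def segmentation_points(labels, n, k):
    """Pick cut indices giving at most k segments.

    labels: {extremum index: scale label}; n: number of samples; k: budget.
    """
    extrema = sorted(labels)
    if len(extrema) <= k + 1:
        # Few extrema: cutting at all of them already fits the budget.
        points = extrema
    else:
        # Keep the k + 2 largest labels, then drop everything tied with the
        # smallest kept label, so at most k + 1 cut points remain.
        top = heapq.nlargest(k + 2, extrema, key=lambda i: labels[i])
        threshold = labels[top[-1]]
        points = sorted(i for i in top if labels[i] > threshold)
    if len(points) < 2:
        return [0, n - 1]  # degenerate case: one segment over everything
    # The segmentation must cover the whole series.
    points[0], points[-1] = 0, n - 1
    return points
```

Dropping all extrema tied with the smallest kept label means the result is exactly the set of extrema whose label exceeds a threshold, which is what the maximal alternating construction requires; the price is that fewer than `k` segments may be returned.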

Alternatively, if we plan to resegment the time series several times with different values of , we can sort all extrema by their label in time , and compute in time an auxiliary structure on the sorted set so that when selecting the  item in the sorted list (), we obtain the index of the earliest occurrence of this scale in the list ( and if ) in constant time. Hence, we can segment any time series optimally in constant time given this precomputation in time .


Theorem. Given a precomputation in time $O(n \log n)$ using $O(n)$ storage, for any desired upper bound on the number of segments $K$, we can compute the segmentation points of an optimal OMAFE, and the corresponding OMAFE value, in constant time.

Hence, we can compute an OMAFE versus $K$ plot in $O(n \log n)$ time.
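One possible shape for such a precomputation is sketched below in Python 3. The structure and names (`build_rank_index`, `prefix_length`) are hypothetical, and the rank looked up for a given budget (`k + 1`, 0-based) is an assumption of this sketch. After one sort, each query returns how many leading entries of the sorted list serve as segmentation points, using a single array lookup.

```python
def build_rank_index(labels):
    """Sort extrema by decreasing label and record, for each rank,
    the rank at which its label value first occurs."""
    order = sorted(labels, key=lambda i: labels[i], reverse=True)
    first_occurrence = []
    for rank, index in enumerate(order):
        if rank > 0 and labels[index] == labels[order[rank - 1]]:
            first_occurrence.append(first_occurrence[-1])
        else:
            first_occurrence.append(rank)
    return order, first_occurrence


def prefix_length(first_occurrence, k):
    """Number of leading entries of `order` to use for a budget of k segments."""
    if k + 1 >= len(first_occurrence):
        return len(first_occurrence)  # every extremum fits in the budget
    # Everything ranked before the first entry tied with rank k + 1 is kept.
    return first_occurrence[k + 1]
```

The returned prefix is unsorted with respect to position, which matches the "unsorted segmentation points" produced by Algorithm 2; sorting it or forcing the end points to the first and last sample would be an additional step.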

  INPUT: Time Series of length
  INPUT: Desired number of segments
  INPUT: Function computing linear fit error in range
  while   do
     find tuple in with maximum last entry
     find minimum of for
     remove tuple from
     insert tuples and in
  end while
   contains the segmentation
Algorithm 3 Piecewise Linear Top-Down Segmentation Heuristic.
  INPUT: Time Series of length
  INPUT: Desired number of segments
  INPUT: Function computing linear fit error in range
  while   do
     find consecutive intervals in , and , having minimal value
     merge the two consecutive intervals
  end while
   contains the segmentation
Algorithm 4 Piecewise Linear Bottom-Up Segmentation Heuristic.

4 Experimental Results and Comparison to Piecewise Linear Segmentation Heuristics

We compare our optimal algorithm with our implementations of two piecewise linear segmentation heuristics [3]: top-down, which runs in $O(nK)$ time (see Algorithm 3), and bottom-up, which runs in $O(n^2)$ time (see Algorithm 4). The top-down heuristic successively segments the data starting with only one segment, each time picking the segment with the worst linear regression error and finding the best segmentation point; the linear regression is not continuous from one segment to the other. The regression error can be computed in constant time if one has precomputed the range moments [15, 16]. The bottom-up heuristic starts with intervals containing only one data point and successively merges them, each time choosing the least expensive merge. By maintaining the segments in a doubly-linked list coupled with a heap or tree, it is possible to obtain a bottom-up heuristic with $O(n \log n)$ complexity, but it then uses much more memory and it is more difficult to implement.
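The constant-time regression error mentioned above can be illustrated with prefix sums. The Python 3 sketch below assumes that the error is the sum of squared residuals of the least-squares line and that sample positions are the integers 0, 1, 2, ...; both are assumptions of this example, and the class name is hypothetical. It is not the data-cube technique of [15], only a plain prefix-sum version of the same idea.

```python
from itertools import accumulate


def _prefix(sequence):
    # Prefix sums with a leading zero, so that sum(a..b) = p[b + 1] - p[a].
    return [0.0] + list(accumulate(sequence))


class LinearFitError:
    """Sum of squared residuals of the best line over any index range."""

    def __init__(self, values):
        n = len(values)
        self.sx = _prefix(float(i) for i in range(n))
        self.sxx = _prefix(float(i * i) for i in range(n))
        self.sy = _prefix(float(v) for v in values)
        self.syy = _prefix(float(v * v) for v in values)
        self.sxy = _prefix(float(i * v) for i, v in enumerate(values))

    def __call__(self, a, b):
        """Error over indices a..b inclusive, in constant time."""
        m = b - a + 1
        sx = self.sx[b + 1] - self.sx[a]
        sxx = self.sxx[b + 1] - self.sxx[a]
        sy = self.sy[b + 1] - self.sy[a]
        syy = self.syy[b + 1] - self.syy[a]
        sxy = self.sxy[b + 1] - self.sxy[a]
        var_x = sxx - sx * sx / m
        var_y = syy - sy * sy / m
        cov = sxy - sx * sy / m
        if var_x == 0:
            return 0.0  # a single point is fitted exactly
        # Clamp tiny negative values caused by floating-point cancellation.
        return max(var_y - cov * cov / var_x, 0.0)
```

With long series or large values, the subtraction of large prefix sums can lose precision, so a production version might centre the data first.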

Once the piecewise linear segmentation is completed, we run through the segments and aggregate consecutive segments having the same sign, where the sign of a segment $[y_i, y_{i+1}]$ is defined by $F(y_{i+1}) - F(y_i)$, setting 0 to be a positive sign (increasing monotonicity).
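This aggregation step can be sketched in a few lines of Python 3. The function name and the representation of a segmentation as a sorted list of at least two cut indices are assumptions of this example. An interior cut is kept only when the segments on either side of it have different signs.

```python
def merge_same_sign(values, cuts):
    """Drop interior cuts that separate two segments of the same sign."""
    # Sign of each segment from its end points; a zero change counts as positive.
    signs = [1 if values[b] - values[a] >= 0 else -1
             for a, b in zip(cuts, cuts[1:])]
    kept = [cuts[0]]
    for i in range(1, len(cuts) - 1):
        if signs[i - 1] != signs[i]:
            kept.append(cuts[i])
    kept.append(cuts[-1])
    return kept
```

Merging two segments of the same sign yields a segment of that same sign, so a single pass over the original signs is enough.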

We implemented all algorithms in Python (version 2.5) and ran the experiments on a 2.16 GHz Intel Core 2 Duo processor with sufficient RAM (1 GB). Fig. 2 presents the relative speed of the various segmentation algorithms on time series of various lengths for a fixed number of segments (using randomly generated data). The timings reported include all pre-processing.

Figure 2: Time to segment a time series of length in segments.

4.1 Electrocardiograms (ECG)

ECGs have a well known monotonicity structure with 5 commonly identifiable extrema per pulse (reference points P, Q, R, S, and T) (see Fig. 3) though not all points can be easily identified on all pulses and the exact morphology can vary. We used freely available samples from the MIT-BIH Arrhythmia Database [17]. We only present our results over one sample (labeled “100.dat”) since we found that results did not vary much between data samples. These ECG recordings used a sampling rate of 360 Hz per channel with 11-bit resolution (see Fig. 4(a)). We keep the first 4000 samples (11 seconds) and about 14 pulses, and we do no preprocessing such as baseline correction. We can estimate that a typical pulse has about 5 “easily” identifiable monotonic segments. Hence, out of 14 pulses, we can estimate that there are about 70 significant monotonic segments, some of which match the domain-specific markers (reference points P, Q, R, S, and T). A qualitative description of such data is useful for pattern matching applications.

Figure 3: Schema of an ECG pulse with commonly identified reference points (PQRST).

The running time as a function of $K$ is presented in Fig. 4(b). The scale-based segmentation implementation is faster than our implementations of the piecewise linear heuristics. On such a long time series (4000 samples), our implementation of the bottom-up heuristic is much slower than the alternatives.

(a) Time Series
(b) Time vs. number of segments
(c) OMAFE vs. number of segments
Figure 4: Results of experiments over ECG data.

We want to determine how well the piecewise linear segmentation heuristics do comparatively. OMAFE is an absolute and not relative error measure, but because the range of the ECGs under consideration is roughly between 950 and 1150, we expect the OMAFE to never exceed 100 by much. The OMAFE with respect to the maximal number of segments ($K$) is given in Fig. 4(c): it is a “monotonicity spectrum.” By counting on about 5 monotonic segments per pulse with a total of 14 pulses, there should be about 70 monotonic segments in the 4000 samples under consideration. We see that the decrease in OMAFE with the addition of new segments starts to level off between 50 and 70 segments as predicted. The addition of new segments past 70 ($K > 70$) has little impact. The scale-based algorithm is optimal, but also at least 3 times more accurate than the top-down algorithm for larger $K$, and this is consistent over other data sets. In fact, the OMAFE becomes practically zero for whereas the OMAFE of the top-down linear regression algorithm remains at around 20, which is still significant. The bottom-up heuristic is more accurate than the top-down heuristic, but it still has about twice the OMAFE for large $K$. The OMAFE of the scale-based algorithm is a non-increasing function of $K$, a consequence of optimality.

4.2 Temperature Recordings

We consider the daily temperature recordings of the first of 35 weather stations in the MD*Base Daily temperature data set [18] (the data is attributed to Ramsay and Silverman [19]). Since we only have one year of recordings, only 365 data points are used (see Fig. 5(a)). We also give the running times (see Fig. 5(b)) and the accuracy (see Fig. 5(c)). Our implementation of the bottom-up heuristic is now much faster due to the small size of the time series, but the OMAFE, while superior to the top-down heuristic, exhibits a spurious spike near , showing the danger of relying on a piecewise linear heuristic to study the monotonicity of a data set. Considering the OMAFE of our scale-based algorithm, we notice that the accuracy increases slowly after .

(a) Time Series
(b) Time vs. number of segments
(c) OMAFE vs. number of segments
Figure 5: Results of experiments over daily temperature data.

4.3 Synthetic Random Walk Data

Random walks are often used as models for common time series such as stock prices. We generated a random walk using the formula where (see Fig. 6(a)). The running times are nearly identical to the ECG case, as is expected since the time series have the same length. However, the OMAFE differs (see Fig. 6(c)): using our optimal algorithm, the curve is smooth with no sharp drop. Meanwhile, the bottom-up heuristic exhibits another spurious spike in the OMAFE (around ) while it provides the optimal segmentation at .

(a) Time Series
(b) Time vs. number of segments
(c) OMAFE vs. number of segments
Figure 6: Results of experiments over random walk.

5 Conclusion and Future Work

We presented optimal and fast algorithms to compute the best piecewise monotonic segmentation in time $O(n)$ and the complete OMAFE-versus-$K$ spectrum in time $O(n \log n)$. Our experimental results suggest that one should be careful when deriving monotonicity information from piecewise linear segmentation heuristics. Future work will focus on choosing the optimal number of segments for given applications. We also plan to investigate the applications of the monotonicity spectrum as a robust analysis. Further work to integrate flat segments is needed [5, 16].


  • [1] D. Lemire, M. Brooks, and Y. Yan, “An optimal linear time algorithm for quasi-monotonic segmentation,” in ICDM 2005, 2005.
  • [2] E. Bingham, A. Gionis, N. Haiminen, H. Hiisilä, H. Mannila, and E. Terzi, “Segmentation and dimensionality reduction,” in SDM 2006, 2006.
  • [3] E. J. Keogh, S. Chu, D. Hart, and M. J. Pazzani, “An online algorithm for segmenting time series,” in ICDM 2001, pp. 289–296, 2001.
  • [4] N. Haiminen and A. Gionis, “Unimodal segmentation of sequences,” in ICDM 2004, 2004.
  • [5] M. Brooks, Y. Yan, and D. Lemire, “Scale-based monotonicity analysis in qualitative modelling with flat segments,” in IJCAI 2005, 2005.
  • [6] W. Fitzgerald, D. Lemire, and M. Brooks, “Quasi-monotonic segmentation of state variable behavior for reactive control,” in AAAI 2005, 2005.
  • [7] D. G. Galati and M. A. Simaan, “Automatic decomposition of time series into step, ramp, and impulse primitives,” Pattern Recognition, vol. 39, pp. 2166–2174, November 2006.
  • [8] J. Han, W. Gong, and Y. Yin, “Mining segment-wise periodic patterns in time-related databases,” in KDD 1998, 1998.
  • [9] E. J. Keogh and M. J. Pazzani, “An enhanced representation of time series which allows fast and accurate classification, clustering and relevance feedback,” in KDD 1998, pp. 239–243, 1998.
  • [10] D. Šuc and I. Bratko, “Induction of qualitative tree,” in ECML 2001, pp. 442–453, Springer, 2001.
  • [11] V. A. Ubhaya, S. E. Weinstein, and Y. Xu, “Best piecewise monotone uniform approximation,” Approx. Theory, vol. 63, pp. 375–383, December 1990.
  • [12] V. A. Ubhaya, “Isotone optimization I,” Approx. Theory, vol. 12, pp. 146–159, 1974.
  • [13] M. Brooks, “Approximation complexity for piecewise monotone functions and real data,” Computers and Mathematics with Applications, vol. 27, no. 8, 1994.
  • [14] H. Edelsbrunner, D. Letscher, and A. Zomorodian, “Topological persistence and simplification,” Discrete Comp. Geo., vol. 28, pp. 511–533, 2002.
  • [15] D. Lemire, “Wavelet-based relative prefix sum methods for range sum queries in data cubes,” in CASCON 2002, IBM, 2002.
  • [16] D. Lemire, “A better alternative to piecewise linear time series segmentation,” in SDM 2007, 2007.
  • [17] A. L. Goldberger et al., “PhysioBank, PhysioToolkit, and PhysioNet,” Circulation, vol. 101, no. 23, pp. 215–220, 2000. – last checked in April 2007.
  • [18] Institute for Statistics and Econometrics, “MD*Base Online,” 2007. – last checked in April 2007.
  • [19] J. O. Ramsay and B. W. Silverman, The Analysis of Functional Data. Springer, 1997.