Classification and clustering for samples of event time data using non-homogeneous Poisson process models
Data of the form of event times arise in various applications. A simple model for such data is a non-homogeneous Poisson process (NHPP) which is specified by a rate function that depends on time. We consider the problem of having access to multiple independent samples of event time data, observed on a common interval, from which we wish to classify or cluster the samples according to their rate functions. Each rate function is unknown but assumed to belong to a finite number of rate functions each defining a distinct class. We model the rate functions using a spline basis expansion, the coefficients of which need to be estimated from data. The classification approach consists of using training data for which the class membership is known, to calculate maximum likelihood estimates of the coefficients for each group, then assigning test samples to a class by a maximum likelihood criterion. For clustering, by analogy to the Gaussian mixture model approach for Euclidean data, we consider a mixture of NHPP models and use the expectation-maximisation algorithm to estimate the coefficients of the rate functions for the component models and cluster membership probabilities for each sample. The classification and clustering approaches perform well on both synthetic and real-world data sets. Code associated with this paper is available at https://github.com/duncan-barrack/NHPP.
Keywords: point process; non-homogeneous Poisson process; classification; clustering; expectation-maximisation.
1 Introduction
Much real-world data is amenable to modelling as a Poisson process (Ross et al., 1996), for example the times at which a customer makes purchases in a shop (Schmittlein et al., 1987), the times of requests to internet web servers (Arlitt and Williamson, 1997) and the firing times of neurons (Brown et al., 2002). In a wide range of applications one observes multiple samples of such data on a common interval and a challenge is to label each sample as belonging to one of a small number of distinct classes. A particular example we consider later is the categorisation of stores by till transaction times. This can provide retailers with valuable insights into the different demands placed on different store types and help to inform decisions ranging from stock management to appropriate store staffing levels. Another example we consider is the categorisation of transportation hubs according to the times at which they are used. Such information helps inform the type and number of services (transportation, catering, retail and others) that should be made available at each type of hub. When a ‘training set’, which contains samples with known class memberships, is available such a task is termed ‘classification’; when no labels are available for the data it is termed a ‘clustering’ problem (Hastie et al., 2009). Previous approaches to classification and clustering of temporal data of this type have focused on deriving summary statistics from the data, such as the mean and variance of the inter-event times, and using these as inputs to statistical learning models (Moore and Zuev, 2005; Roughan et al., 2004; McGregor et al., 2004). In addition to studies of temporal data, a number of authors have considered spatial point processes. Illian et al. (2004, 2006) used functional principal component analysis on second order summary statistics of spatial point process data to classify plant species. Closer in approach to the present work, Linnett et al. (1995) used a maximum likelihood (ML) criterion to classify the textures of images modelled using homogeneous Poisson processes (HPPs).
HPPs, like those used in the work of Linnett et al. (1995), assume the rate is constant for each class, which is too restrictive for many applications. For example, stock transaction times and human communication behaviour both exhibit periodic patterns (Engle, 2000; Malmgren et al., 2008), and computer network traffic is better modelled by non-homogeneous Poisson process (NHPP) models than by HPPs (Paxson and Floyd, 1995). Hence, in this work, we consider instead NHPP models, which are defined in terms of rate functions that vary with time. We model these rate functions using a spline basis expansion, which imposes an assumption of smoothness on the functions and means estimating them amounts to estimating a finite number of basis coefficients. We go beyond previous work, which has focused purely on rate function estimation (Alizadeh et al., 2008; Kuhl and Bhairgond, 2000; Scott et al., 1980; Zhao and Xie, 1996; Massey et al., 1996; Lee et al., 1991), by considering the tasks of classification and clustering. For classification we show that once estimates for the NHPP rate functions corresponding to each class are obtained by solving a convex optimisation problem, it is a simple task to obtain the posterior class membership probabilities for test data. For clustering, where class labels are not known a priori, we assume all data to have been generated from a mixture of NHPP models. Again the rate functions for each model are represented by a combination of basis functions, but this time the expectation-maximisation (EM) algorithm is used to obtain the coefficients of the basis functions as well as membership probabilities for each distinct sample of event times. Our approaches are validated on synthetic data and three real-world data sets.
The paper is structured as follows. In Section 2 we describe our procedure for estimating NHPP rate functions. Then in Sections 3 and 4 we outline our classification and clustering methods. The results of applying these to synthetic and real data are shown in Section 5. We conclude with a discussion in Section 6.
2 Rate function estimation
Suppose that a sample of event times $\{t_1, \dots, t_n\}$, observed on the interval $[0, T]$, arises from a NHPP with rate function $\lambda(t)$. Then the log likelihood for the rate function is
$$\ell(\lambda) = \sum_{j=1}^{n} \log \lambda(t_j) - \mu(T), \qquad (1)$$
where $\mu(T) = \int_0^T \lambda(t)\,\mathrm{d}t$ (see for example Thompson (2012)).
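As a concrete numerical illustration of equation (1), the following is a minimal Python sketch (not the authors' Matlab code); the function name and the use of scipy quadrature for $\mu(T)$ are our own choices:

```python
import numpy as np
from scipy.integrate import quad

def nhpp_log_likelihood(event_times, rate, T):
    """NHPP log likelihood on [0, T]: sum_j log rate(t_j) - mu(T),
    where mu(T) is the integral of the rate over [0, T]."""
    log_term = np.sum(np.log([rate(t) for t in event_times]))
    mu_T, _ = quad(rate, 0.0, T)  # numerical quadrature for mu(T)
    return log_term - mu_T

# Sanity check with a constant rate of 2 on [0, 5]:
# the log likelihood reduces to n*log(2) - 2*5.
ll = nhpp_log_likelihood([0.5, 1.2, 3.3], lambda t: 2.0, 5.0)
```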
In estimating $\lambda(t)$ in equation (1) we impose that it belongs to a class of smooth functions by assuming it is a linear combination of smooth basis functions
$$\lambda(t) = \sum_{k=1}^{K} c_k \phi_k(t). \qquad (2)$$
There are many choices of basis functions $\phi_k$, but in this paper we use B-splines (De Boor, 1978), which offer control over smoothness. B-splines also have finite support and are thus a natural choice to model NHPP rate functions defined over a finite interval. We estimate the basis function coefficients $c = (c_1, \dots, c_K)^\top$ by minimising the negative log likelihood, i.e. by minimising
$$f(c) = -\sum_{j=1}^{n} \log \left( \sum_{k=1}^{K} c_k \phi_k(t_j) \right) + \sum_{k=1}^{K} c_k \int_0^T \phi_k(t)\,\mathrm{d}t, \qquad (3)$$
subject to the constraint that $-\lambda(t)$ is less than or equal to zero, i.e.
$$g(c) = -\sum_{k=1}^{K} c_k \phi_k(t) \le 0 \quad \text{for all } t \in [0, T]. \qquad (4)$$
This inequality ensures that the rate function estimate satisfies $\hat{\lambda}(t) \ge 0$, which is consistent with the requirement that NHPP rate functions are non-negative for all values of time.
As we show, this is a convex optimisation problem. Furthermore, as $f$ and $g$ are twice differentiable, the minimisation is straightforward using any one of several techniques designed for solving non-linear convex optimisation problems where the objective function and inequality constraint are both twice differentiable. For the results in this paper we used the interior point method (Byrd et al., 1999, 2000; Waltz et al., 2006). We prove that (3) is convex in $c$ by first deriving the gradient of $f$,
$$\nabla f(c) = -\sum_{j=1}^{n} \frac{\phi(t_j)}{\lambda(t_j)} + \int_0^T \phi(t)\,\mathrm{d}t, \qquad (5)$$
where $\phi(t) = (\phi_1(t), \dots, \phi_K(t))^\top$. The Hessian is
$$H = \nabla^2 f(c) = \sum_{j=1}^{n} \frac{\phi(t_j)\,\phi(t_j)^\top}{\lambda(t_j)^2}, \qquad (6)$$
which is well defined as $\lambda(t_j) > 0$ for every $j$ under constraint (4).
We may express $H = \sum_{j=1}^{n} A_j$, where
$$A_j = \frac{\phi(t_j)\,\phi(t_j)^\top}{\lambda(t_j)^2}. \qquad (7)$$
Each $A_j$ is positive semi-definite as $v^\top A_j v = (v^\top \phi(t_j))^2 / \lambda(t_j)^2 \ge 0$ for every non-zero column vector $v$ of real numbers. As a sum of positive semi-definite matrices, $H$ is positive semi-definite, and therefore $f$ (equation (3)) is convex in $c$.
As $g$ (equation (4)) is linear in $c$ it is both convex and concave in $c$. Hence estimating the basis function coefficients of the NHPP rate function is a convex optimisation problem.
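To make the estimation procedure concrete, here is an illustrative Python sketch. It fits cubic B-spline coefficients by minimising (3) subject to non-negativity of the rate on a grid; scipy's trust-region constrained solver stands in for the interior point method used for the results in the paper, and the knot choices, grid sizes and variable names are our own assumptions:

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.interpolate import BSpline
from scipy.optimize import LinearConstraint, minimize

rng = np.random.default_rng(0)
T, degree = 1.0, 3
# Clamped knot vector for cubic B-splines on [0, T]
knots = np.r_[[0.0] * degree, np.linspace(0.0, T, 8), [T] * degree]
n_basis = len(knots) - degree - 1

def design_matrix(x):
    """Evaluate every B-spline basis function phi_k at the points x."""
    B = np.empty((len(x), n_basis))
    for k in range(n_basis):
        coeffs = np.zeros(n_basis)
        coeffs[k] = 1.0
        B[:, k] = BSpline(knots, coeffs, degree)(x)
    return B

events = np.sort(rng.uniform(0.0, T, 50))    # 50 events from a constant rate of 50
grid = np.linspace(0.0, T, 400)
B_events, B_grid = design_matrix(events), design_matrix(grid)
int_phi = trapezoid(B_grid, grid, axis=0)    # integral of each basis function over [0, T]

def objective(c):
    """Equation (3): negative log likelihood as a function of the coefficients."""
    lam = np.maximum(B_events @ c, 1e-12)    # guard against log(0) off the feasible set
    return -np.sum(np.log(lam)) + int_phi @ c

res = minimize(objective, x0=np.full(n_basis, 50.0), method="trust-constr",
               constraints=LinearConstraint(B_grid, 0.0, np.inf))  # rate >= 0 on the grid
rate_hat = B_grid @ res.x                    # fitted rate evaluated on the grid
```

At the optimum the fitted cumulative rate `int_phi @ res.x` equals the number of observed events, a standard stationarity property of NHPP maximum likelihood, which provides a useful sanity check.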
3 Classification
Consider a set of training samples $\{(s_i, y_i)\}_{i=1}^{m}$, in which $s_i = \{t_{i1}, \dots, t_{in_i}\}$ is a set of event times for the $i$th sample and $y_i \in \{1, \dots, G\}$ is its corresponding class label. The classification task consists of using this training data to predict the unobserved class label $y^*$ for a new sample with event times $s^*$. Assuming each training sample is independent, the log likelihood for the rate function $\lambda_g$ for class $g$ is
$$\ell_g(\lambda_g) = \sum_{i=1}^{m} \mathbb{1}(y_i = g) \left[ \sum_{j=1}^{n_i} \log \lambda_g(t_{ij}) - \mu_g(T) \right], \qquad (8)$$
where the indicator function $\mathbb{1}(y_i = g) = 1$ if $y_i = g$ and 0 otherwise, and $\mu_g(T) = \int_0^T \lambda_g(t)\,\mathrm{d}t$. The rate function is given by
$$\lambda_g(t) = \sum_{k=1}^{K} c_{gk} \phi_k(t). \qquad (9)$$
Estimates for the basis function coefficients $c_{gk}$ of this expression are found via the maximum likelihood estimation outlined in Section 2.
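The class log likelihood in equation (8) simply sums per-sample NHPP log likelihoods over the training samples carrying label $g$. A minimal Python sketch (names our own), here evaluated for a fixed candidate rate function:

```python
import numpy as np

def class_log_likelihood(samples, labels, g, rate, mu_T):
    """Sum of per-sample NHPP log likelihoods over training samples with label g:
    l_g = sum_{i: y_i = g} [ sum_j log rate(t_ij) - mu(T) ]."""
    total = 0.0
    for sample, y in zip(samples, labels):
        if y == g:
            total += np.sum(np.log([rate(t) for t in sample])) - mu_T
    return total

# Three training samples, two with label 0; constant candidate rate 2 on [0, 1],
# so mu_T = 2 and the class-0 value reduces to 5*log(2) - 2*2.
samples = [[0.1, 0.4], [0.2], [0.3, 0.5, 0.9]]
labels = [0, 1, 0]
val = class_log_likelihood(samples, labels, 0, lambda t: 2.0, 2.0)
```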
3.1 Classification of test data
The posterior probability that a test observation $s^*$ is a member of class $g$ can be derived from Bayes’ theorem,
$$P(y^* = g \mid s^*) = \frac{p_g \, L(\lambda_g \mid s^*)}{\sum_{h=1}^{G} p_h \, L(\lambda_h \mid s^*)}, \qquad (10)$$
where $p_g$ is the prior probability of class $g$ and $L(\lambda_g \mid s^*)$ is the likelihood of $s^*$ under the class-$g$ NHPP model.
If observations are assigned to the class for which this probability is maximal, i.e. $\hat{y}^* = \arg\max_g P(y^* = g \mid s^*)$, then this corresponds to the Bayes classifier, which is optimal under 0-1 loss (Hastie et al., 2009).
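Once rate functions have been estimated for each class, assigning a test sample reduces to comparing log priors plus log likelihoods. An illustrative Python sketch (the function name and the constant example rates are our own):

```python
import numpy as np

def classify(event_times, class_rates, class_priors, cum_rates):
    """Bayes classifier for a NHPP sample: argmax over classes of
    log prior + log likelihood; also returns the posterior probabilities."""
    scores = []
    for rate, prior, mu_T in zip(class_rates, class_priors, cum_rates):
        ll = np.sum(np.log([rate(t) for t in event_times])) - mu_T
        scores.append(np.log(prior) + ll)
    scores = np.array(scores)
    post = np.exp(scores - scores.max())  # softmax of scores = posterior of eq. (10)
    post /= post.sum()
    return int(np.argmax(scores)), post

# Six events on [0, 1]: far more consistent with a constant rate of 5 than of 1
label, post = classify([0.1, 0.2, 0.3, 0.5, 0.7, 0.9],
                       [lambda t: 1.0, lambda t: 5.0],
                       [0.5, 0.5], [1.0, 5.0])
```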
4 Clustering
For the unsupervised learning task of cluster analysis of event time data we assume the data to have been generated from a mixture of NHPP models. Although class memberships are unknown, each sample is associated with a latent variable which specifies which model, or ‘cluster’, it is from. Specifically, we assume the $m$ event time samples are generated from a mixture of $G$ NHPP models and let $Z = \{z_1, \dots, z_m\}$ be a set of latent random variables which determine the component from which each sample originates. For the mixture the complete data log likelihood function is
$$\ell_c(\Theta) = \sum_{i=1}^{m} \sum_{g=1}^{G} \mathbb{1}(z_i = g) \left[ \log \pi_g + \ell(\lambda_g \mid s_i) \right] \qquad (11)$$
and the corresponding incomplete data log likelihood is
$$\ell(\Theta) = \sum_{i=1}^{m} \log \sum_{g=1}^{G} \pi_g \, L(\lambda_g \mid s_i). \qquad (12)$$
Here $\pi_g$ is the ‘mixing weight’ of component $g$, where $0 \le \pi_g \le 1$ and $\sum_{g=1}^{G} \pi_g = 1$. $L(\lambda_g \mid s_i)$ is the likelihood for the rate function associated with NHPP model $g$ given the sample $s_i$, and $\ell(\lambda_g \mid s_i)$ the corresponding log likelihood. As before, we represent the rate functions associated with each component as a linear combination of basis functions, i.e. $\lambda_g(t) = \sum_{k=1}^{K} c_{gk} \phi_k(t)$.
We use the EM algorithm (see Dempster et al. (1977) for details) to find estimates for the parameters $\Theta = \{\pi_1, \dots, \pi_G, C_1, \dots, C_G\}$ (where $C_g = \{c_{g1}, \dots, c_{gK}\}$ is the set of basis function coefficients for NHPP rate function $\lambda_g$) which maximise the log likelihood of the mixture. This procedure is described below.
Given a current estimate $\Theta^{\text{old}}$ for the model parameters we find the so called ‘auxiliary function’, which is the expected value of the complete data log likelihood (equation (11)) of the mixture model with respect to the conditional distribution of $Z$ given the data and $\Theta^{\text{old}}$,
$$Q(\Theta, \Theta^{\text{old}}) = \sum_{i=1}^{m} \sum_{g=1}^{G} \gamma_{ig} \left[ \log \pi_g + \ell(\lambda_g \mid s_i) \right]. \qquad (17)$$
Here $\gamma_{ig}$ is the ‘membership probability’ of sample $i$ to model, or ‘cluster’, $g$, and $\ell(\lambda_g \mid s_i)$ is the log likelihood for rate function $\lambda_g$ given sample $s_i$. Plugging the basis function expansions for the rate functions in to equation (17) we get
$$Q(\Theta, \Theta^{\text{old}}) = \sum_{i=1}^{m} \sum_{g=1}^{G} \gamma_{ig} \left[ \log \pi_g + \sum_{j=1}^{n_i} \log \sum_{k=1}^{K} c_{gk} \phi_k(t_{ij}) - \sum_{k=1}^{K} c_{gk} \int_0^T \phi_k(t)\,\mathrm{d}t \right]. \qquad (18)$$
To obtain the membership probability of each sample $i$ for model $g$ we regard the mixing weights as prior probabilities of each mixture component and use Bayes’ theorem to obtain
$$\gamma_{ig} = \frac{\pi_g \, L(\lambda_g \mid s_i)}{\sum_{h=1}^{G} \pi_h \, L(\lambda_h \mid s_i)}. \qquad (19)$$
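The E-step of equation (19) is a standard responsibility computation, best carried out in log space for numerical stability. A small Python sketch under our own naming:

```python
import numpy as np

def membership_probs(log_liks, mix_weights):
    """Responsibility gamma_ig of each component for each sample.
    log_liks: (n_samples, n_components) log likelihoods l(lambda_g | s_i);
    mix_weights: the mixing weights pi_g."""
    log_post = np.log(mix_weights) + log_liks          # log pi_g + log likelihood
    log_post -= log_post.max(axis=1, keepdims=True)    # log-sum-exp stabilisation
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

# Sample 0 fits component 0 far better; sample 1 fits component 1 better
P = membership_probs(np.array([[-1.0, -3.0], [-4.0, -2.0]]), np.array([0.5, 0.5]))
```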
After obtaining values for the membership probabilities, we obtain subsequent estimates for the parameters by finding the $\Theta$ which maximise the auxiliary function (18) subject to the constraint that the mixing weights must sum to 1, i.e. $\sum_{g=1}^{G} \pi_g = 1$. To address this constraint we consider the Lagrange function
$$\Lambda(\Theta, \beta) = Q(\Theta, \Theta^{\text{old}}) + \beta \left( \sum_{g=1}^{G} \pi_g - 1 \right), \qquad (20)$$
where $\beta$ is the Lagrange multiplier. Updates for the mixing weights are found by taking the partial derivative of (20) with respect to $\pi_g$, setting it to zero and solving for $\pi_g$. The partial derivative is
$$\frac{\partial \Lambda}{\partial \pi_g} = \sum_{i=1}^{m} \frac{\gamma_{ig}}{\pi_g} + \beta. \qquad (21)$$
Setting equation (21) to 0 and solving for $\pi_g$ we get
$$\pi_g = -\frac{1}{\beta} \sum_{i=1}^{m} \gamma_{ig}. \qquad (23)$$
Summing (23) over $g$ and using the constraints $\sum_{g=1}^{G} \pi_g = 1$ and $\sum_{g=1}^{G} \gamma_{ig} = 1$ for each $i$ gives $\beta = -m$. Hence the next estimate for the mixing weight is $\pi_g = \frac{1}{m} \sum_{i=1}^{m} \gamma_{ig}$.
Next we seek the coefficient sets $C_g$ which maximise $Q(\Theta, \Theta^{\text{old}})$ subject to the constraints
$$-\sum_{k=1}^{K} c_{gk} \phi_k(t) \le 0 \quad \text{for all } t \in [0, T], \qquad (25)$$
for all $g = 1, \dots, G$.
This is a convex minimisation problem of the same variety as encountered in Section 2. To see this, consider the term $-\ell(\lambda_g \mid s_i)$ from equation (20), which is identical to equation (3) and which we showed is convex in the set of basis function coefficients in Section 2. As $\gamma_{ig} \ge 0$ for every $i$ and $g$, $-\gamma_{ig}\,\ell(\lambda_g \mid s_i)$ is also convex. Furthermore, disregarding terms which do not involve $C_g$, $-Q$ is convex in $C_g$, as sums of convex functions are also convex. The functions in the constraints given in (25) are linear in $C_g$ and are therefore both convex and concave. Therefore, to find the $C_g$ which maximise (20) we can use the same numerical approach as used to minimise (3). The expectation and maximisation steps are applied iteratively until the change in value of the Lagrangian function from one step to the next is less than a prescribed threshold. We used a value of $10^{-4}$ for the results in this paper.
Depending on the starting conditions the EM algorithm may converge to a local maximum. To overcome this issue we use a form of the random restart approach commonly used with the EM algorithm for Gaussian mixtures (Biernacki et al., 2003). To set initial estimates for the basis function coefficients, each sample is randomly assigned a class label. Rate function estimates for these ‘random’ classes are obtained using the classification procedure outlined in Section 3, and the corresponding basis function coefficient estimates are used as the initial values. All mixing weights are initially set to $1/G$. The EM algorithm is then initialised from a number of different sets of initial conditions, corresponding to the different ‘random’ class assignments, and the solution with the largest log likelihood is retained. With this procedure we found that usually at most 3 random restarts were required to avoid local maxima for the real world data sets.
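The full EM loop with random restarts can be sketched as follows. To keep the example short we use piecewise-constant (histogram) rate functions, for which the weighted M-step has a closed form, in place of the spline representation and constrained optimisation used in the paper; the function and variable names are our own:

```python
import numpy as np

rng = np.random.default_rng(0)

def nhpp_mixture_em(samples, n_components, T, n_bins=20, n_restarts=3,
                    tol=1e-4, max_iter=200):
    """EM for a mixture of NHPPs. For brevity the rates are piecewise constant
    over n_bins equal bins, so the weighted M-step is closed form."""
    edges = np.linspace(0.0, T, n_bins + 1)
    width = T / n_bins
    counts = np.array([np.histogram(s, bins=edges)[0] for s in samples])
    best_ll, best_fit = -np.inf, None
    for _ in range(n_restarts):
        # random restart: hard-assign samples to components
        resp = np.eye(n_components)[rng.integers(n_components, size=len(samples))]
        prev_ll = -np.inf
        for _ in range(max_iter):
            # M-step: mixing weights and weighted ML bin rates
            w = np.maximum(resp.sum(axis=0), 1e-12)
            pi = w / len(samples)
            rates = np.maximum((resp.T @ counts) / (w[:, None] * width), 1e-12)
            # E-step: log likelihood of each sample under each component
            ll_mat = counts @ np.log(rates).T - rates.sum(axis=1) * width
            log_post = np.log(pi) + ll_mat
            m = log_post.max(axis=1, keepdims=True)
            ll = float(np.sum(m.ravel() + np.log(np.exp(log_post - m).sum(axis=1))))
            resp = np.exp(log_post - m)
            resp /= resp.sum(axis=1, keepdims=True)
            if ll - prev_ll < tol:
                break
            prev_ll = ll
        if ll > best_ll:
            best_ll, best_fit = ll, (pi, rates, resp)
    return best_fit

# Two well-separated constant-rate components (rates 5 and 30 on [0, 1])
samples = [rng.uniform(0.0, 1.0, rng.poisson(5)) for _ in range(10)] \
        + [rng.uniform(0.0, 1.0, rng.poisson(30)) for _ in range(10)]
pi, rates, resp = nhpp_mixture_em(samples, 2, 1.0)
pred = resp.argmax(axis=1)
```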
5 Empirical results
In this section we demonstrate the efficacy of our methods with one synthetic and three real-world data sets.
5.1 Synthetic data
We first applied our methods to synthetically generated event data with prescribed rate functions. None of the data generating rate functions belongs to the class of smooth functions we assumed in our methods. We generated three different data sets based on the following four rate functions.
For synthetic data set 1 we generated event times for samples with class label 1 (resp. 2) using rate function $\lambda_1$ ($\lambda_2$). As $\lambda_1$ and $\lambda_2$ are both periodic functions, this experiment mimics real world examples of point processes which exhibit periodic patterns (e.g. the times of stock trades (Engle, 2000) and the times at which emails are sent (Malmgren et al., 2008)).
For synthetic data set 2, $\lambda_3$ (resp. $\lambda_4$) was used to generate class 1 (2) samples. These rate functions mimic computer network traffic, for which a NHPP model with rate function defined by a step function is a suitable model (Paxson and Floyd, 1995).
Finally, for data set 3 we considered a four class problem where rate functions $\lambda_1$, $\lambda_2$, $\lambda_3$ and $\lambda_4$ were used to generate data in classes 1, 2, 3 and 4 respectively. The thinning algorithm (Lewis and Shedler, 1979) was used to generate all data. Event times for 20 synthetically generated samples for each rate function are shown in Figure 1.
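The thinning algorithm can be sketched in a few lines of Python; the sinusoidal rate below is our own illustrative choice, not one of the four rate functions used in the experiments:

```python
import numpy as np

def thinning(rate, rate_max, T, rng):
    """Lewis-Shedler thinning: simulate a HPP with rate rate_max on [0, T]
    and keep each candidate point t with probability rate(t) / rate_max."""
    t, events = 0.0, []
    while True:
        t += rng.exponential(1.0 / rate_max)    # next candidate event time
        if t > T:
            return np.array(events)
        if rng.uniform() < rate(t) / rate_max:  # accept with prob rate(t)/rate_max
            events.append(t)

# Periodic rate bounded above by 10, simulated on [0, 20]
rng = np.random.default_rng(1)
events = thinning(lambda t: 10.0 * (1.0 + np.sin(2.0 * np.pi * t)) / 2.0,
                  10.0, 20.0, rng)
```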
For each class, 20 samples were generated. 10 of these formed the training set and the remainder formed the out of sample test set. Using the classification procedure outlined in Section 3 we were able to recover the prescribed rate function for each data set (see Figure 2). Furthermore, when each sample in the test data was assigned to the class for which its membership probability (as defined in equation (10)) is maximal, the classification accuracy was 100%.
Again, 20 samples for each class were generated. Using the clustering method detailed in Section 4 we were able to recover the NHPP rate functions for all data sets (see Figure 3). For each data set, we were able to correctly assign each sample to the NHPP model from which it was generated by assigning it to the model for which the membership probability (equation (19)) was maximal.
5.2 Retail transaction data results
Next we introduce our first real data set, which is made up of the till transaction times from 6 September 2011 to 28 February 2015 for 74 UK stores (10,000 transactions per store) belonging to a large UK retailer. Half of these stores have been categorised by the retailer as ‘high street’ (HS) stores, which are large stores found in city centres, and the remainder as ‘travel’ stores, which are situated in airports and railway stations. These categorisations serve as the ground truth for our analysis.
5.2.1 Classification results
We used repeated 5-fold cross-validation (CV) (Efron and Gong, 1983) to obtain estimates for the classification accuracy and the high street store and travel store true positive rates. Specifically, samples were randomly partitioned into 5 folds, each containing 14 or 15 samples. We ensured that the proportions of the two classes in each fold were equal, or for folds with an odd number of samples, as equal as possible (a procedure known as stratified CV). Data in a single fold was retained as the test set and the remainder made up the training set, which was used to obtain the NHPP rate function estimates for the high street store and travel store classes. Samples in the test set fold were then assigned to the class for which their posterior membership probability was maximal. Classification and true positive rates were calculated using the predicted class labels and the ground truth class labels. This procedure was repeated 5 times, with the samples in each of the folds being used once as the test set. Finally, the entire process was repeated 100 times with different random partitions to give a total of 500 numerical experiments. The average classification accuracy, high street store true positive rate and travel store true positive rate were 0.847, 0.994 and 0.701 respectively. The estimates for the rate functions of each class for one CV run are shown in Figure 4. Both rate functions are periodic, with the high street rate function peaking around Christmas, which corresponds to the busiest shopping period on the high street. In contrast, the rate function for the travel stores (which are located in airports and railway stations) peaks around summer, which coincides with the peak holiday period when holiday makers are travelling via transportation hubs. For our results, the high street store true positive rate was higher than the travel store true positive rate, suggesting the classifier has more of a tendency to misclassify travel stores than high street stores.
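The stratified partitioning step can be sketched as follows; this is our own illustrative Python, not the code used for the reported results:

```python
import numpy as np

def stratified_folds(labels, n_folds, rng):
    """Randomly partition sample indices into folds, keeping the class
    proportions in each fold as equal as possible (stratified CV)."""
    folds = [[] for _ in range(n_folds)]
    k = 0  # running counter so consecutive classes fill folds evenly
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        for i in idx:
            folds[k % n_folds].append(int(i))
            k += 1
    return [np.array(f) for f in folds]

# 74 stores, half high street (0), half travel (1), split into 5 folds
rng = np.random.default_rng(0)
labels = np.array([0] * 37 + [1] * 37)
folds = stratified_folds(labels, 5, rng)
```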
Further investigation reveals that the misclassified travel stores are situated in central railways stations in large UK cities including London, Glasgow, Leeds and Liverpool. A possible explanation for the misclassification of these stores is that, although classed as travel stores, the store location means that these stores predominantly serve high street shoppers and thus have a demand profile which is similar to high street stores.
5.2.2 Clustering results
Setting the number of models in the mixture to 2, and assigning each sample to the model for which its membership probability is maximal, the average clustering accuracy, high street store true positive rate and travel store true positive rate were 0.764, 1 and 0.528 respectively. The high street store true positive rate (resp. travel store true positive rate) of the clustering results was higher (lower) than the corresponding classification results. The reason for this is that the clustering procedure allows stores to cluster naturally into groups with similar temporal transaction patterns, whilst for classification group membership is prescribed a priori in the training set. Further investigation reveals that most of the stores in the ‘travel’ cluster are located in airports. Stores categorised by the retailer as travel stores, but with temporal transaction characteristics more similar to high street stores (such as the stores situated in central railway stations in large UK cities mentioned in the previous section), cluster into the ‘high street’ group. The rate functions of the two NHPP models are shown in Figure 5: the blue rate function peaks around summer and corresponds to travel stores, while the red rate function peaks around Christmas and corresponds to high street stores.
Rate function estimates obtained by setting $G = 3$ are shown in Figure 6. Two clusters, with rate functions labelled ‘airport 1’ and ‘airport 2’ in the plot, are comprised almost exclusively of airport stores and have very similar rate functions peaking around the summer. A third cluster, labelled ‘high street and railway’, is made up mainly of high street stores and railway stores located in stations in city centres, and its rate function peaks around Christmas. The fact that setting $G = 3$ does not lead to more meaningful store partitions, compared to setting $G = 2$, supports the notion that the stores fall naturally into one of two clusters, the first made up predominantly of airport stores and the second of stores located centrally in cities, either on the high street or in city centre railway stations.
5.3 Hubway bike share data set
This data set contains bike trip information for the Hubway bike share program located in Boston, Massachusetts and its environs. The data, which covers the period from 28 July 2011 to 30 November 2013, contains the trip duration, the date and time of the trip, and the IDs of the start and finishing bike share stations of cycle journeys. Additionally, the user type (casual, ‘24-hour or 72-hour pass user’, or member, ‘annual or monthly member’) was also recorded. In our analysis we ranked each station according to the proportion of station users classed as either casual or members, with the goal of investigating whether our methods could separate stations with high proportions of casual users from those with a high proportion of member users based on the time of day at which trips at each station commenced. We only considered stations with more than 5,000 recorded trips, to avoid any spurious results caused by stations with only a handful of recorded trips. This left a total of 89 station samples.
After ranking each station according to the proportion of casual/member users, we classified the stations with the highest casual-member proportions as ‘casual’ stations and the stations with the lowest casual-member proportions as ‘member’ stations. We then used the same CV procedure as before to obtain the average classification accuracy and the casual station and member station true positive rates. Raster plots showing the event times for both station types and the average performance of our classification procedure over all test folds are shown in Figure 7. The results show that our classification procedure did a very good job of classifying the station samples (an average classification accuracy of 0.775, for example). We also include a plot showing the estimated rate functions for the casual and member station classes in Figure 7. The member station rate function (red curve) has two distinct peaks, around 8am and 5pm. These peaks are far less pronounced for the casual stations (blue curve), which have a larger intensity in the middle of the day. The most plausible explanation for these features is that member stations are predominantly used by commuters who cycle to and from work and, as they use the bike scheme every day, are more likely to sign up for membership, whilst casual stations are more likely to be used by tourists who have the free time to cycle during the middle of the day but, as they use the bike share scheme infrequently, are less likely to become members.
Although the classification results are very good, our classifier had a tendency to misclassify member stations more than casual stations. Figure 7 shows the estimated rate functions for the two ‘member stations’ which were misclassified most often during the CV experiments. Rather than two distinct peaks around 8am and 5pm, station 26 has a single peak around 8am and station 64 a single peak around 5pm. This suggests that there may be two distinct sub-types of member stations: one type used mostly in the mornings (most likely located in the suburbs close to the homes of commuters) and another close to the commuters’ places of work (most likely in the city centre), which are mostly used after 5pm when the commuters return home from work. This motivated us to consider a three class problem consisting of ‘casual stations’, ‘morning member stations’ (member stations where the majority of trips took place before 5pm) and ‘afternoon member stations’ (member stations where most trips took place after 5pm). The results of this experiment are shown in Figure 8, where it can be seen by comparing these results to the results in Figure 7 that the average classification accuracy for the 3 class problem is higher than for the 2 class problem (the 3 class problem average classification accuracy is 0.829, whilst it is 0.775 for the 2 class problem, for example). Furthermore, the rate function estimates for the 3 class problem, which are shown in Figure 8, also capture the different usage patterns of the 3 different station types. There is a clear peak in activity for the morning (resp. afternoon) member station class around 8am (5pm), whilst the estimated rate function for the casual station type has more of a uniform profile.
The clustering results for the 2 and 3 class problems are shown in Figure 9 and show that the station classes defined in the previous subsection emerge naturally from the data. For $G = 2$, the estimated rate function for one station type has two distinct peaks, at 8am and 5pm; this is the ‘member station’ class. The other estimated rate function has a greater intensity during the middle of the day; this is the ‘casual station’ type, although for the clustering results it has a noticeably higher peak around 5pm when compared to the rate function of the casual station class obtained via the classification procedure (c.f. the estimated rate functions in Figure 7). For $G = 3$, the casual station class persists and the morning and afternoon member station sub-classes, which were defined in the previous subsection, emerge naturally from the data. Assigning each sample to the class for which its membership probability is maximal, the clustering accuracy and true positive rates were very good (the clustering accuracy is 0.750 for $G = 2$ and 0.889 for $G = 3$) and broadly reflect the results seen for the classification task. For $G = 2$, the casual station true positive rate exceeds that of the member station true positive rate. However, setting $G = 3$ goes some way to remedying this and also results in a higher overall clustering accuracy, as it did for the classification results, indicating that 3 clusters is a more appropriate choice for this problem.
5.4 A&E data set
This data set contains the times of day (00:00:00 to 23:59:59), in every month from April 2011 to the end of December 2014, at which patients arrived at Accident and Emergency (A&E) departments in hospitals throughout England. Each patient was diagnosed as either suffering from poisoning (including alcohol poisoning) or a cardiac condition, and these diagnoses are used as the ground truth class labels. For each label there are 45 samples, corresponding to each of the 45 months spanning April 2011 to the end of December 2014. Each monthly sample contains 10,000 A&E admission events.
Using the same CV procedure as outlined in Section 5.2.1, we obtained an average classification accuracy of 1 and true positive rates of 1 for both classes. The estimated rate functions are shown in Figure 10, from which it can be seen that admissions for poisoning peak around midnight. This is most likely due to instances of alcohol poisoning amongst revellers in the late evenings. Cardiac conditions peak in the morning, which is consistent with evidence in the medical literature (Peters et al., 1989; Muller et al., 1989).
Estimated rate functions are shown in Figure 11 and these are very similar to the estimates obtained from the classification procedure. In particular, instances of poisoning peak around night time and cardiac conditions reach a high point in the morning. Assigning poisoning and cardiac months to the model for which their membership probabilities are maximal gives a clustering accuracy of 100%.
We have made Matlab code for the classification and clustering procedures outlined in Sections 3-4 and for the synthetic data set results in Section 5.1 available at https://github.com/duncan-barrack/NHPP. The Hubway bike share data set is available at https://www.thehubway.com/system-data. The store data and NHS A&E data are not publicly available.
6 Discussion
In this paper we have detailed principled approaches for the classification and clustering of event data using NHPP models. Results on synthetic and real data were presented which show the effectiveness of our methods. The focus of this work has been on temporal point process data observed over a fixed interval $[0, T]$. Another possibility, particularly suited to periodic data, would be to assume the domain of the data is periodic (e.g. a circle). The approach we have described here should generalise easily to the circular case. Furthermore, our method is not restricted to temporal data and, in principle, can be generalised to multi-dimensional spatial and spatio-temporal point process data.
For the clustering procedure we did not consider the problem of choosing an appropriate number of models in the NHPP mixture. In principle, the Bayesian information criterion (Schwarz et al., 1978) or Akaike information criterion (Akaike, 1974) could be used for this. Future work could investigate the suitability of these methods for choosing the number of NHPP components for our clustering procedure.
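As an illustration of how the BIC could be applied here, the sketch below scores hypothetical mixture fits; the log likelihood values and the parameter count (one set of basis coefficients per component plus the free mixing weights) are assumptions for illustration only, not results from the paper:

```python
import numpy as np

def bic(log_lik, n_params, n_samples):
    """Bayesian information criterion: lower is better."""
    return -2.0 * log_lik + n_params * np.log(n_samples)

def n_mixture_params(G, K):
    """A mixture of G NHPPs with K basis coefficients each has G*K rate
    parameters plus G - 1 free mixing weights."""
    return G * K + (G - 1)

# Hypothetical maximised log likelihoods for G = 1, 2, 3 components,
# with K = 10 basis functions and 74 samples
fits = {1: -500.0, 2: -430.0, 3: -425.0}
scores = {G: bic(ll, n_mixture_params(G, K=10), n_samples=74)
          for G, ll in fits.items()}
best_G = min(scores, key=scores.get)
```

With these illustrative numbers the small gain in log likelihood from $G = 2$ to $G = 3$ does not offset the extra parameters, so the BIC selects $G = 2$.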
We would like to thank the UK-based retailer and NHS England for their support and for providing the data used in our analysis. This work was funded by EPSRC grant EP/G065802/1 - Horizon: Digital Economy Hub at the University of Nottingham and EPSRC grant EP/L021080/1 - Neo-demographics: Opening Developing World Markets by Using Personal Data and Collaboration.
- Corresponding author at: Horizon Digital Economy Research Institute, University of Nottingham, Nottingham, NG7 2TU, UK. Tel.: +44 (0)115 8232554. E-mail address: firstname.lastname@example.org.
- H. Akaike. A new look at the statistical model identification. IEEE Trans. Autom. Control, 19(6):716–723, 1974.
- F. Alizadeh, J. Eckstein, N. Noyan, and G. Rudolf. Arrival rate approximation by nonnegative cubic splines. Oper Res, 56(1):140–156, 2008.
- M.F. Arlitt and C.L. Williamson. Internet web servers: Workload characterization and performance implications. IEEE/ACM Trans. Netw, 5(5):631–645, 1997.
- C. Biernacki, G. Celeux, and G. Govaert. Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models. Comput. Stat. Data Anal., 41(3):561–575, 2003.
- E.N. Brown, R. Barbieri, V. Ventura, R.E. Kass, and L.M. Frank. The time-rescaling theorem and its application to neural spike train data analysis. Neural Comput., 14(2):325–346, 2002.
- R.H. Byrd, M.E. Hribar, and J. Nocedal. An interior point algorithm for large-scale nonlinear programming. SIAM J. Optim, 9(4):877–900, 1999.
- R.H. Byrd, J.C. Gilbert, and J. Nocedal. A trust region method based on interior point techniques for nonlinear programming. Math. Prog., 89(1):149–185, 2000.
- C. De Boor. A practical guide to splines, volume 27. Springer-Verlag New York, 1978.
- A.P. Dempster, N.M. Laird, and D.B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Series B Stat Methodol, pages 1–38, 1977.
- B. Efron and G. Gong. A leisurely look at the bootstrap, the jackknife, and cross-validation. Am. Stat., 37(1):36–48, 1983.
- R.F. Engle. The econometrics of ultra-high-frequency data. Econometrica, 68(1):1–22, 2000.
- T.J. Hastie, R.J. Tibshirani, and J.H. Friedman. The elements of statistical learning : data mining, inference, and prediction. Springer series in statistics. Springer, New York, 2009.
- J.B. Illian, E. Benson, J. Crawford, and H.J. Staines. Multivariate methods for spatial point process–a simulation study. pages 125–130. 2004.
- J.B. Illian, E. Benson, J. Crawford, and H. Staines. Principal component analysis for spatial point processes—assessing the appropriateness of the approach in an ecological context. In Lecture Notes in Statistics 185. Case studies in spatial point process modeling, pages 135–150. Springer, 2006.
- M.E. Kuhl and P.S. Bhairgond. New frontiers in input modeling: nonparametric estimation of Nonhomogeneous Poisson processes using wavelets. In Proceedings of the 32nd conference on Winter simulation, pages 562–571. Society for Computer Simulation International, 2000.
- S. Lee, J.R. Wilson, and M.M. Crawford. Modeling and simulation of a nonhomogeneous Poisson process having cyclic behavior. Commun Stat Simulat, 20(2-3):777–809, 1991.
- P.A. Lewis and G.S. Shedler. Simulation of nonhomogeneous Poisson processes by thinning. Nav Res Log, 26(3):403–413, 1979.
- L.M. Linnett, D.R. Carmichael, and S.J. Clarke. Texture classification using a spatial-point process model. IEE P-Vis Image Sign, 142(1):1–6, 1995.
- R.D. Malmgren, D.B. Stouffer, A.E. Motter, and L.A.N. Amaral. A Poissonian explanation for heavy tails in e-mail communication. Proc. Natl. Acad. Sci. U.S.A., 105(47):18153–18158, 2008.
- W.A. Massey, G.A. Parker, and W. Whitt. Estimating the parameters of a nonhomogeneous Poisson process with linear rate. Telecommun Syst, 5(2):361–388, 1996.
- A. McGregor, M. Hall, P. Lorier, and J. Brunskill. Flow clustering using machine learning techniques. In Lect Notes Comput Sc, pages 205–214. Springer, 2004.
- A.W. Moore and D. Zuev. Internet traffic classification using bayesian analysis techniques. In ACM SIGMETRICS, volume 33, pages 50–60. ACM, 2005.
- J.E. Muller, G.H. Tofler, and P.H. Stone. Circadian variation and triggers of onset of acute cardiovascular disease. Circulation, 79(4):733–743, 1989.
- V. Paxson and S. Floyd. Wide area traffic: the failure of Poisson modeling. IEEE/ACM Trans. Netw., 3(3):226–244, 1995.
- R.W. Peters, J.E. Muller, S. Goldstein, R. Byington, L.M. Friedman, BHAT Study Group, et al. Propranolol and the morning increase in the frequency of sudden cardiac death (BHAT Study). Am. J. Cardiol., 63(20):1518–1520, 1989.
- S.M. Ross et al. Stochastic processes, volume 2. John Wiley & Sons New York, 1996.
- M. Roughan, S. Sen, O. Spatscheck, and N. Duffield. Class-of-service mapping for QoS: a statistical signature-based approach to IP traffic classification. In Proceedings of the 4th ACM SIGCOMM conference on Internet measurement, pages 135–148. ACM, 2004.
- D.C. Schmittlein, D.G. Morrison, and R. Colombo. Counting your customers: who are they and what will they do next? Manag. Sci., 33(1):1–24, 1987.
- G. Schwarz et al. Estimating the dimension of a model. Ann. Stat., 6(2):461–464, 1978.
- D.W. Scott, R.A. Tapia, and J.R. Thompson. Nonparametric probability density estimation by discrete maximum penalized-likelihood criteria. Ann. Stat., pages 820–832, 1980.
- W. Thompson. Point process models with applications to safety and reliability. Springer Science & Business Media, 2012.
- R.A. Waltz, J.L. Morales, J. Nocedal, and D. Orban. An interior algorithm for nonlinear optimization that combines line search and trust region steps. Math. Prog., 107(3):391–408, 2006.
- M. Zhao and M. Xie. On maximum likelihood estimation for a general non-homogeneous Poisson process. Scand J Stat, pages 597–607, 1996.