# New formulation of the Logistic-Normal process to analyze trajectory tracking data es

###### Abstract

Improved communication systems, shrinking battery sizes and the price drop of tracking devices have led to an increasing availability of trajectory tracking data. These data are often analyzed to understand animals behavior using mixture-type model. Due to their straightforward implementation and efficiency, hidden Markov models are generally used but they are based on assumptions that are rarely verified on real data.

In this work we propose a new model based on the Logistic-Normal process. Due to a new formalization and the way we specify the coregionalization matrix of the associated multivariate Gaussian process, we show that our model, differently from other proposals, is invariant with respect to the choice of the reference element and the ordering of the probability vectors components. We estimate the model under a Bayesian framework, using an approximation of the Gaussian process needed to avoid impractical computational time.

After a simulation study, where we show the ability of the model to retrieve the parameters used to simulate the data, the model is applied to the real data example, that motivated this work, where a wolf is observed before and after the procreation. Results are easy interpretable showing differences in the two time windows.

## 1 Introduction

Global Positioning System (GPS) telemetry currently represents the main tool to provide information about wildlife, allowing to remotely determine an animal position with high precision at time intervals programmed by the researcher [13]. The data take the form of a time series of coordinates and they are called trajectory tracking data. The large amount of data gathered from on-board GPS collars facilitate greater resolution in the study of habitat selection [33], spatiotemporal movements [for example 60, 22, 62, 20] and animal behaviour [58, 3]. In the last years GPS data have been increasingly used to identify behavioural states [72, 59, 61, 28], especially for cryptic species that inhabit remote environments and are difficult to observe in the field. In this framework, the investigation of behavioural phasing of wild-ranging animals is strictly linked to the analysis and modelling of movement characteristics such as the length and direction of animal steps, called step-length and turning-angle.

In this work we are going to model and understand the behaviour of a female wolf, observed in the the Abruzzo, Lazio and Molise National Park in the central Apennines, Italy [47]. We have data, recorded through a GPS device, in two time windows. In the first the movement are more erratic while in the second, since the wolf reproduced, her movements are more regular since she systematically revisited the den to feed and attend the pups, exibiting the classic star-shaped pattern [56]. Our main interest is to propose a model that can investigate the behaviour of the wolf in these two time windows and if/how it changes. Even if wolf behaviour have been analized by many authors in different contexts, see for example [31], [21] or [48], to the best of our knowledge, this is the first time that a single model have been used to compare behaviours in these two stages of a wolf life. Se ti viene in mente qualcosa di più metticela, su AOAS ci tengono al dataset e dobbiamo giustificare il più possibile il perchè vogliamo analizzarlo, anche se va detto che abbiamo un’introduzione lunghissima!!

Animal movement modelling has a long history. Starting from the diffusion model [11], a wide range of different approaches have been proposed, such as the Markov processes with diffusion and discrete components of [10], mixture of random walks [60], Brownian bridge [35], agent-based model [34], mechanistic approach [55] and the continuous-time discrete-space model [29].

Generally, the joint distribution of the coordinates, or an appropriate transformation, can be seen as a mixture process where the mixture components (or regimes) are the behaviors. The switching between regimes is generally assumed to be temporally structured [36, 60] and sometimes also spatially, as in [10], often ruled by a non-observed Markov process leading to the class of hidden Markov models (HMMs) [80]; for a recent review on animal movement analysis we refer the reader to [65].

Although HMMs are widely adopted, see for example [21], [42] and [50], the Markov structure is too restrictive and has no justification in terms of animal behavior. Consider this simple example: an animal behavior is described by two regimes, i.e., the resting and the feeding, and assume to have observations 30 minutes apart. If the animal is resting at a given time, the probability to move to the feeding behavior should depend on the time of the day, i.e, lower at dawn and higher at sunrise. In some works, see for example [64], [60] and [27], problems like these are tackled using covariates modelling probabilities, but not always these are available and moreover the switching between behaviors is a complex process and a richer model should be used instead. In our opinion the HMMs are so widely adopted since they are efficient and easy to implement thanks to the constant time-rate between consecutive observations. Extensions for continuous time, i.e. CT-HMM, are available, however they require a more complicated implementation, which increases the computational cost and, therefore, CT-HMM are not yet widespread used. For examples of algorithms working on CT-HMM, see [45] with applications in medical finance, [40] with applications in finance and [29] with applications in animal movement analysis.

We propose a mixture-type model, as the HMM, but with a higher level of flexibility, assuming that the probabilitiy vector, called also compositional vector or compositional datum, is distributed as a Logistic-Normal distribution () [1]. The was proposed by [1] as a distribution for compositional data that can be used as an alternative to the Dirichlet. Compositional data present some specific features, that will be discussed in details in Section 3.2, which make their analysis complicated. Beyond the obvious fact that compositional data are positive and sums to one, the structure of the simplex, which is the support of compositional vectors, imposes a constraint on one of the component, that we call reference element, and then elements are independently defined while one is obtained as a deterministic function of the others. There is no particular reason to choose any of the elements as the reference or a particular ordering of the probabilities meaining that inference based on the distribution should not depend on these choices.

The has a representation in terms of normally distributed variables. This variables are used to define Gaussian processes (GPs), which in turn induce a process on the probability vectors, and then dependence within the elements is induced by coregionalization [24]. This is not the first proposal that uses GPs to introduce dependence over a process, however the contribution of this work is to propose a new way to formalize the within and between dependence that generalises most of the previous proposals and allows us to perform inference that does not depend on the reference element or the ordering. The invariance properties follow directly from the coregionalization matrix and the covariance functions we define. Both points have always been overlooked in the literature, see for example [49], [77], [63], [70] and [12], or oversimplified hypothesis have been imposed, resulting in a reduced flexibility of the models, e.g. [49] and [77] assume that the GPs have the same covariance functions.

It is well know that models based on GPs suffer of a computational problem when the number of observations is large [7, 38], since the inversion of the covariance matrix requires a number of operations that is a function of , where is the number of observations. For this reason, we propose to estimate the model using the Nearest Neighbor GP (NNGP) approximation proposed by [16, 17], which has been proved to be computational efficient and produces estimates that are almost indistinguishable from the ones of the full-GP; for a general review on methods which can be used to analyze large dataset we refer the reader to [32].

We estimate the model under a Bayesian framework, proposing a Markov chain Monte Carlo (MCMC) algorithm straightforward to implement. The number of latent behaviors must be specified a priori in order to estimate the model and we use the integrate classification likelihood (ICL) [9] to select their number; Deviance Informational criteria (DICs), proposed by [14], have also been tested but none of them provided satisfactory results.

We show, through simulated examples, that the MCMC algorithm we propose, bases on the NNGP, is able to retrieve the parameters used to simulate the data and we show that the ICL identifies the right number of clusters in most cases. We also show that, if the computational time is an issue and/or the interest is only on the likelihood parameters estimates, a very efficient version of the model can be used.

The model is applied to the motivating example. Interestingly, the ICL find three regimes, one, that is the “slow-movement” behaviour, shared across the time windows and two, that can be both described as “hunting” and “exploring” behaviour, which are peculiar to only one. It is interesting that the temporal characterization of the slow-movement behaviour changes in the two time windows, and it is more likely in the daytime in the first and the first night hours in the second, results that is in line with previous literature [57]. The other two behaviours are similar in terms of step-length distribution, but differ in the way they change direction (i.e. turning-angle) since in the first the wolf tends to move in a straight line while in the second she moves with an anticlockwise pattern. We also show that our model outperforms the HMM and the proposal of [63] when estimated on this real data.

The paper is organized as follows: Section 2 presents the motivating example and a full description of the dataset that will be analyzed, Section 3 describes the proposed approach, Section 4 presents the results of the simulation study while in Section 5 can be seen the results of the model estimated on the real data. The paper ends with some remarks in Section 6.

## 2 The data

Start | End | Interval | Number of obs. | Missing |
---|---|---|---|---|

03/01 00:00 | 03/10 23:30 | 30 minutes | 473 | 7 |

05/28 00:00 | 06/16 09:00 | 3 hours | 138 | 18 |

We record a time series of spatial locations of the female wolf called F24 (2-3 years old), that was live-trapped in May 2009 in the Abruzzo, Lazio and Molise National Park (central Apennines, Italy) and equipped with a Vectronic Pro Light-1 collar (Vectronic Aerospace GmbH, Berlin, Germany). Details on wolf capture and handling are provided in [47]. During winter months (January â April), fix attempts were scheduled every 30 minutes for 10 days, and every 3 hours for 20 days for the rest of the year. The lower acquisition interval during winter months was programmed with the aim to estimate wolf kill rate though field investigations of GPS clusters [75].

When first captured, F24 was a member of the Villa pack, where it remained for 7.9 months before dispersing and establishing a new pack (i.e., Bisegna pack) in January 2010. The wolf reproduced in May 2010 and, using information derived from its GPS locations, we were able to determine the actual position of the den. Between May 28th and June 4th F24 restricted its movements in the proximity of the den. We therefore assumed that F24 entered the den on May 28th and reproduced in the following days. Until collarâs failure on June 16th, female F24 systematically revisited the den to feed and attend the pups. We collected locations from F24’s collar with a mean acquisition rate of 91.5%. We decided to use data only on the final period of the Bisegna pack which is composed of two separate time windows were locations are recorded. We restrict ourself to this subset of data because it represents the final and more stable phase of the tracking period and, moreover, we are mainly interested in the changes in behaviour before and after the wolf reproduction.

The two time windows used in this work are described in Table 1 while the trajectories are shown in Figure 2, where we can clearly see that in the first F24 has a more nomadic movements while the classic star-shaped pattern, that characterizes wolf reproductive period [56], can be observed in the second.

### 2.1 Preliminaries and notation

The two windows have different recording rate and there is a temporal gap between the two. For reasons that will be clear when the model is introduce, we assume a fixed rate of 30 minutes from the the starting time of the first observational window () to the ending of the second (), considering the locations at non-observed time as missing.

Let be temporal indices, assuming minutes, and , with , be the associated coordinates (or spatial locations). In animal behavior analysis, the standard approach considers time series describing the step-length, i.e. which is a proxy of the speed, and turning-angle, i.e. , as variables, quantities that can be computed from the coordinates, as described in Figure 1; notice that, due to missing observations, step-lengths and turning-angles cannot be computed for all temporal points. A different and equivalent way to describe the same path is to consider the vector of coordinates increments, having and . The whole trajectory can then be seen as

(1) |

or, equivalently,

(2) |

where is the rotation matrix based on the angle . The variable contains all the information needed to describe the trajectory without loosing any significant property. In particular, the sign of indicates if the animal is moving in the same direction of the previous time while the sign of shows how and if it turns.

## 3 The model

In a mixture-type model the clustering is generally encoded using a discrete (latent) random variable , where is a membership variable such that indicates the behavior at time . Notice that a change in behavior is identified only on the times belonging to . As shown in Equation (1), the variables and, equivalently, the step-lengths and turning-angles, are defined over two and three consecutive time points respectively; therefore, each must be defined on the same temporal interval such that elements in the same cluster follow the same distribution. This is the reason for the normalisation described at the beginning of Section 2.1.

The data is assumed to come from a mixture of normal distributions:

(3) | ||||

(4) |

Notice that a bivariate normal on induces a projected normal (PN) distribution on the turning-angle [79], that is one of the most flexible distribution for circular data [see for example 53, 52]. It can have one or two modes, be symmetric or highly skewed and antipodal. Unfortunately, no closed form for the step-length is available.

### 3.1 The logistic normal approach

The model given in equations (3) and (4) is completed with the specification of the joint distribution of . We assume that behaviors have temporal dependence that must be incorporated in the time evolution of . We start defining the following:

(5) |

where is the Kronecker delta function, , , i.e. is then a probability vector, such that each temporal point is characterized by its own vector . Our idea is to introduce temporal dependence between and , such that elements that are closer in time are similar and, as consequence, the correspondent ’s will tend to assume the same value.

The probability vector is defined with a logistic transformation of real valued variables

(6) |

Notice that adding a constant to each produces the same vector of probabilities and then an identifiability constraint is needed; without loss of generality, we set to zero the element.

We assume the vector to be distributed as a normal random variable. The vector is then distributed [1] with parameters that are given by those of .

To introduce temporal dependence between compositional vectors we envision has a realization of a time structured dimensional GP

(7) |

where is the dimensional identity matrix, is a vector of covariates, is a -dimensional vector of regression coefficients,
is a matrix and .
’s are independent realization of GPs with zero mean and correlation functions , i.e. , depending on some parameter . , in general, is set equal to the dimension of , i.e. , however for the moment we keep it general for reasons that will be clear in Section 3.3. Since is a realization of a GP, we can think of as the realization of a process. Dependence between the normally distributed variables induces dependence among the probability vectors.

The model (7) is called linear model of coregionalization [24] and assumes that cross-covariance functions arise as linear transformation of diagonal cross-correlation matrices.
The functional form of the dependence is presented in the next section while more details on how to construct and the processes s will be given in Section 3.3, where a new parametrization is introduced.

Equation (7) generalizes most of the models that have been proposed in the literature. For instance, the proposal of [51] is obtained by assuming . The model of [63] is obtained by letting be a spatio-temporal process with autoregressive temporal increments, and a diagonal matrix . Moreover, we can reduce to the proposals of [77], [49] and [68] assuming .

On the other hand, models such as the ones of [66], describing also the dependence among processes through a correlation function, i.e. using the cokriging, can not be expressed with this formulation. However, one may notice that the complexity of our approach is reduced with respect to the cokriging, since with our approach the total covariance structure is defined via the combination of the temporal covariance and the matrix , while in the general formulation of the cokriging covariance functions are necessary to describe the behavior of a regionalized composition and some properties about the covariance structure are often assumed to reduce the complexity of the problem.

Notice that [66], [43], [63] and, in general, all the proposals that worked with the GP representation of the probability vectors, specify only correlation functions, and in case of cokriging also the cross-correlations, as if the real focus of the inference is , that has dimension , and not , that has dimension .

### 3.2 Interpretation problems

The nature of compositional vectors, living on the simplex, makes the interpretability of dependence complicated since correlations are not free to vary in . This means that the sum-to-one constraint for compositional data not only imposes a limitation in the modelling due to the fact that an unbounded process cannot be used, but also induces negative correlations among variables, since

(8) |

where and, therefore,

(9) |

The left hand side is negative except for the trivial case when is a constant. This leads to a problem of interpretability.

[1] and following works pointed out that a more consistent measure of dependence between compositional elements can be measured as

(10) |

i.e. through the covariance between all possible combinations of log-ratios.

Each log-ratio may be defined as

(11) |

so that

(12) |

Even though indeces , , and are freely to change, it is generally assumed so that can be interpreted as the dependence between and with respet to .

Equation (12) highlights an important interpretational problem; the covariance structure described by assumes different forms if the reference element is involved. As an example, consider with , from (12) it is evident that this depends on the covariance function of , and their cross-covariance however, if one of the two indices, say , is equal to the reference element, here , we have and so depends only on the covariance of the process . This highlights that values of (10) involving the reference element have a different functional form, that must be taken into account when the covariance functions of the ’s are defined otherwise the covariance of the reference element is treated in a different way with respect to all the other elements.

Another way to understand the interpretability problem is to consider a with independent components in terms of log-ratios, i.e. for arbitrary and , i.e. complete independence. This is true iff the covariance matrix between and can be written as

(13) |

with . [1] also proved that the elements of are independent and identical distributed (i.i.d.) if the GP has zero-mean and

(14) |

where . Therefore independent ’s, i.e. diagonals and absence of temporal correlation, do not imply independence between ’s, that instead requires the covariance matrix (13). A similar result holds if we want the elements of the compositional vector to be i.i.d., since it is necessary that the associated normal distribution has zero mean and covariance matrix (14).

These problems highlight how the properties assumed for the GP, for instance in terms of dependence structure, are not automatically transferred to the elements of the compositional vectors, making more difficult the interpretation of the results in terms of , which should be the focus of the analysis.

### 3.3 A new parametrization

We now propose a different way to look at the process. This parametrization allows for an easier interpretation of the parameters, making possible to introduce dependence between compositional vectors, retaining the reordering invariance and allowing different covariance structures for the components.

We define a new variable

(15) |

where is defined as in (3), is a -dimensional process, is a matrix as in the standard coregionalization model, and are, respectively, a and dimensional vector. In (7) the covariance matrix of is .

We then define the compositional vector using , i.e.

(16) |

Indeed there is an identifiability problem in (17), that will be discussed later, due to the following relation:

(17) |

Notice that, if we assume we can define . The elements of (7) can be derived by (15) and we have then that and

(18) |

where is sub-matrix of the matrix , obtained by selecting from the -th to the -th rows and from the -th to the -th columns, and

(19) |

Our main idea is to work with the process , since is not identifiable, defining the covariance structure through the process , since it allows to be invariant to the reference element, and the way we are going to define matrix makes inference unaffected by the ordering given to the elements of .

To understand why, we can see from (17) that the log-ratio can be computed as

(20) |

Notice that, differently from (12), (20) has the same functional form even if one of the index is the reference elements since the Gaussian variable is not set to zero and, then, all the ’s have the same structure, without the asymmetry of equation (12). Now we show how properties of the process , such as independence or i.i.d. elements, are inherited by . For example if the components of are i.i.d., with zero-mean and variance , due to the relation (19), has zero mean and covariance matrix

which, assuming , is the covariance matrix in equation (14). If, on the other hand, we assume with , the covariance matrix between and has the form given in (13), if . This means that temporal independent ’s induce temporal independence ’s’ and i.i.d. elements of produce i.i.d. elements of . Notice that, due to (19), is not identifiable and then we impose the constraint , having then .

The specification of the process is complete specifying a structure for the matrix , which described the relationship among the behaviour. The usual choice is the Cholesky decomposition, however it is well known that this induces an ordering between the variables [73] that is against one of the principle of compositional data analysis. We than follow the approach of [54] by setting

(21) |

where is the matrix of eigenvectors of and is the diagonal matrix with the square root of the eigenvalues as elements. Equation (21) produces a dependence structure which is influenced neither by the ordering of the compositional vector elements nor by the ordering given to the eigenvalues.

In conclusion, to be able to define a model with the invariant properties needed to analyze compositional data, we suggest to use covariance functions defining matrix through transformation (18). Our proposed model is then

(22) | ||||

(23) | ||||

(24) | ||||

(25) | ||||

(26) |

### 3.4 Computational details

#### GP approximation

Computational issues often arise for models based on GPs. This is mainly due to the need to invert the covariance matrix, an operation of complexity proportional to the power three of the dimension of the problem [7]; in our context the dimension of the covariance matrix is . To be able to estimate the model, we make use of the novel approach of [16]. The authors propose a class of scalable NNGP which may be seen as a hierarchical sparse prior and allows for efficient MCMC algorithms to be performed without storing or decomposing large covariance matrices.

In particular, for a general vector of random variables , distributed as a multivariate normal, the NNGP approximate the joint density, written in terms of conditionals, i.e.

(27) |

with one based on smaller conditional sets, i.e.

(28) |

where , called the neighbor set of , contains only a subset of maximum elements from . [16] show that inference based on produces very similar results to a full GP modelling even for small values of , as , and the computational complexity is reduced by inducing sparsity in the NNGP process precision matrices.

The NNGP depends on the ordering given to the element of . In the literature have been proposed different alternatives, see for example [39] and [26] for some theoretical implications of the neighbor choice. Since the approximation is known to work better if contains observations that are highly correlated with [17], in a purely temporal process, the temporal ordering is the most natural choice since, generally, the correlation functions used decrease with the temporal distance. The NNGP is applied to the multivariate GP .

#### MCMC implementation

The MCMC implementation is straightforward. Given a value of the entire multivariate GP, its parameters can be simulated as in the usual GP framework. In details, we update all parameters at the same time using a Metropolis step with the adaptive proposal of [5], algorithm 4. Before applying the algorithm, it is necessary to transform the variables so that all of them belong to . We take the logarithm of the decay parameters and to eliminate the constraints over the parameters of the non-negative definite matrix , we re-express it using the Bartlett decomposition [4] that is based on random variables that are normally and chi-squared distributed; the latter are then transformed using the logarithm. Given the probabilities and the data , the parameters and the latent variable are simulated as in a mixture model using Gibbs steps while to simulate the GP elements we use the novel approach of [69] and its extension proposed in [44]. The missing observations are obtained by first simulating the missing elements of , using (1), and then computing .

## 4 Simulated examples

0.047 | 2.949 | -0.263 | 0.044 | 2.946 | -0.264 | |

(CI) | (-0.165 0.265) | (2.77 3.12) | (-0.436 -0.093) | (-0.164 0.278) | (2.757 3.116) | (-0.435 -0.100) |

-0.079 | -0.116 | -2.898 | -0.099 | -0.123 | -2.897 | |

(CI) | (-0.447 0.314) | (-0.356 0.125) | (-2.995 -2.798) | (-0.467 0.300) | (-0.364 0.116) | (-2.994 -2.790) |

1.439 | 1.271 | 2.073 | 1.445 | 1.272 | 2.076 | |

(CI) | (1.062 1.858) | (1.008 1.634) | (1.755 2.441) | (1.074 1.887) | (1.012 1.620) | (1.765 2.427) |

-0.012 | 1.465 | -0.536 | -0.004 | 1.464 | -0.539 | |

(CI) | (-0.384 0.360) | (1.155 1.871) | (-0.696 -0.385) | (-0.373 0.356) | (1.15 1.84) | (-0.705 -0.389) |

3.305 | 2.346 | 0.634 | 3.337 | 2.351 | 0.635 | |

(CI) | (2.525 4.119) | (1.892 2.936) | (0.526 0.772 ) | (2.520 4.181) | (1.887 2.914) | (0.522 0.771) |

0.559 | 2.1 | 0.579 | 2.205 | |||

(CI) | (-2.163 3.329) | (0.586 3.670) | () | (-1.955 3.168) | (0.768 3.721) | () |

-2.37 | -4.964 | -2.508 | -5.09 | |||

(CI) | (-6.647 2.077) | (-7.564 -2.543) | () | (-6.743 1.463) | (-7.724 -2.649) | () |

0.592 | 1.419 | 2.912 | 0.987 | 2.247 | 2.89 | |

(CI) | (0.313 1.441) | (0.382 3.836) | (1.052 5.071) | (0.563 1.637) | (0.429 5.361) | (0.651 5.208) |

3.159 | 0.795 | 1.144 | 3.889 | 0.78 | 1.979 | |

(CI) | (0.983 7.651) | (0.183 2.295) | (0.279 3.384) | (0.358 8.784) | (0.143 3.579) | (0.549 5.001) |

-0.658 | -0.131 | 0.006 | -0.118 | 0.557 | 0.338 | |

(CI) | (-1.689 0.185) | (-1.164 1.716) | (-0.592 1.160) | (-1.076 1.480) | (-1.299 3.684) | (-0.891 3.150) |

1 | 10 | 20 | Tot | |
---|---|---|---|---|

250 | 24 | 36 | 34 | 94 |

500 | 22 | 33 | 44 | 99 |

1000 | 19 | 22 | 59 | 100 |

1 | 10 | 20 | |
---|---|---|---|

250 | 1.35 | 3.15 | 5.25 |

500 | 3 | 6.6 | 10.2 |

1000 | 5.1 | 13.2 | 16.5 |

In this Section we aim to show that the NNGP approximation, applied to our model, can estimate the parameters in a satisfactory way and, moreover, we want to describe a method to make inference on the number of latent classes .

We simulate data with , a number of observations and assuming a regular observational time lag for all , i.e. for all we have the same time length but as increases the intervals between observations decrease, with

(29) |

(30) |

i.e., correlation between the two components of is equal to zero in the first behavior, in the second and in the third.

We also assume

and we use exponential correlation functions with decay parameters equal to , and respectively.

The regressive coefficients are where is a vector of dimension having 1 as first element (intercept) and the element of the second column equal to . For each we simulate 100 datasets.

We perform inference by fixing to values between two and six and in . The choice of the number of components is essential in mixture literature and in application involving mixtures. To deal with this problem, several proposals are available in the literature, e.g. there exist methods which may be performed in the MCMC algorithms employed, as in [71], however the most common method is to use one or more model selection criteria. Of course, a competitive approach would be considering a random variable itself and defining a prior distribution on it; see, for instance, [23] or [8], for a recent nonparametric approach. Nevertheless, since the choice of the number of hidden states is not the main goal of this work, we leave this for further research.

Since the model is defined via a latent variable we use the integrated classification likelihood (ICL) proposed by [9]; the results are reported in Table 5. We also tried to use the DICs proposed in [14], but none of them gave satisfactory results in term of selecting the right number of clusters; DICs select the right between 60% and 70% of the times.

As prior distributions we assume and for the likelihood parameters. For the temporal decays we assume , The regressive coefficients are normally distributed with mean e variance while . The MCMC is implemented with 1000000 iterations, burnin 70000 and thin 6, having then 5000 posterior samples.

As we can see from Table 5, using ICL, is selected 94% of the time with , with and if . Moreover, as increases, it is more likely to prefer the model with higher . To give a better insight of this result, we selected randomly one of the 100 datasets that has and we show the parameter estimates obtained with 1 and 20 in Table 2, while in Figure 4 posterior estimates of the compositional vectors time series, with the the associated 95% credible intervales (CIs), are depicted for all three behaviors and ; the “true” compositional vectors time series, trajectory and are shown in Figure 5.

From Figure 5 and Table 2 we see that the results obtained with and are quite similar. The compositional vectors time series are almost identical with few minor differences, as well as posterior means and CIs of likelihood parameters. The model with has a worst performance in terms of parameters estimate of the GP since the diagonal elements of are not estimated correctly^{1}^{1}1We consider correctly estimated a parameters if the true value is inside the CI.. In both cases most of the parameters are correctly estimated. This similarity between the two models based on different values of can explain, in our opinion, why even with a sample size of 1000, models with are often chosen.

We then believe that if computational time is an issue, or if we are really only interested in the likelihood parameters, model with could be used. The computational time can be seen in Table 4.

## 5 Real data examples

We now present the application of the proposed method to the dataset described in Section 2.

Covariate information is not available, however, in order to work with models with (possible) different mean values in the two observational time windows, we set to be a vector with equal to 1 if belongs to the first time windows and 0 otherwise, while equal to 1 if belongs to the second time windows and 0 otherwise. Notice that both variables are equal to 0 in the time between the two. As in the simulated example, we use exponential correlation functions and the same prior distributions, testing models with and .

On the same dataset an HMM, i.e a model that assumes the following

(31) |

where and is a compositional vector, and the proposal of [63], that is obtained assuming to be a dimensional diagonal matrix, were also tested and the model performance are then compared using ICL. As for our proposal, we tested models with and the same NNGP approximation, with , is used for the proposal of [63]. As prior distributions we assume Dirichlet with vectors of parameters equal to for the ’s, inverse gamma with parameters 1 and 0.5 for the variance parameters of [63] while the other parameters have the same priors of our proposal.

For all models we use the same number of iterations, thin and burnin used in the simulated examples.

### 5.1 The results

Model | ||||||
---|---|---|---|---|---|---|

HMM | -12607 | -14926 | -14366 | -10941 | -9939 | |

-10784 | -10958 | -9878 | -9256 | -9123 | ||

Logit1 | -11139 | -12421 | -11089 | -10115 | -9890 | |

-11255 | -11424 | -11650 | -9738 | -9542 | ||

-11532 | -14345 | -12191 | -9510 | -9205 | ||

Logit2 | -12794 | -16001 | -13315 | -9190 | -8899 | |

-11917 | -15676 | -14798 | -11280 | -9991 |

-0.011 | 0.386 | -0.544 | |

(CI) | (-0.017 -0.004) | (0.248 0.531) | (-1.410 0.081) |

0.002 | 0.061 | 0.049 | |

(CI) | (-0.005 0.008) | (-0.070 0.199) | (-1.291 1.235) |

0.003 | 0.661 | 0.38 | |

(CI) | (0.003 0.004) | (0.531 0.819) | (0.166 0.635) |

0 | 0.052 | 0 | |

(CI) | (0 0) | (-0.044 0.153) | (-0.031 0.030) |

0.003 | 0.513 | 0.38 | |

(CI) | (0.003 0.004) | (0.403 0.637) | (0.169 0.639) |

7.394 | 4.242 | ||

(CI) | (6.764 8.94) | (1.848 6.932 ) | () |

7.012 | -3.227 | ||

(CI) | (5.885 7.544) | (-7.098 -0.543) | () |

6.745 | 12.266 | 5.203 | |

(CI) | (4.882 9.189) | (6.135 14.861) | (0.909 11.180) |

143.95 | 1.785 | 1.731 | |

(CI) | (40.475 371.745) | (0.221 8.715) | (0.027 11.549) |

-4.418 | -5.877 | 1.408 | |

(CI) | (-38.881 20.888) | (-45.929 12.702) | (0.068 9.363) |

In Table 5 we can see the ICL for all the tested models. All of them suggests and our proposal with is the one with the lower ICL, i.e. it is the model with the best fit. It is also interesting to note that, for any give , our model outperforms the others.

The posterior estimates of the chosen model can be seen in Table 6, in Figures 5 and 6 can be found the posterior estimates of the probability vectors time series while Figure 7 shows the observed spatial locations with the associate classification and predictive densities of the step-length and turning-angle.

### 5.2 Behaviour description

#### First Behaviour

From Figure 7 (c) and (d) we see that in the first behaviour the speed is really close to zero and the circular distribution, even if has a mode at around , has a lot of variability showing that there is not a clear preferred direction. The two regressive coefficients, Table 7, are higher with respect to the reference element (the third) and the second behaviour, indicating that this is the one with higher probabilities, as we can also see from Figures 5 and 6 where the probability values are often equal to 1. This behavioiur is the slow-movement behaviour, representing a variety of activities such as resting, feeding, social interacting and, in the second window, attending pups during the reproduction period. Notice that the occurences of this behaviour, in the second time window, are spatialy localized in a relatively small area, that we think is the den, see Figure 7 (b).

#### Second Behaviour

In the second behaviour the speed increases and the circular distribution has a clear mode around zero, indicating that the wolf tends to move in straight line, see Figure 7. This behaviour is relevant in the first time window while in the second it almost disappears, as we can also see from the the associated regressive coefficients. This behaviour fits with the nomadic phase of wolf movement patterns during winter, when the main activities are hunting and patrolling the territory [56]. Moreover, F24 established her home range in March 2010, and these high speeds may also represent the need to control and mark the territory, as newly formed pairs are the ones with the highest marking rates in wolf populations [74].

#### Third Behaviour

In the last behaviour, that has really low probability in the first time window, the predictive distribution of the step-length is similar to the one of the second (Figure 7). The turning-angle has a mode at and low variability, indicating that the animal moves in a anticlockwise direction.
Given that this behaviour is almost absent during the first temporal window, whereas it represents the main moving type during the second one, this pattern seems to correctly fit with the star-shaped movements of wolves in presence of pups at dens [56]. This is in line with the tendency of breeding females to restrict their movements to a smaller area of the territory during the period of reproduction compared to the rest of the year [37]. The counter clockwise tendency of wolf movements may be related to the necessity to exploit different portions of the home range to locate vulnerable prey. Because wolves apparently have a spatial map of resources within their territory [67], varying their hunting routes to surprise prey could improve their hunting success [37], resulting in a rotational use of the home range [18].

From the extradiagonal elements of we can see that there could a dependence between the second and third behaviour; remember that dependence must be evaluated using the log-ratios, that involve three elements. To better analize the dependence structure we plot a log-ratio temporal correlation in Figure 8. There is not a unique and generally accepted way to evaluate the correlation of compositional data, see for example [19] or [46], but since we are here interested mainly in the temporal evolution of the dependence, the log-ratio is divided by its value at time lag 0, i.e. , showing then how dependence changes over time. We indicate the correlation coefficients with .

It is interesting to note that and , that are the correlation functions of the second and third behaviours with respect to the first, are indistinguishable, higlighting that the difference is mostly on the direction of movement (the turning-angle). In Figure 8 (b), which shows the cross-correlations, only behaviour two and three () has CI that does not contain the zero. It is interesting to note that dependence disappear after 12 hours, i.e. all values are close to zero.

### 5.3 Time window description

#### First time window

In the first window, the slow-movement behaviour (the first one) has high probability during daylight hours, wheras the second during night (Figures 5 (a) and (b)). This complementary pattern is in line with the circadian activity of wolves in human modified environments, where they are mainly nocturnal to avoid disturbance derived from human activities during the day [76, 78, 15, 41].

#### Second time window

In the second window the slow-movement regime has probability close to 1 during the first days because F24 likely entered in the den and reproduced in that time. According to previous research on wolf reproducing behaviour, breeding females are stationary the day of reproduction, and with limited movements during the period following reproduction [2]. During the days after reproduction occurred, the slow-movement regime is often concentrated around dusk and during the first night hours (Figure 6 (a)), whereas the star-shaped moving regime is concentrated during daylight hours or shows two or more peaks at different times of the day (Figure 6 (c)). This result can be interpreted as a reduction of the nocturnal activity of this wolf due to the presence of pups, that is also accompanied with a relative increase in diurnality. During the reproduction period breeding females spend most of of their time at den and rendez vous sites [6, 30]. Because other wolves from the pack usually assure the feeding of breeding females during this time [57], females do not have to maintain an activity pattern based on hunting that, in our study area, can be nocturnal due to human presence. This situation may have lead F24 to leave the den mainly during the day, when sunlight can help in keeping the unattended pups warm and other large carnivores (such as Apennine brown bears in our study area) are less active [78].

## 6 Final remark

Motivating by our real data example, where a female wolf is observed in two time windows, i.e. before and after she reproduced, we have proposed a novel approach to analyse tracking trajectory data. This approach aims at defining the posterior distribution of the clustering probabilities, where the clusters are representive of different behaviors that the animal exibiths, e.g. resting, hunting or boundary controll, and describing the trajectory conditionally on the particular behavior, by characterising the step-length and the turning angle of the movement.

Our model is based on a GP representation of the process. However, as all the proposals which try to model compositional vectors through a Gaussian representation, some difficulties arise, in particular due to the sum-to-one constraint of the elements of a compositional vector. We have then proposed to perform the analysis by defining the covariance structure on an unidentifiable process, which allows us to transport the properties of the GPs to the elements of the probability vectors. Our proposal has the invariance properties that allow inference that is unaffected from the reference element chosen and the elements ordering.

The model is estimated under a Bayesian framework and to avoid possible computational problems, we proposed an MCMC based on an approximation of the multivariate GP. The number of latent behaviours is estimated using ICL, one of the most used informational criteria.

With a simulation study we show that the propose MCMC algorithm and approximation can recover the parameters used to simulate the data and ICL selects, in most of the cases, the right number of regime. Due to the results obtained, we argue that if the computational time is an issue, and if the interest lies only on the likelihood parameters, an efficient version of our model can be implemented.

We then estimate the model on the real data, that is the motivating example under this work. The female wolf completely changes her behaviors after having cubs, however the model recognises the disappearance of one behavior in the second time windows and the appereance of a new one, a feature that with competitive models, such as the HMMs, is hardly justified. Intermediate situations, as well as the ones considered in the simulation study in Section 4, can be also analysed in a satisfactory way; for an example see [25]. The results we obtained are easy interpretable and give a better insight of the wolf behaviour, both in terms of movement metrics, i.e. step-length and turning-angle, and the time evolution of the behaviour. Interestingly our model recognizes that the slow-movement behaviour, when the wolf has no cubs, is more likely to appear during the daytime, while after having the cubs, it switches to the first hours of the night.

The model have been proposed in the particular case of tracking trajectories, but it can be employed in different contexts, in particular in the case of environmental sciences, where spatial information can be incorporated in the probability dependence structure, so that the response variable behaves in a similar way in locations which are close in space. This will be the focus on further research.

Another issue that will be investigate in the future, is the case of an unknown number of clusters, i.e. behaviors. For the moment, we have decided to fix the number of components at different values and then use measures of model choice to fix this number but a fully Bayesian analysis, where a prior distribution is defined for the number of components, can be also investigate.

Other possible ways to extend our model is to consider more than one animal at the time, moving form an univariate approach to a multivariate one, or to incorporate in the dependence structure a seasonal components, that can take into account daily patterns.

## References

- Aitchison [1986] Aitchison, J. (1986). The Statistical Analysis of Compositional Data. Chapman & Hall, Ltd., London, UK.
- Alfredeen [2006] Alfredeen, A. (2006). Denning behavior and movement pattern during summer of wolves, canis lupus. Grimso Institute of Wildlife Biology.
- Anderson and Lindzey [2003] Anderson, C. R. and Lindzey, F. G. (2003). Estimating cougar predation rates from gps location clusters. The Journal of Wildlife Management, 67(2), 307–316.
- Anderson [2003] Anderson, T. (2003). An Introduction to Multivariate Statistical Analysis. Wiley Series in Probability and Statistics. Wiley.
- Andrieu and Thoms [2008] Andrieu, C. and Thoms, J. (2008). A tutorial on adaptive mcmc. Statistical Computing, 18, 343–373.
- Ballard et al. [1991] Ballard, W. B., Ayres, L. A., Gardner, C. L., and Foster, J. W. (1991). Den site activity patterns of gray wolves, canis lupus, in southcentral alaska. Canadian Field-Naturalist, 105(4), 497–504.
- Banerjee et al. [2014] Banerjee, S., Gelfand, A. E., and Carlin, B. P. (2014). Hierarchical Modeling and Analysis for Spatial Data. Chapman and Hall/CRC, New York, second edition.
- Bianchini et al. [2017] Bianchini, I., Guglielmi, A., and Quintana, F. A. (2017). Determinantal point process mixtures via spectral density approach. arXiv preprint arXiv:1705.05181.
- Biernacki et al. [2000] Biernacki, C., Celeux, G., and Govaert, G. (2000). Assessing a mixture model for clustering with the integrated classification likelihood. IEEE Trans. Pattern Anal. Mach. Intell., 22(7), 719–725.
- Blackwell [2003] Blackwell, P. G. (2003). Bayesian inference for markov processes with diffusion and discrete components. Biometrika, 90(3), 613–627.
- Brownlee [1912] Brownlee, J. (1912). The mathematical theory of random migration and epidemic distribution. Proceedings of the Royal Society of Edinburgh, 31, 262–289.
- Brunsdon and Smith [1998] Brunsdon, T. M. and Smith, T. (1998). The time series analysis of compositional data. Journal of Official Statistics, 14(3), 237.
- Cagnacci et al. [2010] Cagnacci, F., Boitani, L., Powell, R. A., and Boyce, M. S. (2010). Animal ecology meets gps-based radiotelemetry: a perfect storm of opportunities and challenges. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 365(1550), 2157–2162.
- Celeux et al. [2006] Celeux, G., Forbes, F., Robert, C. P., Titterington, D. M., Futurs, I., and Rhône-alpes, I. (2006). Deviance information criteria for missing data models. Bayesian Analysis, 4, 651–674.
- Ciucci et al. [1997] Ciucci, P., Boitani, L., Francisci, F., and Andreoli, G. (1997). Home range, activity and movements of a wolf pack in central italy. Journal of Zoology, 243(4), 803–819.
- Datta et al. [2016a] Datta, A., Banerjee, S., Finley, A. O., and Gelfand, A. E. (2016a). Hierarchical nearest-neighbor gaussian process models for large geostatistical datasets. Journal of the American Statistical Association, 111(514), 800–812.
- Datta et al. [2016b] Datta, A., Banerjee, S., Finley, A. O., Hamm, N. A. S., and Schaap, M. (2016b). Nonseparable dynamic nearest neighbor gaussian process models for large spatio-temporal data with an application to particulate matter analysis. Annals of Applied Statistics, 10(3), 1286–1316.
- Demma and Mech [2011] Demma, D. J. and Mech, L. D. (2011). Accuracy of estimating wolf summer territories by daytime locations. The American Midland Naturalist, 165(2), 436–445.
- Filzmoser and Hron [2008] Filzmoser, P. and Hron, K. (2008). Correlation analysis for compositional data. Mathematical Geosciences, 41(8), 905.
- Frair et al. [2010] Frair, J. L., Fieberg, J., Hebblewhite, M., Cagnacci, F., DeCesare, N. J., and Pedrotti, L. (2010). Resolving issues of imprecise and habitat-biased locations in ecological analyses using gps telemetry data. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 365(1550), 2187–2200.
- Franke et al. [2006] Franke, A., Caelli, T., Kuzyk, G., and Hudson, R. J. (2006). Prediction of wolf (canis lupus) kill-sites using hidden markov models. Ecological Modelling, 197(1âÂÂ2), 237 – 246.
- Fryxell et al. [2008] Fryxell, J. M., Hazell, M., Börger, L., Dalziel, B. D., Haydon, D. T., Morales, J. M., McIntosh, T., and Rosatte, R. C. (2008). Multiple movement modes by large herbivores at multiple spatiotemporal scales. Proceedings of the National Academy of Sciences, 105(49), 19114–19119.
- Gassiat et al. [2014] Gassiat, E., Rousseau, J., et al. (2014). About the posterior distribution in hidden markov models with unknown number of states. Bernoulli, 20(4), 2039–2075.
- Gelfand et al. [2010] Gelfand, A., Diggle, P., Fuentes, M., and Guttorp, P. (2010). Handbook of Spatial Statistics. Chapman and Hall.
- Grazian et al. [2018] Grazian, C., Mastrantonio, G., and Bibbona, E. (2018). Introducing spatio-temporal dependence in clustering: from a parametric to a nonparametric approach. Proceedings of the XLIX Scientific Meeting of the Italian Statistical Society.
- Guinness [2018] Guinness, J. (2018). Permutation and grouping methods for sharpening gaussian process approximations. Technometrics, 0(0), 1–15.
- Gupta et al. [2007] Gupta, M., Qu, P., and Ibrahim, J. G. (2007). A temporal hidden markov regression model for the analysis of gene regulatory networks. Biostatistics, 8(4), 805–820.
- Gurarie et al. [2016] Gurarie, E., Bracis, C., Delgado, M., Meckley, T. D., Kojola, I., and Wagner, C. M. (2016). What is the animal doing? tools for exploring behavioural structure in animal movements. Journal of Animal Ecology, 85(1), 69–84.
- Hanks et al. [2015] Hanks, E. M., Hooten, M. B., and Alldredge, M. W. (2015). Continuous-time discrete-space models for animal movement. Annals of Applied Statistics, 9(1), 145–165.
- Harrington and Mech [1982] Harrington, F. H. and Mech, L. D. (1982). An analysis of howling response parameters useful for wolf pack censusing. 46(3), 686–693.
- Hayes and Harestad [2000] Hayes, R. D. and Harestad, A. S. (2000). Wolf functional response and regulation of moose in the yukon. Canadian Journal of Zoology, 78(1), 60–66.
- Heaton et al. [2017] Heaton, M. J., Datta, A., Finley, A., Furrer, R., Guhaniyogi, R., Gerber, F., Gramacy, R. B., Hammerling, D., Katzfuss, M., Lindgren, F., Nychka, D. W., Sun, F., and Zammit-Mangion, A. (2017). A Case Study Competition Among Methods for Analyzing Large Spatial Data. ArXiv e-prints.
- Hebblewhite and Merrill [2008] Hebblewhite, M. and Merrill, E. (2008). Modelling wildlife and uman relationships for social species with mixed-effects resource selection models. Journal of Applied Ecology, 45(3), 834–844.
- Hooten et al. [2010] Hooten, M., Johnson, D., Hanks, E., and Lowry, J. (2010). Agent-based inference for animal movement and selection. Journal of Agricultural, Biological, and Environmental Statistics, 15(4), 523–538.
- Horne et al. [2007] Horne, J. S., Garton, E. O., Krone, S. M., and Lewis, J. S. (2007). Analyzing animal movements using brownian brigdes. Ecology, 88(9), 2354–2363.
- Houston and Mcnamara [1999] Houston, A. I. and Mcnamara, J. M. (1999). Models of Adaptive Behaviour: An approach based on state. Cambridge University Press, Cambridge, United Kingdom.
- Jedrzejewski et al. [2001] Jedrzejewski, W., Schmidt, K., Theuerkauf, J., Jedrzejewska, B., and Okarma, H. (2001). Daily movements and territory use by radio-collared wolves (canis lupus) in bialowieza primeval forest in poland. Canadian Journal of Zoology, 79(11), 1993–2004.
- Jona Lasinio et al. [2013] Jona Lasinio, G., Mastrantonio, G., and Pollice, A. (2013). Discussing the “big n problem”. Statistical Methods and Applications, 22(1), 97–112.
- Katzfuss and Guinness [2017] Katzfuss, M. and Guinness, J. (2017). A general framework for Vecchia approximations of Gaussian processes. ArXiv e-prints.
- Krishnamurthy et al. [2016] Krishnamurthy, V., Leoff, E., and Sass, J. (2016). Filterbased stochastic volatility in continuous-time hidden markov models. Econometrics and Statistics.
- Kusak et al. [2005] Kusak, J., Skrbinšek, A. M., and Huber, D. (2005). Home ranges, movements, and activity of wolves (canis lupus) in the dalmatian part of dinarids, croatia. European Journal of Wildlife Research, 51(4), 254–262.
- Langrock et al. [2012] Langrock, R., King, R., Matthiopoulos, J., Thomas, L., Fortin, D., and Morales, J. M. (2012). Flexible and practical modeling of animal telemetry data: hidden Markov models and extensions. Ecology, 93(11), 2336–2342.
- Lark and Bishop [2007] Lark, R. M. and Bishop, T. F. A. (2007). Cokriging particle size fractions of the soil. European Journal of Soil Science, 58(3), 763–774.
- Linderman et al. [2015] Linderman, S. W., Johnson, M. J., and Adams, R. P. (2015). Dependent multinomial models made easy: stick breaking with the Polya-Gamma augmentation. ArXiv e-prints.
- Liu et al. [2017] Liu, Y. Y., Moreno, A., Li, S., Li, F., Song, L., and Rehg, J. M. (2017). Learning continuous-time hidden markov models for event data. In Mobile Health, pages 361–387. Springer.
- Long and Wang [2013] Long, W. and Wang, Q. (2013). Two methods of correlation coefficient on compositional data. Procedia Computer Science, 18, 1757 – 1763. 2013 International Conference on Computational Science.
- Mancinelli et al. [pear] Mancinelli, S., Boitani, L., and Ciucci, P. (To appear). Determinants of home range size and space use patterns in a protected wolf (canis lupus) population in the central apennines, italy. Canadian Journal of Zoology, 0(0), 1–11.
- Marshall-Pescini et al. [2017] Marshall-Pescini, S., Cafazzo, S., Viranyi, Z., and Range, F. (2017). Integrating social ecology in explanations of wolf-dog behavioral differences. Current Opinion in Behavioral Sciences, 16, 80 – 86. Comparative cognition.
- Martins et al. [2016] Martins, A. B. T., Bonat, W. H., and Jr., P. J. R. (2016). Likelihood analysis for a class of spatial geostatistical compositional models. Spatial Statistics, 17, 121 – 130.
- Maruotti et al. [2015] Maruotti, A., Punzo, A., Mastrantonio, G., and Lagona., F. (2015). A time-dependent extension of the projected normal regression model for longitudinal circular data based on a hidden Markov heterogeneity structure. Stochastic Environmental Research and Risk Assessment, To appear.
- Maruotti et al. [pear] Maruotti, A., Bulla, J., Lagona, F., Picone, M., and Martella, F. (to appear). Dynamic mixture of factor analyzers to characterize multivariate air pollutant exposures. Annals of Applied Statistics.
- Mastrantonio et al. [2015a] Mastrantonio, G., Maruotti, A., and Jona Lasinio, G. (2015a). Bayesian hidden Markov modelling using circular-linear general projected normal distribution. Environmetrics, 26, 145–158.
- Mastrantonio et al. [2015b] Mastrantonio, G., Jona Lasinio, G., and Gelfand, A. E. (2015b). Spatio-temporal circular models with non-separable covariance structure. TEST, To appear.
- Mastrantonio et al. [2018] Mastrantonio, G., Jona Lasinio, G., Pollice, A., Capotorti, G., Teodonio, L., Genova, G., and Blasi, C. (2018). A hierarchical multivariate spatio-temporal model for large clustered climate data with annual cycles. ArXiv e-prints.
- McClintock et al. [2012] McClintock, B. T., King, R., Thomas, L., Matthiopoulos, J., McConnell, B. J., and Morales, J. M. (2012). A general discrete-time modeling framework for animal movement using multistate random walks. Ecological Monographs, 82(3), 335–349.
- Mech and Boitani [2003] Mech, D. and Boitani, L. (2003). Wolves: behavior, ecology and conservation. University of Chicago Press, Chicago.
- Mech et al. [1999] Mech, L. D., Wolf, P. C., and Packard, J. M. (1999). Regurgitative food transfer among wild wolves. Canadian Journal of Zoology, 77(8), 1192–1195.
- Merrill and David Mech [2000] Merrill, S. B. and David Mech, L. (2000). Details of extensive movements by minnesota wolves (canis lupus). The American Midland Naturalist, 144(2), 428–433.
- Moorter et al. [2010] Moorter, B., Visscher, D. R., L., J. C., Frair, J. L., and H., M. E. (2010). Identifying movement states from location data using cluster analysis. The Journal of Wildlife Management, 74(3), 588–594.
- Morales et al. [2004] Morales, J. M., Haydon, D. T., Frair, J., Holsinger, K. E., and Fryxell, J. M. (2004). Extracting more out of relocation data: building movement models as mixtures of random walks. Ecology, 85(9), 2436–2445.
- Nams [2014] Nams, V. O. (2014). Combining animal movements and behavioural data to detect behavioural states. Ecology Letters, 17(10), 1228–1237.
- Nathan et al. [2008] Nathan, R., Getz, W. M., Revilla, E., Holyoak, M., Kadmon, R., Saltz, D., and Smouse, P. E. (2008). A movement ecology paradigm for unifying organismal movement research. Proceedings of the National Academy of Sciences, 105(49), 19052–19059.
- Paci and Finazzi [2017] Paci, L. and Finazzi, F. (2017). Dynamic model-based clustering for spatio-temporal data. Statistics and Computing, pages 1–16.
- Patterson et al. [2009] Patterson, T. A., Basson, M., Bravington, M. V., and Gunn, J. S. (2009). Classifying movement behaviour in relation to environmental conditions using hidden markov models. Journal of Animal Ecology, 78(6), 1113–1123.
- Patterson et al. [2017] Patterson, T. A., Parton, A., Langrock, R., Blackwell, P. G., Thomas, L., and King, R. (2017). Statistical modelling of individual animal movement: an overview of key methods and a discussion of practical challenges. AStA Advances in Statistical Analysis, 101(4), 399–438.
- Pawlowsky and Burger [1992] Pawlowsky, V. and Burger, H. (1992). Spatial structure analysis of regionalized compositions. Mathematical Geology, 24(6), 675–691.
- Peters [1979] Peters, R. (1979). Mental maps in wolf territoriality. In E. Klinghammer, editor, The behavior and ecology of wolves, pages 119–152. Garland STPM Press, New York.
- Pirzamanbein et al. [2015] Pirzamanbein, B., Lindström, J., Poska, A., and Gaillard, M.-J. (2015). Modelling Spatial Compositional Data: Reconstructions of past land cover and uncertainties. ArXiv e-prints.
- Polson et al. [2013] Polson, N. G., Scott, J. G., and Windle, J. (2013). Bayesian inference for logistic models using polya-gamma latent variables. Journal of the American Statistical Association, 108(504), 1339–1349.
- Quintana and West [1988] Quintana, J. M. and West, M. (1988). Time series analysis of compositional data. Bayesian Statistics, 3, 747–756.
- Richardson and Green [1997] Richardson, S. and Green, P. J. (1997). On bayesian analysis of mixtures with an unknown number of components (with discussion). Journal of the Royal Statistical Society: series B (statistical methodology), 59(4), 731–792.
- Roever et al. [2010] Roever, C. L., Boyce, M. S., and Stenhouse, G. B. (2010). Grizzly bear movements relative to roads: application of step selection functions. Ecography, 33(6), 1113–1122.
- Rothman et al. [2010] Rothman, A. J., Levina, E., and Zhu, J. (2010). A new approach to cholesky-based covariance regularization in high dimensions. Biometrika, 97(3), 539–550.
- Rothman and Mech [1979] Rothman, R. J. and Mech, L. D. (1979). Scent-marking in lone wolves and newly formed pairs. Animal Behaviour, 27, 750 – 760.
- Sand et al. [2010] Sand, H., Zimmermann, B., Wabakken, P., Andren, H., and Pedersen, H. C. (2010). Using gps technology and gis cluster analyses to estimate kill rates in wolf-ungulate ecosystems. Wildlife Society Bulletin, 33(3).
- Theuerkauf [2009] Theuerkauf, J. (2009). What drives wolves: fear or hunger? humans, diet, climate and wolf activity patterns. Ethology, 115(7), 649–657.
- Tjelmeland and Lund [2003] Tjelmeland, H. and Lund, K. V. (2003). Bayesian modelling of spatial compositional data. Journal of Applied Statistics, 30(1), 87–100.
- Vila [1995] Vila, C. (1995). Observations on the daily activity patterns in the iberian wolf. Ecology and conservation of wolves in a changing world, 35.
- Wang and Gelfand [2013] Wang, F. and Gelfand, A. E. (2013). Directional data analysis under the general projected normal distribution. Statistical Methodology, 10(1), 113–127.
- Zucchini and MacDonald [2009] Zucchini, W. and MacDonald, I. (2009). Hidden Markov Models for Time Series: An Introduction Using R. Chapman & Hall/CRC Monographs on Statistics & Applied Probability. Taylor & Francis.