
Summary statistics for inhomogeneous marked point processes


O. Cronie and M.N.M. van Lieshout


CWI, P.O. Box 94079, NL-1090 GB Amsterdam, The Netherlands

Abstract: We propose new summary statistics for intensity-reweighted moment stationary marked point processes with particular emphasis on discrete marks. The new statistics are based on the $n$-point correlation functions and reduce to cross $J$- and $D$-functions when stationarity holds. We explore the relationships between the various functions and discuss their explicit forms under specific model assumptions. We derive ratio-unbiased minus sampling estimators for our statistics and illustrate their use on a data set of wildfires.

Key words: Generating functional, Intensity-reweighted moment stationarity, $J$-function, Marked point process, Multivariate point process, Nearest neighbour distance distribution function, $n$-point correlation function, Reduced Palm measure.

Mathematics Subject Classification: 60G55, 60D05.

1 Introduction

The analysis of a marked point pattern typically begins with computing some summary statistics which may be used to find specific structures in the data and suggest suitable models [4, 6, 7, 10, 13, 14]. The choice of summary characteristic depends both on the pattern at hand and on the feature or hypothesis of interest. Indeed, under the working assumption of stationarity, for discrete marks, cross versions of the $K$-function or the nearest neighbour distance distribution function may be appropriate [8]; for real-valued marks, the mark correlation functions of [22] are widely used. Various types of $J$-functions [15, 18] offer useful alternatives.

Often, however, the assumption of homogeneity cannot be justified. In the unmarked case, [1] proposed an inhomogeneous extension of the $K$-function for so-called second order intensity-reweighted stationary point processes. Their ideas were extended to spatio-temporal point processes in [9, 20], whereas [5, 16] extended the $J$-function under the somewhat stronger assumption of intensity-reweighted moment stationarity in space and time.

For non-stationary multivariate point processes, [21] proposed an extension of the $K$-function under the assumption of second order intensity-reweighted stationarity. As we will indicate in this paper, this structure may be extended to $K$-functions for general marked point processes.

Regarding $J$-functions, in [16] the author noted that the ideas in that paper could be combined with those in [15] to define inhomogeneous $J$-functions with respect to mark sets. In this paper we do so, and, as a by-product, obtain a generalisation of the cross nearest neighbour distance distribution function.

The paper is structured as follows. In Section 2, we define marked point processes with locations in Euclidean spaces and give the necessary preliminaries. In Sections 3.1 and 3.2, we define, respectively, cross $D$- and $J$-functions for inhomogeneous multivariate point processes and propose generalisations to point processes with real-valued marks. We show that both can be expressed in terms of the generating functional and discuss the relationships between these statistics and the cross $K$-function. In Section 4, we investigate the form of our statistics under various independence and marking assumptions. We derive minus sampling estimators in Section 5, which are applied to a data set on wildfires in New Brunswick, Canada, in Section 6. We finish the paper with a summary.

2 Definitions and notations

Throughout this paper, we consider marked point processes $Y$ [6, Definition 6.4.1] with points in $\mathbb{R}^d$ equipped with the Euclidean metric and Borel $\sigma$-algebra $\mathcal{B}(\mathbb{R}^d)$. We write $\ell$ for the Lebesgue measure on $\mathbb{R}^d$. By definition, the ground process $Y_g$ obtained from $Y$ by ignoring the marks is a well-defined point process on $\mathbb{R}^d$ in its own right. We shall assume that $Y_g$ is simple, that is, almost surely does not contain multiple points.

We assume that the mark space $\mathcal{M}$ is Polish and equipped with a finite reference measure $\nu$ on the Borel $\sigma$-algebra $\mathcal{B}(\mathcal{M})$. We denote by $\mathcal{B}(\mathbb{R}^d \times \mathcal{M})$ the Borel $\sigma$-algebra on the product space $\mathbb{R}^d \times \mathcal{M}$. In the special case that $\mathcal{M} = \{1, \dots, k\}$ is finite, $Y$ can be seen as a multivariate point process $(Y_1, \dots, Y_k)$ where $Y_i$ contains the points marked $i$.

2.1 Product densities

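Recall that the intensity measure $\Lambda$ of a marked point process $Y$ is defined on product sets $B \times C$, $B \in \mathcal{B}(\mathbb{R}^d)$, $C \in \mathcal{B}(\mathcal{M})$, by

\[ \Lambda(B \times C) = \mathbb{E}\, Y(B \times C), \]

the expected number of points in $B$ with marks in $C$. If $\Lambda$ is locally finite as a set-function, it can be extended to a measure on $\mathcal{B}(\mathbb{R}^d \times \mathcal{M})$ (see e.g. [12, Theorem A, p. 54]). In this paper, additionally, we assume that $\Lambda$ admits a density $\lambda$ with respect to $\ell \times \nu$, which is referred to as the intensity function. In particular, for a finite mark space, $\lambda_i(x) = \lambda(x, i)\, \nu(\{i\})$ is the intensity function of $Y_i$.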

Since for fixed $C$ the measure $\Lambda(\cdot \times C)$ is absolutely continuous with respect to the intensity measure $\Lambda_g$ of the ground process,

(1)

Here $M^x(C)$ is the probability that the mark of a point at location $x$ falls in $C$. The members of the family $\{M^x : x \in \mathbb{R}^d\}$ of probability distributions on the Borel sets of $\mathcal{M}$ are called mark distributions.

If $Y$ is stationary, that is, if its distribution is invariant under translations of the locations, $M^x \equiv M$ for some probability distribution $M$ on $\mathcal{M}$, which is known as the mark distribution. In this case, we may take $\nu = M$ for the reference measure on $\mathcal{M}$ so that $Y$ has constant intensity function $\lambda$ with respect to $\ell \times \nu$, and, moreover, $\lambda$ is the intensity of the ground process.

Higher order ‘intensity functions’ or product densities $\rho^{(n)}$, $n \geq 1$, can be defined as densities of the factorial moment measures provided these exist, in which case they satisfy the following $n$-th order Campbell formula. For any measurable function $f \geq 0$, the sum of $f$ over $n$-tuples of different points of $Y$ is a random variable with expectation

(2)

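(with the left hand side being infinite if and only if the right hand side is infinite). Note that $\rho^{(1)} = \lambda$, the intensity function. Also, $n$-point mark distributions $M^{x_1, \dots, x_n}$ can be defined analogously to the case $n = 1$. For further details, see for example the textbook [4]. Note that, by the absolute continuity underlying the existence of $\rho^{(n)}$, there exist product densities $\rho_g^{(n)}$ for the ground process and densities $f_{x_1, \dots, x_n}$ of $M^{x_1, \dots, x_n}$ with respect to the $n$-fold product of $\nu$ with itself such that

\[ \rho^{(n)}((x_1, m_1), \dots, (x_n, m_n)) = f_{x_1, \dots, x_n}(m_1, \dots, m_n)\, \rho_g^{(n)}(x_1, \dots, x_n). \]

In particular, the intensity function of the ground process is given by $\lambda_g(x) = \int_{\mathcal{M}} \lambda(x, m)\, \nu(dm)$ and $\lambda(x, m) = f_x(m)\, \lambda_g(x)$.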
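For orientation, the $n$-th order Campbell formula referred to as (2) can be sketched in its usual textbook form. The symbols $f$, $\rho^{(n)}$, $\nu$ and the superscript $\neq$ (summation over tuples of distinct points) are notational assumptions of this sketch and may differ from the display intended by the authors.

```latex
% n-th order Campbell formula (assumed notation)
\[
\mathbb{E}\Bigg[\sum^{\neq}_{(x_1,m_1),\dots,(x_n,m_n)\in Y}
   f\big((x_1,m_1),\dots,(x_n,m_n)\big)\Bigg]
 = \int_{(\mathbb{R}^d\times\mathcal{M})^n}
   f\big((x_1,m_1),\dots,(x_n,m_n)\big)\,
   \rho^{(n)}\big((x_1,m_1),\dots,(x_n,m_n)\big)\,
   \prod_{i=1}^n dx_i\,\nu(dm_i).
\]
% Special case n = 1 with f the indicator of B x C:
% the formula recovers the intensity measure.
\[
\mathbb{E}\, Y(B\times C) = \int_B\int_C \lambda(x,m)\,\nu(dm)\,dx .
\]
```

The second line shows why $\rho^{(1)}$ coincides with the intensity function: taking $n = 1$ and an indicator function turns the sum into a point count.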

We will also need the related concept of $n$-point correlation functions $\xi_n$, $n \geq 1$, the intensity-reweighted densities of the factorial cumulant measures [7, Section 9.5]. These permutation invariant measurable functions are defined by the following recursive relation (see e.g. [15, 26]). Set $\xi_1 \equiv 1$ and, for $n \geq 2$,

(3)

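where $\sum_{D_1, \dots, D_k}$ is a sum over all possible $k$-sized partitions $\{D_1, \dots, D_k\}$, $D_j \neq \emptyset$, of the set $\{1, \dots, n\}$ and $|D_j|$ denotes the cardinality of $D_j$. Note that for a Poisson process, $\xi_n \equiv 0$ for all $n \geq 2$.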
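The recursion can be made concrete with a short Python sketch. It assumes that (3) has the standard form in which the sum over all partitions of $\{1, \dots, n\}$ of the products $\prod_j \xi_{|D_j|}$ equals the product density divided by the product of the intensities. The function names `set_partitions`, `xi` and the callable `normalised_density` are hypothetical and serve only as illustration.

```python
from math import prod


def set_partitions(items):
    """Yield every partition of `items` into non-empty blocks (lists of lists)."""
    items = list(items)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part                      # `first` forms its own block
        for i in range(len(part)):                  # or joins an existing block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


def xi(points, normalised_density):
    """n-point correlation function at `points`, solved from the recursion.

    `normalised_density(pts)` is assumed to return the product density at the
    tuple `pts` divided by the product of the intensities at those points.
    """
    pts = tuple(points)
    if len(pts) == 1:
        return 1.0                                  # xi_1 is identically one
    lower_order = 0.0
    for part in set_partitions(range(len(pts))):
        if len(part) == 1:
            continue                                # the single block is xi_n itself
        lower_order += prod(
            xi([pts[i] for i in block], normalised_density) for block in part
        )
    return normalised_density(pts) - lower_order
```

For a Poisson process the normalised product densities are identically one, so `xi` returns zero for two or more points; for $n = 2$ the sketch returns the pair correlation function minus one.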

2.2 Palm measures and conditional intensities

Let $Y$ be a simple marked point process whose intensity function $\lambda$ exists. The summary statistics in this paper are defined in terms of reduced Palm measures $P^{!(x, m)}$ satisfying the reduced Campbell-Mecke formula which states that, for any measurable function $g \geq 0$,

(4)

(with the left hand side being infinite if and only if the right hand side is infinite). The probability measure corresponding to $P^{!(x, m)}$ can be interpreted as the conditional probability of $Y \setminus \{(x, m)\}$ given that $(x, m) \in Y$. For further details see [7].

A few remarks are in order. First, consider the special case that $Y$ is stationary and the reference measure on $\mathcal{M}$ is the mark distribution $M$. In this case, it is possible to define reduced Palm measures with respect to arbitrary mark sets. Specifically, for $C \in \mathcal{B}(\mathcal{M})$ such that $\nu(C) > 0$, set

(5)

Then, $P_C^{!x}$ does not depend on the choice of $x$ and is a probability measure [4, Section 4.4.8]. It can be interpreted as the conditional distribution of $Y$ on the complement of $\{x\} \times \mathcal{M}$, given that $Y$ places a point at $x$ with mark in $C$.

As a second example, consider multivariate point processes $Y = (Y_1, \dots, Y_k)$ and let $\nu$ be any finite measure on $\mathcal{M} = \{1, \dots, k\}$. Now, we have a family of reduced Palm measures $P^{!(x, i)}$ for $i = 1, \dots, k$ and we will restrict ourselves to sets of the form $C = \{i\}$. Then (5) reads

and does not depend on the specific choice of $\nu$.

For non-finite mark spaces, the reference measure $\nu$ on $\mathcal{M}$ may not correspond to a well-defined mark distribution. One pragmatic approach is to take a finite partition of the mark space, $\mathcal{M} = \cup_{i=1}^k C_i$, and proceed as in the multivariate case. An alternative is to use (5) as definition for a $\nu$-averaged reduced Palm distribution with respect to $C$, bearing in mind that the definition does depend on the choice of $\nu$.

2.3 Generating functionals

When product densities of all orders exist, the generating functional $G$, which uniquely determines the distribution of $Y$ (see e.g. [7, Thm 9.4.V.]), is defined as follows. For all mappings $v = 1 - u$ such that $u : \mathbb{R}^d \times \mathcal{M} \to [0, 1]$ is measurable with bounded support, set

(6)

By convention, $\log 0 = -\infty$ and an empty product equals $1$. The last equality holds provided that the right hand side converges (see e.g. [4, p. 126]). Similarly, for $C \in \mathcal{B}(\mathcal{M})$ with $\nu(C) > 0$ and $x \in \mathbb{R}^d$, we may define the generating functional $G_C^{!x}$ with respect to $P_C^{!x}$ (see the discussion around (5)) by

(7)
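As a worked special case of the generating functional in (6), consider a Poisson marked point process with intensity function $\lambda$. The notation $G(1 - u)$ and the series form below are assumptions of this sketch, chosen to agree with the standard expansion in terms of product densities.

```latex
% Poisson case: all product densities factorise,
% rho^{(n)} = lambda(x_1,m_1) ... lambda(x_n,m_n),
% so the series expansion of the generating functional sums to an exponential.
\[
G(1-u)
 = 1 + \sum_{n=1}^{\infty} \frac{(-1)^n}{n!}
   \Big( \int_{\mathbb{R}^d}\int_{\mathcal{M}} u(x,m)\,\lambda(x,m)\,\nu(dm)\,dx \Big)^{n}
 = \exp\Big( -\int_{\mathbb{R}^d}\int_{\mathcal{M}} u(x,m)\,\lambda(x,m)\,\nu(dm)\,dx \Big).
\]
```

Since $u$ has bounded support and takes values in $[0, 1]$, the integral is finite whenever the intensity measure is locally finite, so the series converges for every admissible $u$ in this case.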

3 Definition of summary statistics

3.1 Inhomogeneous cross $D$-function

In this section, we define cross $D$-functions for marked point processes in analogy with the inhomogeneous nearest neighbour distance distribution function of [16]. Write

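Throughout we assume that $Y$ is a simple marked point process whose product densities of all orders exist and for which the $\xi_n$, $n \geq 2$, are translation invariant in the sense that

for all $a \in \mathbb{R}^d$ and $\ell \times \nu$-almost all $(x_i, m_i) \in \mathbb{R}^d \times \mathcal{M}$, $i = 1, \dots, n$. If, moreover, $\bar\lambda = \inf_{(x, m)} \lambda(x, m) > 0$, then $Y$ is said to be intensity-reweighted moment stationary (IRMS).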

Definition 1.

Let $Y$ be IRMS and let $C$ and $E$ be Borel sets in $\mathcal{M}$ with $\nu(C)$ and $\nu(E)$ strictly positive. Write $B(a, r)$ for the closed ball centred at $a \in \mathbb{R}^d$ with radius $r \geq 0$. Set

and define, for $r \geq 0$, the inhomogeneous cross nearest neighbour distance distribution function by

We shall show in Theorem 1 below that the specific choice $a = 0$ in Definition 1 is merely a matter of convenience. Moreover, $\bar\lambda_E$ may be replaced by smaller strictly positive scalars.

When $Y$ is stationary and $\nu = M$, the mark distribution,

so that the function of Definition 1 reduces to the $C$-to-$E$ nearest neighbour distance distribution for marked point processes [15].

3.1.1 Multivariate point process

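Consider a multivariate point process $Y = (Y_1, \dots, Y_k)$ that is intensity-reweighted moment stationary. Let $C = \{i\}$ and $E = \{j\}$ for $i \neq j$. Write $\bar\lambda_j = \inf_x \lambda_j(x)$ and note that $\bar\lambda_E$ is equal to $\bar\lambda_j / \nu(\{j\})$. Therefore the function of Definition 1 reduces to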

(9)

which under the further assumption that $Y$ is stationary is equal to

the classical cross nearest neighbour distance distribution function, see e.g. [10, Chapter 21]. If $Y$ is a Poisson process,

Smaller values of $D_{\rm inhom}^{ij}(r)$ suggest there are fewer points of type $j$ in the $r$-neighbourhood of a point of type $i$, that is, inhibition; larger values indicate that points of type $j$ are attracted by those of type $i$ at range $r$. In the case $i = j$, we obtain the inhomogeneous $D$-function of $Y_i$.
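To judge whether an estimated value is small or large, a Poisson reference curve is useful. The sketch below assumes that the Poisson benchmark has the form $1 - \exp(-\bar\lambda_j\, \ell(B(0, r)))$, as for the inhomogeneous nearest neighbour distance distribution function in the unmarked case [16], with $\bar\lambda_j$ the infimum of the intensity function of the $j$-th component. The function names and the parameter `lambda_bar_j` are hypothetical.

```python
from math import exp, gamma, pi


def ball_volume(r, d):
    """Lebesgue measure of a closed ball of radius r in R^d."""
    return pi ** (d / 2) / gamma(d / 2 + 1) * r ** d


def poisson_benchmark(r, lambda_bar_j, d=2):
    """Assumed Poisson reference value 1 - exp(-lambda_bar_j * |B(0, r)|).

    lambda_bar_j: assumed infimum of the intensity function of component j,
    taken with respect to Lebesgue measure.
    """
    return 1.0 - exp(-lambda_bar_j * ball_volume(r, d))
```

Estimates that fall below `poisson_benchmark(r, lambda_bar_j)` would then point towards inhibition at range `r`, and estimates above it towards attraction.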

With $C = \{i\}$ for some $i \in \{1, \dots, k\}$ and $E = \mathcal{M}$, the function of Definition 1 is equal to

(10)

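for $r \geq 0$. Note that the function $D_{\rm inhom}^{i\bullet}$ may depend on $\nu$ through $\bar\lambda_{\mathcal{M}}$. If we give equal weight to each member of $\mathcal{M}$, however, $D_{\rm inhom}^{i\bullet}$ is uniquely defined in terms of the intensity functions $\lambda_j$ of the components of $Y$ and the minimal marginal intensity $\min_j \bar\lambda_j$. If $Y$ is stationary, $D_{\rm inhom}^{i\bullet}$ is the classic $i$-to-any nearest neighbour distance distribution.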

3.2 Inhomogeneous cross $J$-functions

In this section, we define cross $J$-functions for marked point processes in analogy with the inhomogeneous $J$-function of [16]. Throughout we assume that $Y$ is a simple intensity-reweighted moment stationary point process.

Definition 2.

Let $Y$ be IRMS and let $C$ and $E$ be Borel sets in $\mathcal{M}$ with $\nu(C)$ and $\nu(E)$ strictly positive. For $r \geq 0$ and $n \geq 1$, set

and define the inhomogeneous cross $J$-function by

(11)

for all ranges $r \geq 0$ for which the series is absolutely convergent.

Note that there is an implicit dependence on in and consequently in . However, the IRMS assumption implies that all (and therefore ) are -almost everywhere constant. Furthermore, Cauchy’s root test implies that whenever (11) is absolutely convergent.

When $Y$ is stationary and $\nu = M$, the mark distribution, (11) reduces to the cross inhomogeneous $J$-function for marked point processes introduced in [15] since in that case $\bar\lambda_E = \lambda$ regardless of the choice of $E$. Finally, note that for a Poisson process, $J_n^{CE} \equiv 0$ for $n \geq 1$, so $J_{\rm inhom}^{CE} \equiv 1$. In general, the inhomogeneous $J$-function is not commutative with respect to the mark sets $C$ and $E$, that is, $J_{\rm inhom}^{CE} \neq J_{\rm inhom}^{EC}$.

Looking closer at Definition 2, we see that there is some resemblance between $J_{\rm inhom}^{CE}$ and the cross inhomogeneous $K$-function defined in [21, Def. 4.8]. Indeed, truncation of the series in (11) at $n = 1$ gives

where

(12)

is the generalisation of the cross inhomogeneous $K$-function to our set-up. Note that the inhomogeneous $K$-function defined by (12) requires translation invariance of the two-point correlation function $\xi_2$ only, in which case $Y$ is said to be second-order intensity reweighted stationary (SOIRS). Heuristically, $J_{\rm inhom}^{CE}(r) < 1$ suggests that points with marks in $E$ tend to cluster around points with marks in $C$ at range $r$; $J_{\rm inhom}^{CE}(r) > 1$ indicates that points with marks in $E$ avoid those with marks in $C$ at range $r$. This interpretation is confirmed by Theorem 1 below.
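To illustrate how a cross $K$-function of this kind can be estimated from data, the following Python sketch implements a minus sampling estimator for the bivariate case on a rectangular window. It is not the estimator derived by the authors in Section 5; it is the familiar intensity-reweighted form, and it assumes that the component intensity functions `lam_i` and `lam_j` (with respect to Lebesgue measure) are known. All names are hypothetical.

```python
from math import hypot


def cross_k_minus_sampling(pts_i, pts_j, lam_i, lam_j, r, window):
    """Minus sampling estimate of an inhomogeneous cross K-function at range r.

    pts_i, pts_j: lists of (x, y) locations of the two components.
    lam_i, lam_j: callables (x, y) -> intensity, assumed known and positive.
    window: (x0, x1, y0, y1), the rectangular observation window.
    """
    x0, x1, y0, y1 = window
    # Erode the window by r so that every ball B(x, r) is fully observed.
    ex0, ex1, ey0, ey1 = x0 + r, x1 - r, y0 + r, y1 - r
    if ex0 >= ex1 or ey0 >= ey1:
        raise ValueError("r is too large: the eroded window is empty")
    eroded_area = (ex1 - ex0) * (ey1 - ey0)

    total = 0.0
    for xa, ya in pts_i:
        if not (ex0 <= xa <= ex1 and ey0 <= ya <= ey1):
            continue  # type-i points near the border are discarded
        for xb, yb in pts_j:
            if hypot(xa - xb, ya - yb) <= r:
                total += 1.0 / (lam_i(xa, ya) * lam_j(xb, yb))
    return total / eroded_area
```

Discarding the type-$i$ points within distance $r$ of the border avoids edge correction weights at the price of using fewer points; under independence of the two components the expected value is the area of a disc of radius $r$.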

Definition 2 is hard to work with. A more natural representation can be given in terms of the generating functional. In order to do so, define the inhomogeneous empty space function of $Y_E$, the marked point process restricted to $\mathbb{R}^d \times E$, by

(13)

under the convention that empty products equal 1 and with notation as in Definition 1. As for $D_{\rm inhom}^{CE}$, the definition does not depend on the choice of origin and $\bar\lambda_E$ may be replaced by smaller strictly positive scalars. At this point, it is important to stress that for $E = \mathcal{M}$, $F_{\rm inhom}^{\mathcal{M}}$ is not necessarily equal to $F_{\rm inhom}^{g}$, the empty space function of the ground process $Y_g$, since $F_{\rm inhom}^{E}$ depends on the marks both through the intensity function $\lambda$ and the bound $\bar\lambda_E$.

Theorem 1.

Let be as in Definition 2. Then, as a function of , each is -almost everywhere constant. Moreover, if

is strictly less than , then, for almost all , the -to- inhomogeneous -function of Definition 2 satisfies

for all for which .

The proof is technical and relegated to Appendix A.

3.2.1 Multivariate point process

Consider a multivariate point process $Y = (Y_1, \dots, Y_k)$ that is intensity-reweighted moment stationary. By a suitable choice of mark sets, we obtain different types of inhomogeneous $J$-functions.

First, take $C = \{i\}$ and $E = \{j\}$ for $i \neq j$. Then, writing $F_{\rm inhom}^{j}$ for the inhomogeneous empty space function of $Y_j$ and recalling (9), the statistic (11) is equal to

(14)

and compares the distribution of intensity-reweighted distances from a point of type $i$ to the nearest one of type $j$ to those from an arbitrary point to $Y_j$. Therefore, it generalises the $i$-to-$j$ cross $J$-function of [18] for stationary multivariate point processes.

Set $C = \{i\}$ for some $i \in \{1, \dots, k\}$ and $E = \mathcal{M}$. Then, recalling (10), the statistic (11) can be written as

(15)

and compares tails of the $i$-to-any nearest neighbour distance distribution and the empty space function of $Y$. Note that if $\nu$ is proportional to the counting measure, $J_{\rm inhom}^{i\bullet}$ can be expressed in terms of the intensity functions of the components and the minimal marginal intensity (see the discussion following formula (10)). Hence, $J_{\rm inhom}^{i\bullet}$ generalises the $i$-to-any $J$-function for stationary multivariate point processes [18].

4 Independence and random labelling

In this section, we investigate the effect of various independence assumptions and marking schemes on our summary statistics.

4.1 Independent marking mechanisms

Specific forms of marking are summarised in Definition 3 below [6, Definition 6.4III].

Definition 3.

A marked point process $Y$ is called independently marked if, given the ground process $Y_g$, the marks are independent random variables with a distribution $M^x$ that depends only on the corresponding location $x$. If, additionally, $M^x$ does not depend on the location, we say that $Y$ has the random labelling property.
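The two marking schemes of Definition 3 can be mimicked in a small simulation sketch. The helper names `independent_marking` and `random_labelling`, and the callable `mark_sampler`, are hypothetical; the sketch only illustrates that random labelling is the special case in which the mark distribution ignores the location.

```python
import random


def independent_marking(points, mark_sampler, rng):
    """Attach to each location a mark drawn independently of all other marks.

    mark_sampler(x, rng) draws one mark; its distribution may depend on the
    location x, but on nothing else.
    """
    return [(x, mark_sampler(x, rng)) for x in points]


def random_labelling(points, marks, weights, rng):
    """Special case: the same mark distribution is used at every location."""
    def sampler(_x, g):
        return g.choices(marks, weights=weights)[0]
    return independent_marking(points, sampler, rng)
```

For example, `random_labelling(points, ["a", "b"], [0.3, 0.7], random.Random(1))` labels each location "a" with probability 0.3, whatever its position, whereas a `mark_sampler` that inspects `x` yields independent marking without random labelling.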

Proposition 1.

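Let $C$ and $E$ be Borel sets in $\mathcal{M}$ with $\nu(C), \nu(E) > 0$ and assume that $Y$ is independently marked.

  • If $Y$ is SOIRS, the ground process $Y_g$ is also SOIRS and $K_{\rm inhom}^{CE} = K_{\rm inhom}^{g}$, the inhomogeneous $K$-function of $Y_g$.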

Let denote the expectation under the Palm distribution of the ground process and write , , . Under the assumptions of Theorem 1, when is independently marked, is IRMS and

  • ,

  • ,

  • for all for which the denominator is non-zero.

If is randomly labelled with , then and

Proof.

Recall that

Under the independent marking assumption,

(16)

Therefore,

the -point correlation function of the ground process, so that is (second order) intensity-reweighted moment stationary whenever is. Plugging (16) into (12) yields , the inhomogeneous -function of . Furthermore, under the assumption that the series expansion is absolutely convergent, by (6), (13) reduces to

Similarly,

We conclude that and .

Under random labelling, the right hand side of (16) is further simplified to for some probability density that does not depend on location. If furthermore , the mark distribution, the density is one, i.e. . Hence and . In particular . ∎

Note that the summary statistics do not depend on the choice of , but may depend on through .

4.2 Independence

Recall that we use the notation $Y_C$, $C \in \mathcal{B}(\mathcal{M})$, for the restriction of $Y$ to $\mathbb{R}^d \times C$. If $Y_C$ and $Y_E$ are independent, then the $C$-to-$E$ cross $J$-function is identically $1$. More precisely, the following result holds.

Proposition 2.

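Consider two disjoint Borel sets $C, E \subseteq \mathcal{M}$ with $\nu(C)$ and $\nu(E)$ strictly positive and assume that $Y_C$ and $Y_E$ are independent. Under the assumptions of Theorem 1,

so that $J_{\rm inhom}^{CE}(r) = 1$ whenever well defined.

If $Y$ is SOIRS, $K_{\rm inhom}^{CE}(r) = \ell(B(0, r))$ whenever $Y_C$ and $Y_E$ are independent by [21, Proposition 4.4].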

Proof.

By the Campbell formula (2), if and are independent, the product densities factorise with respect to and , i.e.

for almost all and . Then, by the proof of Theorem 1,

The integrand factorises as

so that We conclude that and . ∎

Proposition 2 generalises well-known results for stationary multivariate point processes [8, 18]. The next result collects mixture formulae.

Proposition 3.

Let be a Borel set with . Set and assume that and are independent.

  • If is SOIRS, , where is the inhomogeneous -function of .

Write for . Under the assumptions of Theorem 1,

  • ,

  • ,

  • for all for which the denominator is non-zero.

Note here that if we would have used the global infimum in (1), (13) and Definition 2, the constants and would vanish and, e.g., whenever defined.

Proof.

As in the proof of Proposition 2, if and , so that