Summary statistics for inhomogeneous marked point processes
O. Cronie and M.N.M. van Lieshout
CWI, P.O. Box 94079, NL-1090 GB Amsterdam, The Netherlands
Abstract: We propose new summary statistics for intensity-reweighted moment stationary marked point processes with particular emphasis on discrete marks. The new statistics are based on the $n$-point correlation functions and reduce to cross $D$- and $J$-functions when stationarity holds. We explore the relationships between the various functions and discuss their explicit forms under specific model assumptions. We derive ratio-unbiased minus sampling estimators for our statistics and illustrate their use on a data set of wildfires.
Key words: Generating functional, Intensity-reweighted moment stationarity, $J$-function, Marked point process, Multivariate point process, Nearest neighbour distance distribution function, $n$-point correlation function, Reduced Palm measure.
Mathematics Subject Classification: 60G55, 60D05.
1 Introduction
The analysis of a marked point pattern typically begins with computing some summary statistics which may be used to find specific structures in the data and suggest suitable models [4, 6, 7, 10, 13, 14]. The choice of summary characteristic depends both on the pattern at hand and on the feature or hypothesis of interest. Indeed, under the working assumption of stationarity, for discrete marks, cross versions of the $K$-function or the nearest neighbour distance distribution function may be appropriate [8]; for real-valued marks, the mark correlation functions of [22] are widely used. Various types of $J$-functions [15, 18] offer useful alternatives.
Often, however, the assumption of homogeneity cannot be justified. In the unmarked case, [1] proposed an inhomogeneous extension of the $K$-function for so-called second order intensity-reweighted stationary point processes. Their ideas were extended to spatio-temporal point processes in [9, 20], whereas [5, 16] extended the $J$-function under the somewhat stronger assumption of intensity-reweighted moment stationarity in space and time.
For non-stationary multivariate point processes, [21] proposed an extension of the $K$-function under the assumption of second order intensity-reweighted stationarity. As we will indicate in this paper, this structure may be extended to $K$-functions for general marked point processes.
Regarding $J$-functions, in [16] the author noted that the ideas in that paper could be combined with those in [15] to define inhomogeneous $J$-functions with respect to mark sets. In this paper we do so, and, as a by-product, obtain a generalisation of the cross nearest neighbour distance distribution function.
The paper is structured as follows. In Section 2, we define marked point processes with locations in Euclidean spaces and give the necessary preliminaries. In Sections 3.1 and 3.2, we define, respectively, cross $D$- and $J$-functions for inhomogeneous multivariate point processes and propose generalisations to point processes with real-valued marks. We show that both statistics can be expressed in terms of the generating functional and discuss the relationships between these statistics and the cross $K$-function. In Section 4, we investigate the form of our statistics under various independence and marking assumptions. We derive minus sampling estimators in Section 5, which are applied to a data set on wildfires in New Brunswick, Canada, in Section 6. We finish the paper with a summary.
2 Definitions and notations
Throughout this paper, we consider marked point processes [6, Definition 6.4.1] with points in $\mathbb{R}^d$, equipped with the Euclidean metric and its Borel $\sigma$-algebra. We write $\ell$ for the Lebesgue measure on $\mathbb{R}^d$. By definition, the ground process obtained from a marked point process by ignoring the marks is a well-defined point process on $\mathbb{R}^d$ in its own right. We shall assume that the ground process is simple, that is, almost surely does not contain multiple points.
We assume that the mark space is Polish and equipped with a finite reference measure on its Borel $\sigma$-algebra. The product of $\mathbb{R}^d$ and the mark space is equipped with its Borel $\sigma$-algebra. In the special case that the mark space is finite, the marked point process can be seen as a multivariate point process whose $i$-th component contains the points marked $i$.
2.1 Product densities
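Recall that the intensity measure of a marked point process assigns to each product set $B \times C$, with $B$ a Borel set of locations and $C$ a Borel set of marks, the expected number of points in $B$ with marks in $C$. If the intensity measure is locally finite as a set function, it can be extended to a measure on the product $\sigma$-algebra (see e.g. [12, Theorem A, p. 54]). In this paper, additionally, we assume that it admits a density with respect to the product of Lebesgue measure and the reference measure on the marks, which is referred to as the intensity function. In particular, for a finite mark space, this determines the intensity functions of the component processes.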
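As a numerical illustration of the intensity measure for a finite mark space, the following minimal Python sketch approximates the expected number of points in a product set by integrating an intensity function over it. The intensity function, the unit-square window, the function names and the choice of the counting measure as reference measure on the marks are all assumptions made for illustration; none of them is taken from the paper.

```python
def intensity(x, y, mark):
    """Hypothetical intensity function on [0, 1]^2 x {1, 2} (illustrative only)."""
    return 10.0 * x if mark == 1 else 4.0


def expected_count(box, marks, n=200):
    """Midpoint-rule approximation of the intensity measure of box x marks.

    box = (x0, x1, y0, y1) is a rectangle of locations and marks an iterable
    of mark labels. The reference measure on the marks is assumed to be the
    counting measure, so the mark integral is a plain sum.
    """
    x0, x1, y0, y1 = box
    hx, hy = (x1 - x0) / n, (y1 - y0) / n
    total = 0.0
    for m in marks:
        for i in range(n):
            for j in range(n):
                xm = x0 + (i + 0.5) * hx  # midpoint of grid cell (i, j)
                ym = y0 + (j + 0.5) * hy
                total += intensity(xm, ym, m) * hx * hy
    return total
```

For instance, `expected_count((0.0, 1.0, 0.0, 1.0), [1])` approximates the expected number of type 1 points in the unit square under the assumed intensity.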
Since for fixed the measure is absolutely continuous with respect to the intensity measure of the ground process,
(1) 
Here is the probability that the mark of a point at location falls in . The members of the family of probability distributions on the Borel sets of are called mark distributions.
If is stationary, that is, if its distribution is invariant under translations of the locations, for some probability distribution on , which is known as the mark distribution. In this case, we may take for the reference measure on so that has constant intensity function with respect to , and, moreover, is the intensity of the ground process.
Higher order ‘intensity functions’ or product densities can be defined as densities of the factorial moment measures provided these exist, in which case they satisfy the following $n$-th order Campbell formula. For any measurable function of $n$ marked points, the sum of its values over $n$-tuples of distinct points of the process is a random variable with expectation
(2)  
(with the left hand side being infinite if and only if the right hand side is infinite). Note that the first-order product density is the intensity function. Also, $n$-point mark distributions can be defined analogously to the case $n = 1$. For further details, see for example the textbook [4]. Note that, by the absolute continuity underlying the existence of the product densities, there exist product densities for the ground process and densities of the $n$-point mark distributions with respect to the $n$-fold product of the reference measure with itself such that
In particular, the intensity function of the ground process is given by and .
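We will also need the related concept of $n$-point correlation functions $\xi_n$, $n \geq 1$, the intensity-reweighted densities of the factorial cumulant measures [7, Section 9.5]. These permutation invariant measurable functions are defined by the following recursive relation (see e.g. [15, 26]). Set $\xi_1 \equiv 1$ and, for $n \geq 2$,
(3) $\sum_{k=1}^{n} \sum_{E_1, \ldots, E_k} \prod_{j=1}^{k} \xi_{|E_j|}(\{(x_i, m_i) : i \in E_j\}) = \frac{\rho^{(n)}((x_1, m_1), \ldots, (x_n, m_n))}{\lambda(x_1, m_1) \cdots \lambda(x_n, m_n)}$,
where $\rho^{(n)}$ denotes the $n$-th order product density, $\lambda$ the intensity function, $\sum_{E_1, \ldots, E_k}$ is a sum over all possible $k$-sized partitions $\{E_1, \ldots, E_k\}$, $E_j \neq \emptyset$, of the set $\{1, \ldots, n\}$ and $|E_j|$ denotes the cardinality of $E_j$. Note that for a Poisson process, $\xi_n \equiv 0$ for all $n \geq 2$.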
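The partition recursion can be made concrete with a short Python sketch. It assumes the standard form of the recursion: the one-point correlation function is identically 1, and the sum over all partitions of the products of correlation functions over the blocks equals the product density divided by the product of the intensities. The callable `g`, which returns that normalised product density for a tuple of points, and both function names are hypothetical.

```python
def set_partitions(items):
    """Yield all partitions of a list into non-empty blocks."""
    items = list(items)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        # Either `first` forms a block of its own ...
        yield [[first]] + part
        # ... or it joins one of the existing blocks.
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


def xi(points, g):
    """n-point correlation function at `points`, solved from the recursion.

    g(points) is assumed to return the product density at `points` divided
    by the product of the intensities at those points.
    """
    points = tuple(points)
    if len(points) == 1:
        return 1.0
    total = g(points)
    for part in set_partitions(points):
        if len(part) == 1:
            continue  # the single-block term is the unknown being solved for
        prod = 1.0
        for block in part:
            prod *= xi(block, g)
        total -= prod
    return total
```

With `g` identically 1, as for a Poisson process, `xi` vanishes for two or more points; for two points it returns the pair correlation minus 1. The recursion enumerates all set partitions, so this sketch is only practical for small numbers of points.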
2.2 Palm measures and conditional intensities
Consider a simple marked point process whose intensity function exists. The summary statistics in this paper are defined in terms of reduced Palm measures satisfying the reduced Campbell-Mecke formula, which states that, for any non-negative measurable function,
(4) 
(with the left hand side being infinite if and only if the right hand side is infinite). The reduced Palm measure at a marked point $(x, m)$ can be interpreted as the conditional distribution of the rest of the process given that the process has a point at location $x$ with mark $m$. For further details see [7].
A few remarks are in order. First, consider the special case that is stationary and the reference measure on is the mark distribution . In this case, it is possible to define reduced Palm measures with respect to arbitrary mark sets. Specifically, for such that , set
(5) 
Then, does not depend on the choice of and is a probability measure [4, Section 4.4.8]. It can be interpreted as the conditional distribution of on the complement of , given that places a point at with mark in .
As a second example, consider multivariate point processes and let be any finite measure on . Now, we have a family of reduced Palm measures for and we will restrict ourselves to sets of the form . Then (5) reads
and does not depend on the specific choice of .
For non-finite mark spaces, the reference measure on the mark space may not correspond to a well-defined mark distribution. One pragmatic approach is to take a finite partition of the mark space and proceed as in the multivariate case. An alternative is to use (5) as the definition of a reduced Palm distribution averaged over the mark set with respect to the reference measure, bearing in mind that the definition does depend on the choice of reference measure.
2.3 Generating functionals
When product densities of all orders exist, the generating functional , which uniquely determines the distribution of (see e.g. [7, Thm 9.4.V.]), is defined as follows. For all mappings such that is measurable with bounded support, set
(6)  
By convention, an empty product equals $1$. The last equalities hold provided that the right hand sides converge (see e.g. [4, p. 126]). Similarly, for and , we may define the generating functional with respect to (see the discussion around (5)) by
(7) 
3 Definition of summary statistics
3.1 Inhomogeneous cross $D$-function
In this section, we define cross $D$-functions for marked point processes in analogy with the inhomogeneous nearest neighbour distance distribution function of [16]. Write
Throughout we assume that is a simple marked point process whose product densities of all orders exist and for which the , , are translation invariant in the sense that
for all and almost all . If, moreover, the infimum of the intensity function is strictly positive, then the process is said to be intensity-reweighted moment stationary (IRMS).
Definition 1.
Let the marked point process be IRMS and let $C$ and $E$ be Borel sets in the mark space whose reference measures are strictly positive. Write $B(x, r)$ for the closed ball centred at $x \in \mathbb{R}^d$ with radius $r \geq 0$. Set
and define, for $r \geq 0$, the inhomogeneous cross nearest neighbour distance distribution function by
We shall show in Theorem 1 below that the specific choice of origin in Definition 1 is merely a matter of convenience. Moreover, the lower bound on the intensity may be replaced by smaller strictly positive scalars.
When is stationary and , the mark distribution,
so that Definition 1 reduces to the nearest neighbour distance distribution from one mark set to another for marked point processes [15].
3.1.1 Multivariate point process
Consider a multivariate point process that is intensity-reweighted moment stationary. Let and for . Write and note that is equal to . Therefore Definition 1 reduces to
(9) 
which under the further assumption that is stationary is equal to
the classical cross nearest neighbour distance distribution function, see e.g. [10, Chapter 21]. If is a Poisson process,
Smaller values of the cross $D$-function suggest there are fewer points of type $j$ in the neighbourhood of a point of type $i$, that is, inhibition; larger values indicate that points of type $j$ are attracted by those of type $i$ at range $r$. In the case $i = j$, we obtain the inhomogeneous $D$-function of the $i$-th component.
With for some and , Definition 1 is equal to
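For the stationary special case, the classical cross nearest neighbour distance distribution function can be estimated by minus sampling, which only uses those type $i$ points whose distance to the window boundary is at least $r$. The sketch below is a minimal illustration of that idea under assumptions of our own (a rectangular window, points given as coordinate pairs, and the function name `cross_nn_minus`). It is not the intensity-reweighted estimator derived in Section 5.

```python
import math


def cross_nn_minus(points_i, points_j, r, window=(0.0, 1.0, 0.0, 1.0)):
    """Minus-sampling estimate of the classical i-to-j nearest neighbour
    distance distribution at range r (stationary case, rectangular window).

    Only type i points at distance at least r from the window boundary are
    used, so that their balls of radius r are fully observed.
    """
    x0, x1, y0, y1 = window
    eligible = [
        (x, y) for (x, y) in points_i
        if min(x - x0, x1 - x, y - y0, y1 - y) >= r
    ]
    if not eligible:
        return float("nan")  # no type i point is far enough from the border
    hits = sum(
        1 for (x, y) in eligible
        if any(math.hypot(x - u, y - v) <= r for (u, v) in points_j)
    )
    return hits / len(eligible)
```

Comparing such an estimate with the value expected under a Poisson model gives the informal diagnostic described above: smaller values point towards inhibition between the two types, larger values towards attraction.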
(10) 
for . Note that the function may depend on the reference measure through the lower bound on the intensity. If we give equal weight to each mark, however, it is uniquely defined in terms of the intensity functions of the components and the minimal marginal intensity. If the process is stationary, it is the classic $i$-to-any nearest neighbour distance distribution.
3.2 Inhomogeneous cross $J$-functions
In this section, we define cross $J$-functions for marked point processes in analogy with the inhomogeneous $J$-function of [16]. Throughout we assume that the marked point process is simple and intensity-reweighted moment stationary.
Definition 2.
Let the marked point process be IRMS and let $C$ and $E$ be Borel sets in the mark space whose reference measures are strictly positive. For $r \geq 0$ and $n \geq 1$, set
and define the inhomogeneous cross $J$-function by
(11) 
for all ranges for which the series is absolutely convergent.
Note that there is an implicit dependence on in and consequently in . However, the IRMS assumption implies that all (and therefore ) are almost everywhere constant. Furthermore, Cauchy’s root test implies that whenever (11) is absolutely convergent.
When is stationary and , the mark distribution, (11) reduces to the cross $J$-function for marked point processes introduced in [15] since in that case regardless of the choice of . Finally, note that for a Poisson process, the $n$-point correlation functions vanish for $n \geq 2$, so the inhomogeneous cross $J$-function is identically $1$. In general, the inhomogeneous cross $J$-function is not commutative with respect to the two mark sets.
Looking closer at Definition 2, we see that there is some resemblance between the inhomogeneous cross $J$-function and the cross inhomogeneous $K$-function defined in [21, Def. 4.8]. Indeed, truncation of the series in (11) at $n = 1$ gives
where
(12) 
is the generalisation of the cross inhomogeneous $K$-function to our setup. Note that the inhomogeneous $K$-function defined by (12) requires translation invariance of the two-point correlation function only, in which case the process is said to be second-order intensity-reweighted stationary (SOIRS). Heuristically, values of the cross $J$-function below $1$ suggest that points with marks in the second mark set tend to cluster around points with marks in the first at range $r$; values above $1$ indicate that points with marks in the second mark set avoid those with marks in the first at range $r$. This interpretation is confirmed by Theorem 1 below.
Definition 2 is hard to work with. A more natural representation can be given in terms of the generating functional. In order to do so, define the inhomogeneous empty space function of , the marked point process restricted to , by
(13) 
under the convention that empty products equal 1 and with as in Definition 1. As for , the definition does not depend on the choice of origin and may be replaced by smaller strictly positive scalars. At this point, it is important to stress that for , is not necessarily equal to , the empty space function of the ground process , since depends on the marks both through the intensity function and the bound .
Theorem 1.
The proof is technical and relegated to Appendix A.
3.2.1 Multivariate point process
Consider a multivariate point process that is intensity-reweighted moment stationary. By a suitable choice of the mark sets, we obtain different types of inhomogeneous $J$-functions.
First, take and for . Then, writing for the inhomogeneous empty space function of and recalling (9), the statistic (11) is equal to
(14) 
and compares the distribution of intensity-reweighted distances from a point of type $i$ to the nearest one of type $j$ to those from an arbitrary point to the nearest point of type $j$. Therefore, it generalises the $i$-to-$j$ cross $J$-function of [18] for stationary multivariate point processes.
Set for some and . Then, recalling (10), the statistic (11) can be written as
(15) 
and compares tails of the $i$-to-any nearest neighbour distance distribution and the empty space function of the process. Note that if the reference measure is proportional to the counting measure, the statistic can be expressed in terms of the intensity functions of the components and the minimal marginal intensity (see the discussion following formula (10)). Hence, it generalises the $i$-to-any $J$-function for stationary multivariate point processes [18].
4 Independence and random labelling
In this section, we investigate the effect of various independence assumptions and marking schemes on our summary statistics.
4.1 Independent marking mechanisms
Definition 3.
A marked point process is called independently marked if, given the ground process, the marks are independent random variables with a distribution that depends only on the corresponding location. If, additionally, the mark distribution does not depend on the location, we say that the marked point process has the random labelling property.
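The two marking mechanisms of Definition 3 can be sketched in Python for a finite mark space. Given a fixed set of ground points, `independent_marking` draws each mark independently from a distribution that may depend on the location, whereas `random_labelling` uses one distribution for all locations. Both function names, the dictionary representation of mark distributions and the `seed` argument are assumptions made for illustration.

```python
import random


def random_labelling(ground, mark_probs, seed=0):
    """Attach i.i.d. marks to ground points; the mark distribution
    (a dict mark -> probability) does not depend on the location."""
    rng = random.Random(seed)
    marks = list(mark_probs)
    weights = [mark_probs[m] for m in marks]
    return [(pt, rng.choices(marks, weights=weights)[0]) for pt in ground]


def independent_marking(ground, mark_probs_at, seed=0):
    """Attach independent marks whose distribution may depend on the location.

    mark_probs_at(pt) is assumed to return a dict mark -> probability.
    """
    rng = random.Random(seed)
    marked = []
    for pt in ground:
        probs = mark_probs_at(pt)
        marks = list(probs)
        weights = [probs[m] for m in marks]
        marked.append((pt, rng.choices(marks, weights=weights)[0]))
    return marked
```

Random labelling is the special case of independent marking in which `mark_probs_at` returns the same dictionary at every location.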
Proposition 1.
Let and be Borel sets in with and assume that is independently marked.

If the marked point process is SOIRS, the ground process is also SOIRS and the cross $K$-function coincides with the inhomogeneous $K$-function of the ground process.
Let denote the expectation under the Palm distribution of the ground process and write , , . Under the assumptions of Theorem 1, when is independently marked, is IRMS and

,

,

for all for which the denominator is nonzero.
If is randomly labelled with , then and
Proof.
Recall that
Under the independent marking assumption,
(16) 
Therefore,
the $n$-point correlation function of the ground process, so that the ground process is (second order) intensity-reweighted moment stationary whenever the marked point process is. Plugging (16) into (12) yields the inhomogeneous $K$-function of the ground process. Furthermore, under the assumption that the series expansion is absolutely convergent, by (6), (13) reduces to
Similarly,
We conclude that and .
Under random labelling, the right hand side of (16) is further simplified to for some probability density that does not depend on location. If furthermore , the mark distribution, the density is one, i.e. . Hence and . In particular . ∎
Note that the summary statistics do not depend on the choice of , but may depend on through .
4.2 Independence
Recall that we use the notation , , for the restriction of to . If the restrictions to two mark sets are independent, then the corresponding cross $J$-function is identically $1$. More precisely, the following result holds.
Proposition 2.
Consider two disjoint Borel sets with and strictly positive and assume that and are independent. Under the assumptions of Theorem 1,
so that whenever well defined.
If is SOIRS, whenever and are independent by [21, Proposition 4.4].
Proof.
Proposition 2 generalises well-known results for stationary multivariate point processes [8, 18]. The next result collects mixture formulae.
Proposition 3.
Let be a Borel set with . Set and assume that and are independent.

If is SOIRS, , where is the inhomogeneous function of .
Write for . Under the assumptions of Theorem 1,

,

,

for all for which the denominator is nonzero.
Note here that if we had used the global infimum in Definition 1, (13) and Definition 2, the constants and would vanish and, e.g., whenever defined.
Proof.
As in the proof of Proposition 2, if and , so that