Spatial Point Processes and Their Applications in Epidemiology

Spatial Statistics for Epidemiology: Spatial Point Processes

By Liu Xu U086105E

Supervisor: Prof Loh Wei Liem

Department of Statistics and Applied Probability

National University of Singapore

15 March 2012


Outline

Application in epidemiology

Theory: descriptive statistics

Models: spatial point processes

Data: spatial point patterns


A spatial point pattern is …

Space → R^d, d ≥ 2

Points → data values

Pattern → arrangement

Intro Models Theory Application 3

Example: tropical rainforest


Example: cancer cases


Marks: extra information attached to points, categorical or continuous

Example: the Milky Way galaxy


Types of point patterns


regularity (repulsion) ← CSR → clustering (attraction)

Aim: describe and model “pattern”

Are points randomly located?

• If so, find a statistical model to describe the “randomness”;

• If not, …



• A spatial point process is a stochastic process X which generates a countable set of events in a defined space.

• A spatial point pattern x = {x1, x2, …, xn} on an observational region W generated from a spatial point process is a realization of the process.

• Only point processes in 2-D space are considered here.

• The locations of any objects can be modelled: plants, animals, cells, stars, disease cases, earthquakes, …




• Notation:

W: study region in R^2

N(A): number of events inside subregion A, A ⊆ W

|A|: area of region A

s: a location in W

ds: infinitesimal region centered at s

• Assumptions on spatial point processes:

i. Locally finite: the number of events in any bounded region is finite

ii. Simple: at any location s, there is either one event or no event at all

HPP

A spatial point process in a bounded region W in R^2 is a homogeneous Poisson process (HPP) if:

i. For every subregion A in W, N(A) ~ Poi(λ|A|), where 0 < λ < ∞ is a constant called the intensity (homogeneity).

ii. If A1 and A2 are two disjoint subregions in W, then N(A1) and N(A2) are independent (independence).

• Standard model for complete spatial randomness (CSR);

• Can be generalized to more complicated models;

• A reference process when analyzing spatial characteristics of a specific pattern.
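The two defining properties suggest a simple way to simulate an HPP on a rectangle: draw N(W) from Poi(λ|W|), then place that many events independently and uniformly in W. The following is a minimal sketch under that approach; the function name `simulate_hpp`, the rectangular window and the seed are illustrative assumptions, not part of the original slides.

```python
import numpy as np

def simulate_hpp(lam, width=1.0, height=1.0, rng=None):
    """Simulate a homogeneous Poisson process on [0, width] x [0, height]."""
    rng = np.random.default_rng() if rng is None else rng
    area = width * height
    # Property (i): N(W) ~ Poi(lambda * |W|).
    n = rng.poisson(lam * area)
    # Given N(W) = n, the events are i.i.d. uniform on W.
    x = rng.uniform(0.0, width, size=n)
    y = rng.uniform(0.0, height, size=n)
    return np.column_stack([x, y])

rng = np.random.default_rng(2012)       # arbitrary seed, for reproducibility only
points = simulate_hpp(lam=100, rng=rng)
lam_hat = len(points) / 1.0             # N(W) / |W| on the unit square
```

Here `lam_hat` is the usual estimate of a constant intensity, the observed count divided by the area of the window.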


IPP

A spatial point process in a bounded region W in R^2 is an inhomogeneous Poisson process (IPP) if:

i. For any subregion A in W, N(A) ~ Poi(∫_A λ(s) ds), where 0 < λ(s) < ∞ is the intensity at s.

ii. If A1 and A2 are two disjoint subregions in W, then N(A1) and N(A2) are independent (independence).


HPP → IPP: generalization

IPP → HPP: special case

Simulation from Poisson processes


Two Poisson process realizations on the unit square having the same expected number of events (100).
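One common way to simulate an IPP is by thinning: simulate an HPP with a constant intensity that dominates λ(s), then keep each event at s with probability λ(s) divided by that constant. The sketch below assumes the intensity λ(s) = 400xy on the unit square (the intensity quoted later in this deck, whose integral over the square is 100); the names `simulate_ipp`, `intensity` and `lam_max` are illustrative.

```python
import numpy as np

def simulate_ipp(intensity, lam_max, rng=None):
    """Simulate an IPP on the unit square by thinning an HPP of rate lam_max."""
    rng = np.random.default_rng() if rng is None else rng
    # Dominating HPP on the unit square (|W| = 1).
    n = rng.poisson(lam_max)
    candidates = rng.uniform(0.0, 1.0, size=(n, 2))
    # Keep a candidate at s with probability intensity(s) / lam_max.
    keep_prob = intensity(candidates[:, 0], candidates[:, 1]) / lam_max
    keep = rng.uniform(size=n) < keep_prob
    return candidates[keep]

def intensity(x, y):
    """Assumed intensity lambda(s) = 400xy; its maximum on the unit square is 400."""
    return 400.0 * x * y

rng = np.random.default_rng(1)          # arbitrary seed
pts = simulate_ipp(intensity, lam_max=400.0, rng=rng)
```

The bound `lam_max` must be at least as large as the intensity everywhere in the window; a looser bound still works but discards more candidates.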

Summary statistics: first-order

First-order intensity of a spatial point process is:

$$\lambda(s) = \lim_{|ds| \to 0} \frac{E[N(ds)]}{|ds|}$$

• Interpretation: expected number of events per unit area. For a small region ds, λ(s)|ds| is approximately the probability of an event in ds.

• Intensity may be constant (homogeneous) or may vary from location to location (inhomogeneous). If the process is homogeneous, estimate the intensity by

$$\hat{\lambda} = \frac{N(W)}{|W|}$$

Estimate λ(s) in the inhomogeneous case

• Estimating the intensity of a spatial point pattern is similar to estimating a bivariate probability density.

• How to estimate a bivariate density?

Given an i.i.d. sample (y1, …, yn) of a bivariate random variable Y, an estimate of the density f(·) of Y at y is

$$\hat{f}(y) = \frac{1}{n h^2} \sum_{i=1}^{n} K\!\left(\frac{y - y_i}{h}\right)$$

where K(·) is the kernel and h is the bandwidth.

• The expression for kernel smoothing of the intensity function of a point pattern x = {x1, …, xn} at location s is

$$\hat{\lambda}(s) = \frac{1}{h^2} \sum_{i=1}^{n} K\!\left(\frac{s - x_i}{h}\right)$$

The bandwidth h is chosen based on some cross-validation criterion.
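The kernel-smoothed intensity can be sketched in a few lines. The example below assumes a bivariate standard Gaussian kernel and applies no edge correction; the slides do not say which kernel was used, so `gaussian_kernel`, `kernel_intensity` and the toy pattern are illustrative choices only.

```python
import numpy as np

def gaussian_kernel(u):
    """Bivariate standard normal density evaluated along the last axis of u."""
    return np.exp(-0.5 * np.sum(u ** 2, axis=-1)) / (2.0 * np.pi)

def kernel_intensity(points, locations, h):
    """Kernel estimate h^-2 * sum_i K((s - x_i) / h), with no edge correction."""
    points = np.asarray(points, dtype=float)
    locations = np.asarray(locations, dtype=float)
    # diff[a, i] = s_a - x_i, shape (number of locations, number of events, 2)
    diff = locations[:, None, :] - points[None, :, :]
    return gaussian_kernel(diff / h).sum(axis=1) / h ** 2

# Toy pattern (made-up coordinates) evaluated on a coarse grid over the unit square.
pattern = np.array([[0.2, 0.3], [0.8, 0.7], [0.75, 0.8]])
gx, gy = np.meshgrid(np.linspace(0, 1, 5), np.linspace(0, 1, 5))
grid = np.column_stack([gx.ravel(), gy.ravel()])
lam_grid = kernel_intensity(pattern, grid, h=0.1)
```

Because the kernel integrates to one, the estimate integrates to roughly the number of events over the plane; near the boundary of W some of that mass falls outside the window, which is why edge-corrected versions exist.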

Kernel smoothed intensity of IPP


Kernel-estimated intensity for a point pattern simulated from an IPP with λ(s) = 400xy on [0, 1] × [0, 1].

Summary statistics: second-order

The second-order properties of a point process involve the relationship between the numbers of events at different locations.

• The second-order intensity of a spatial point process is

$$\lambda_2(s_i, s_j) = \lim_{|ds_i|, |ds_j| \to 0} \frac{E[N(ds_i)\, N(ds_j)]}{|ds_i|\, |ds_j|}$$

• A point process is called stationary if

$$\lambda(s) = \lambda, \qquad \lambda_2(s_i, s_j) = \lambda_2(s_i - s_j)$$

• A stationary point process is isotropic if

$$\lambda_2(s_i, s_j) = \lambda_2(\lVert s_i - s_j \rVert)$$

If a point process is stationary and isotropic, the K-function of the process is defined by:

λK(r) = E[number of further events within distance r of an arbitrary event]

Two properties of the K-function:

• For an HPP, λK(r) = λπr², thus in the Poisson case K(r) = πr²

• K(r) is invariant to random thinning.

K-function

Def. random thinning: each event of a point process X is either retained or deleted with retention probability p, independently of other events. The resulting point process X′ contains a subset of the events of the original process X.
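Random thinning as defined above amounts to one independent coin flip per event. A minimal sketch follows; the function name `random_thinning` and the example coordinates are assumptions for illustration.

```python
import numpy as np

def random_thinning(points, p, rng=None):
    """Retain each event independently with probability p."""
    rng = np.random.default_rng() if rng is None else rng
    points = np.asarray(points, dtype=float)
    # One uniform draw per event; the event is kept when the draw falls below p.
    keep = rng.uniform(size=len(points)) < p
    return points[keep]

events = np.array([[0.1, 0.2], [0.4, 0.9], [0.7, 0.3], [0.6, 0.6]])
thinned = random_thinning(events, p=0.5, rng=np.random.default_rng(0))
```

For an HPP with intensity λ, the thinned process is again an HPP, with intensity pλ, so its K-function is still πr²; this is the simplest instance of the invariance property.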

Comparing estimated K-functions of simulated point patterns


CSR: K(r) = πr²

clustered: K(r) > πr²

regular: K(r) < πr²

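Estimation of K(r): E(# further events…)/λ

The naive estimate is negatively biased, because events near the boundary of W have unobserved neighbours outside W → edge correction is needed.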
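As a rough illustration of the estimation idea, the sketch below computes the uncorrected estimate: the average number of further events within distance r of an event, divided by the estimated intensity n/|W|. The function name `k_naive` is hypothetical, and no edge correction is applied, so the result is biased downwards at larger r.

```python
import numpy as np

def k_naive(points, r_values, area):
    """Uncorrected K estimate: mean count of further events within r, over n/|W|."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Drop the diagonal so that an event is not counted as its own neighbour.
    off_diag = dist[~np.eye(n, dtype=bool)]
    pair_counts = np.array([(off_diag <= r).sum() for r in r_values])
    mean_further = pair_counts / n          # average number of further events
    lam_hat = n / area                      # estimated intensity
    return mean_further / lam_hat

# Compare with pi * r^2 for a uniform pattern on the unit square.
rng = np.random.default_rng(7)              # arbitrary seed
pattern = rng.uniform(size=(200, 2))
r_values = np.array([0.02, 0.05, 0.1])
k_hat = k_naive(pattern, r_values, area=1.0)
csr_reference = np.pi * r_values ** 2
```

Plotting `k_hat` against `csr_reference` gives the usual visual check: values above the reference suggest clustering, values below suggest regularity.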


John Snow (15 March 1813 – 16 June 1858) is considered to be one of the fathers of epidemiology because of his work in tracing the source of a cholera outbreak in Soho, London, in 1854.


Case-control study

Goal: compare the spatial distribution of disease cases with that of the underlying population

• Null hypothesis:

equal spatial distribution

• Controls:

selected to represent population heterogeneity


Incidence of disease depends on:

• Population density

• Overall risk of disease

• Other risk factors, e.g. distance from a point source

Do disease cases occur randomly among the population?

Case-control data consist of two point patterns:

• the locations of n1 cases of a particular disease {x1, x2, …, xn1}

• the locations of n0 controls {xn1+1, …, xn1+n0}

in a study region W over a defined period of time. The total number of data points is n = n1 + n0.

Assumption:

• Cases come from an IPP with intensity λ1(s)

• Controls come from another, independent IPP with intensity λ0(s)


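Spatial risk

relative risk:

$$\rho(s) = \frac{\lambda_1(s)}{\lambda_0(s)}$$

estimated relative risk:

$$\hat{\rho}(s) = \frac{\hat{\lambda}_1(s)}{\hat{\lambda}_0(s)}$$

H0:

$$\rho(s) = \rho_0 = \frac{n_1}{n_0}$$

test statistic:

$$T = \sum_{i=1}^{n} [\rho(x_i) - \rho_0]^2$$

estimated test statistic:

$$\hat{T} = \sum_{i=1}^{n} [\hat{\rho}(x_i) - \rho_0]^2$$

significance: Monte Carlo test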
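The estimated relative risk and its test statistic can be sketched as follows: smooth the cases and the controls separately, take the ratio at every data point, and sum the squared deviations from n1/n0. The Gaussian kernel, the common bandwidth `h`, the names `rho_hat` and `relative_risk_statistic`, and the synthetic data are all assumptions made for this illustration.

```python
import numpy as np

def kernel_intensity(points, locations, h):
    """Gaussian kernel intensity estimate at the given locations (no edge correction)."""
    diff = locations[:, None, :] - points[None, :, :]
    k = np.exp(-0.5 * np.sum((diff / h) ** 2, axis=-1)) / (2.0 * np.pi)
    return k.sum(axis=1) / h ** 2

def relative_risk_statistic(cases, controls, h):
    """Return the estimated relative risk at every data point and the statistic T."""
    cases = np.asarray(cases, dtype=float)
    controls = np.asarray(controls, dtype=float)
    all_points = np.vstack([cases, controls])
    lam1_hat = kernel_intensity(cases, all_points, h)
    lam0_hat = kernel_intensity(controls, all_points, h)
    rho_hat = lam1_hat / lam0_hat            # assumes lam0_hat > 0 at every point
    rho0 = len(cases) / len(controls)        # n1 / n0 under the null hypothesis
    t_hat = np.sum((rho_hat - rho0) ** 2)
    return rho_hat, t_hat

# Synthetic data with no spatial variation in risk.
rng = np.random.default_rng(5)               # arbitrary seed
cases = rng.uniform(size=(30, 2))
controls = rng.uniform(size=(120, 2))
rho_hat, t_hat = relative_risk_statistic(cases, controls, h=0.2)
```

With a very small bandwidth the control estimate can underflow to zero far from any control, which makes the ratio unstable; this is one reason bandwidth selection matters here.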

Spatial clustering

K0(r) → amount of clustering due to population

K1(r) → amount of clustering due to population plus the effect of other possible risk factors

D(r) = K1(r) − K0(r) → the amount of clustering that is not due to population

estimate:

$$\hat{D}(r) = \hat{K}_1(r) - \hat{K}_0(r)$$

H0:

$$D(r) = 0$$

Test statistic:

$$D = \sum_{k=1}^{m} \frac{D(r_k)}{\sqrt{\mathrm{var}[D(r_k)]}}$$

significance: Monte Carlo test

Monte Carlo test

1). Simulation with random labelling at the jth iteration, j = 1, 2, …, 99

• randomly select n1 points from the n data points and label the selected points as "case"; label the remaining n0 points as "control"

• with the relabelled data, estimate the kernel smoothers $\hat{\lambda}_{1j}(x)$, $\hat{\lambda}_{0j}(x)$ and $\hat{\rho}_j(x)$ at every data point.

• estimate K1j(r) and K0j(r) and compute Dj(r) at a set of discrete distances {r1, r2, …, rm}.

2). Test statistic

• for each j, compute

$$\hat{T}_j = \sum_{i=1}^{n} [\hat{\rho}_j(x_i) - \rho_0]^2$$

• compute the variance of D(rk) for each k = 1, 2, …, m, then get

$$\hat{D}_j = \sum_{k=1}^{m} \frac{\hat{D}_j(r_k)}{\sqrt{\mathrm{var}[\hat{D}(r_k)]}}$$

3). p-value

$$p_1 = \frac{\sum_{j=1}^{99} I\{\hat{T}_j \ge \hat{T}\} + 1}{99 + 1}$$

$$p_2 = \frac{\sum_{j=1}^{99} I\{\hat{D}_j \ge \hat{D}\} + 1}{99 + 1}$$
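The random-labelling procedure can be sketched for the clustering statistic as follows; the test for the relative-risk statistic has the same structure with a different statistic. This sketch assumes an uncorrected K estimate, estimates var[D(r_k)] from the simulated relabellings, and uses a one-sided comparison; the names `k_naive`, `d_curve` and `random_labelling_test` and the synthetic data are hypothetical.

```python
import numpy as np

def k_naive(points, r_values, area):
    """Uncorrected K estimate (no edge correction)."""
    n = len(points)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    off_diag = dist[~np.eye(n, dtype=bool)]
    counts = np.array([(off_diag <= r).sum() for r in r_values])
    return area * counts / n ** 2

def d_curve(points, is_case, r_values, area):
    """Difference K1 - K0 of the estimated K-functions for a given labelling."""
    k1 = k_naive(points[is_case], r_values, area)
    k0 = k_naive(points[~is_case], r_values, area)
    return k1 - k0

def random_labelling_test(points, is_case, r_values, area, n_sim=99, rng=None):
    """Monte Carlo p-value for the summed, standardized differences."""
    rng = np.random.default_rng() if rng is None else rng
    points = np.asarray(points, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    d_obs = d_curve(points, is_case, r_values, area)
    d_sim = np.empty((n_sim, len(r_values)))
    for j in range(n_sim):
        # Permuting the labels keeps n1 cases and n0 controls.
        labels = rng.permutation(is_case)
        d_sim[j] = d_curve(points, labels, r_values, area)
    # Assumption: var[D(r_k)] is estimated from the simulated relabellings.
    sd = d_sim.std(axis=0, ddof=1)
    stat_obs = np.sum(d_obs / sd)
    stat_sim = np.sum(d_sim / sd, axis=1)
    p_value = (np.sum(stat_sim >= stat_obs) + 1) / (n_sim + 1)
    return stat_obs, p_value

# Synthetic example: labels assigned at random, so no extra clustering is expected.
rng = np.random.default_rng(11)              # arbitrary seed
pts = rng.uniform(size=(80, 2))
is_case = np.zeros(80, dtype=bool)
is_case[:20] = True
r_values = np.array([0.1, 0.2, 0.3])
stat, p = random_labelling_test(pts, is_case, r_values, area=1.0, rng=rng)
```

The distances in `r_values` should be large enough that the simulated differences actually vary; otherwise the estimated standard deviation is zero and the standardized statistic is undefined.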

Case study: the Chorley data

58 cases, 978 controls




Monte Carlo test gives p-value = 0.64 → there is no significant spatial variation in the relative risk.



p-value = 0.91 → no significant relative spatial clustering

Summary


Spatial point patterns

Spatial point processes

HPP

IPP

λ(s)

K(r)

CSR


Thank you for your attention!

Time for Q&A

