Spatial Statistics for Epidemiology: Spatial Point Processes
By Liu Xu U086105E
Supervisor: Prof Loh Wei Liem
Department of Statistics and Applied Probability
National University of Singapore
15 March 2012
Outline
Application in epidemiology
Theory: descriptive statistics
Models: spatial point processes
Data: spatial point patterns
A spatial point pattern is …
Space → R^d, d ≥ 2
Points → data values
Pattern → arrangement
Intro Models Theory Application 3
Example: tropical rainforest
Example: cancer cases
marks: extra information attached to points, categorical/continuous
Example: the Milky Way galaxy
Types of point patterns
regularity (repulsion) · CSR · clustering (attraction)
Aim: describe and model “pattern”
Are points randomly located?
• If so, find a statistical model to describe the “randomness”;
• If not, …
Models: spatial point processes
• A spatial point process is a stochastic process X which generates a countable set of events in a defined space.
• A spatial pattern x = {x1, x2, …, xn} on an observational region W generated from a spatial point process is a realization of the process.
• Only consider point processes in 2-D space.
• The locations of any objects can be modelled: plants, animals, cells, stars, disease cases, earthquakes, …
Models: spatial point processes
• Notation:
W: study region in R^2
N(A): number of events inside subregion A, A ⊆ W
|A|: area of region A
s: a location in W
ds: infinitesimal region centered at s
• Assumptions on spatial point processes:
i. Locally finite: the number of events in any bounded region is finite
ii. Simple: at any location s, there is either one event or no event at all
HPP
A spatial point process in a bounded region W in R^2 is a homogeneous Poisson process (HPP) if:
i. For every subregion A in W, N(A) ~ Poi(λ|A|), where 0 < λ < ∞ is a constant called the intensity (homogeneity).
ii. If A1 and A2 are two disjoint subregions in W, then N(A1) and N(A2) are independent (independence).
• Standard model for complete spatial randomness (CSR);
• Can be generalized to more complicated models;
• A reference process when analyzing the spatial characteristics of a specific pattern.
IPP
A spatial point process in a bounded region W in R^2 is an inhomogeneous Poisson process (IPP) if:
i. For any subregion A in W, N(A) ~ Poi(∫_A λ(s) ds), where 0 < λ(s) < ∞ is the intensity at s.
ii. If A1 and A2 are two disjoint subregions in W, then N(A1) and N(A2) are independent (independence).
HPP → IPP: generalization
IPP → HPP: special case
Simulation from Poisson processes
Two Poisson process realizations on the unit square having the same expected number of events = 100.
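The two realizations could be generated along the following lines. This is a minimal sketch, not the code behind the slide: the function names are hypothetical, the inhomogeneous intensity 400xy is borrowed from the kernel-smoothing figure later in the deck, and thinning of a dominating HPP is one standard way (assumed here) to simulate an IPP.

```python
import numpy as np

def simulate_hpp(lam, rng):
    """Homogeneous Poisson process with intensity lam on the unit square."""
    n = rng.poisson(lam)                 # N(W) ~ Poi(lam * |W|), with |W| = 1
    return rng.uniform(size=(n, 2))      # given n, events are i.i.d. uniform on W

def simulate_ipp(intensity, lam_max, rng):
    """Inhomogeneous Poisson process on the unit square by thinning.

    intensity: maps an (n, 2) array of locations to lambda(s) at each row.
    lam_max:   an upper bound on lambda(s) over the unit square.
    """
    candidates = simulate_hpp(lam_max, rng)
    # Keep each candidate independently with probability lambda(s) / lam_max.
    keep = rng.uniform(size=len(candidates)) < intensity(candidates) / lam_max
    return candidates[keep]

rng = np.random.default_rng(0)
hpp = simulate_hpp(100.0, rng)
# lambda(s) = 400xy integrates to 100 over the unit square and is at most 400.
ipp = simulate_ipp(lambda s: 400.0 * s[:, 0] * s[:, 1], 400.0, rng)
```

Both patterns have 100 expected events, but the IPP concentrates them towards the corner (1, 1).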
Summary statistics: first-order
First-order intensity of a spatial point process is:
$$ \lambda(s) = \lim_{|ds| \to 0} \frac{E(N(ds))}{|ds|} $$
• Interpretation: expected number of events per unit area. For a small region ds, λ(s)|ds| is approximately the probability of an event in ds.
• Intensity may be constant (homogeneous) or may vary from location to location (inhomogeneous). If the process is homogeneous, estimate the intensity by
$$ \hat{\lambda} = \frac{N(W)}{|W|} $$
Estimate λ(s) in inhomogeneous case
• Estimating the intensity of a spatial point pattern is similar to estimating a bivariate probability density.
• How to estimate a bivariate density?
Given an i.i.d. sample (y1, …, yn) of a bivariate random variable Y, an estimate of the density f(·) of Y at y is
$$ \hat{f}(y) = \frac{1}{n h^{2}} \sum_{i=1}^{n} K\left(\frac{y - y_i}{h}\right) $$
where K(·) is the kernel and h is the bandwidth.
• The expression for kernel smoothing of the intensity function of a point pattern x = {x1, …, xn} at location s is
$$ \hat{\lambda}(s) = \frac{1}{h^{2}} \sum_{i=1}^{n} K\left(\frac{s - x_i}{h}\right) $$
The bandwidth h is chosen based on some cross-validation criterion.
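The kernel-smoothed intensity can be sketched in a few lines. The choices below are assumptions made for illustration, not taken from the slides: a bivariate standard Gaussian kernel, no edge correction, and a fixed bandwidth h = 0.1 instead of a cross-validated one. The function name is hypothetical.

```python
import numpy as np

def kernel_intensity(points, locations, h):
    """Evaluate (1 / h^2) * sum_i K((s - x_i) / h) at each row s of `locations`.

    K is taken to be the bivariate standard Gaussian density (an assumption);
    no edge correction is applied.
    """
    diff = (locations[:, None, :] - points[None, :, :]) / h      # shape (m, n, 2)
    k = np.exp(-0.5 * np.sum(diff ** 2, axis=2)) / (2.0 * np.pi)
    return k.sum(axis=1) / h ** 2

# Example: smooth 100 uniform points on a 50 x 50 grid over the unit square.
rng = np.random.default_rng(1)
x = rng.uniform(size=(100, 2))
gx, gy = np.meshgrid(np.linspace(0, 1, 50), np.linspace(0, 1, 50))
grid = np.column_stack([gx.ravel(), gy.ravel()])
lam_hat = kernel_intensity(x, grid, h=0.1)   # h = 0.1 is an arbitrary choice
```

Unlike the density estimate, the sum is not divided by n, so the estimate integrates to roughly the number of events rather than to 1.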
Kernel smoothed intensity of IPP
Kernel estimated intensity for the point pattern simulated from an IPP with λ(s) = 400xy, s = (x, y), on [0, 1] × [0, 1].
Summary statistics: second-order
The second-order properties of a point process involve the relationship between the numbers of events at different locations.
• The second-order intensity of a spatial point process is
$$ \lambda_2(s_i, s_j) = \lim_{|ds_i|, |ds_j| \to 0} \frac{E[N(ds_i)\, N(ds_j)]}{|ds_i|\, |ds_j|} $$
• A point process is called stationary if
$$ \lambda(s) = \lambda, \qquad \lambda_2(s_i, s_j) = \lambda_2(s_i - s_j) $$
• A stationary point process is isotropic if
$$ \lambda_2(s_i, s_j) = \lambda_2(\lVert s_i - s_j \rVert) $$
K-function
If a point process is stationary and isotropic, the K-function of the process is defined by:
λK(r) = E[number of further events within distance r of an arbitrary event]
Two properties of the K-function:
• For an HPP, λK(r) = λπr², thus Kp(r) = πr²
• K(r) is invariant to random thinning.
Def. random thinning: each event of a point process X is either retained or deleted with retention probability p, independently of other events. The resulting point process X’ contains a subset of the events of the original process X.
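One way to see the invariance to thinning is sketched below. It relies on the standard identity linking K to the second-order intensity of a stationary isotropic process; that identity is not stated on the slides and is brought in here as an assumption. Primes denote the thinned process X’.

```latex
K(r) = \frac{2\pi}{\lambda^{2}} \int_{0}^{r} \lambda_2(u)\, u \, du ,
\qquad
\lambda' = p\,\lambda , \quad \lambda_2'(u) = p^{2}\,\lambda_2(u)
\;\Longrightarrow\;
K'(r) = \frac{2\pi}{p^{2}\lambda^{2}} \int_{0}^{r} p^{2}\,\lambda_2(u)\, u \, du = K(r).
```

Thinning keeps a single event with probability p and a pair of events with probability p², so the factors cancel.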
Comparing estimated K-functions of simulated point patterns
CSR: K(r) = πr²
clustered: K(r) > πr²
regular: K(r) < πr²
Estimation of K(r): E(# further events…)/λ
The naive estimate is negatively biased near the boundary of W → edge correction
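A naive estimator without edge correction makes the negative bias concrete. The sketch below is an illustration only: the function name is hypothetical, and the normalization |W| / (n(n − 1)) is one common convention, assumed here.

```python
import numpy as np

def k_naive(points, r_values, area=1.0):
    """Naive K-function estimate with no edge correction.

    K_hat(r) = |W| / (n (n - 1)) * #{ordered pairs (i, j), i != j : ||x_i - x_j|| <= r}.
    Events near the boundary have unobserved neighbours outside W, so this
    estimate tends to fall below the true K(r).
    """
    n = len(points)
    d = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=2))
    d = d[~np.eye(n, dtype=bool)]                      # drop the i == j pairs
    counts = np.array([(d <= r).sum() for r in r_values])
    return area * counts / (n * (n - 1))

# Example: compare with pi * r^2 for a CSR pattern on the unit square.
rng = np.random.default_rng(2)
csr = rng.uniform(size=(500, 2))
r = np.array([0.02, 0.05, 0.10])
k_hat = k_naive(csr, r)
k_csr = np.pi * r ** 2
```

Plotting k_hat against k_csr for clustered and regular patterns should reproduce the qualitative comparison above, up to the edge bias.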
Application in epidemiology
John Snow (15 March 1813 – 16 June 1858) is considered one of the fathers of epidemiology because of his work in tracing the source of a cholera outbreak in Soho, London, in 1854.
Case-control study
Goal: compare the spatial distribution of disease cases with the underlying population
• Null hypothesis: equal spatial distribution
• Controls: selected to represent population heterogeneity
Incidence of disease
Population density
Overall risk of disease
Other risk factors, e.g. distance from point source
Do disease cases occur randomly among the population?
Case-control data consist of two point patterns:
• the locations of n1 cases of a particular disease {x1, x2, …, xn1}
• the locations of n0 controls {xn1+1, …, xn1+n0}
in a study region W over a defined period of time. Total number of data points n = n1 + n0.
Assumption:
• Cases come from an IPP with intensity λ1(s)
• Controls come from another independent IPP with intensity λ0(s)
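Spatial risk
relative risk:
$$ \rho(s) = \frac{\lambda_1(s)}{\lambda_0(s)} $$
estimated relative risk:
$$ \hat{\rho}(s) = \frac{\hat{\lambda}_1(s)}{\hat{\lambda}_0(s)} $$
H0:
$$ \rho(s) = \rho_0 = \frac{n_1}{n_0} $$
test statistic:
$$ T = \sum_{i=1}^{n} [\rho(x_i) - \rho_0]^{2} $$
estimated test statistic:
$$ \hat{T} = \sum_{i=1}^{n} [\hat{\rho}(x_i) - \rho_0]^{2} $$
significance: Monte Carlo test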
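The estimated relative risk and test statistic can be sketched as follows. Several things here are assumptions for illustration rather than the presenter's method: the Gaussian kernel, a single fixed bandwidth h shared by cases and controls, no edge correction, and the names rho_hat and rho_0 for the risk ratio and its null value n1/n0. All function names are hypothetical.

```python
import numpy as np

def kernel_intensity(points, locations, h):
    """Gaussian-kernel intensity estimate (1 / h^2) * sum_i K((s - x_i) / h)."""
    diff = (locations[:, None, :] - points[None, :, :]) / h
    k = np.exp(-0.5 * np.sum(diff ** 2, axis=2)) / (2.0 * np.pi)
    return k.sum(axis=1) / h ** 2

def risk_statistic(cases, controls, h):
    """Sum over all n = n1 + n0 data points of [rho_hat(x_i) - n1 / n0]^2."""
    data = np.vstack([cases, controls])
    # Ratio of the two kernel intensity estimates at every data point.
    rho_hat = kernel_intensity(cases, data, h) / kernel_intensity(controls, data, h)
    rho_0 = len(cases) / len(controls)        # null value of the risk ratio
    return float(np.sum((rho_hat - rho_0) ** 2))

# Example with synthetic data (not the data analysed in the talk).
rng = np.random.default_rng(3)
cases = rng.uniform(size=(30, 2))
controls = rng.uniform(size=(300, 2))
t_hat = risk_statistic(cases, controls, h=0.1)   # h = 0.1 is arbitrary
```

The ratio becomes unstable where the control intensity estimate is close to zero, which is one reason bandwidth choice matters in practice.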
Spatial clustering
K0(r) → amount of clustering due to population
K1(r) → amount of clustering due to population plus the effect of other possible risk factors
D(r) = K1(r) − K0(r) → the amount of clustering that is not due to population
estimate:
$$ \hat{D}(r) = \hat{K}_1(r) - \hat{K}_0(r) $$
H0:
$$ D(r) = 0 $$
Test statistic:
$$ D = \sum_{k=1}^{m} \frac{D(r_k)}{\sqrt{\operatorname{var}[D(r_k)]}} $$
significance: Monte Carlo test
Monte Carlo test
1). Simulation with random labelling at the jth iteration, j = 1, 2, …, 99
• randomly select n1 points from the n data points and label the selected points as “case”; label the remaining n0 points as “control”
• with the relabelled data, compute the kernel estimates $\hat{\lambda}_{1j}(x)$, $\hat{\lambda}_{0j}(x)$ and $\hat{\rho}_j(x)$ at every data point.
• estimate K1j(r) and K0j(r) and compute Dj(r) at a set of discrete distances {r1, r2, …, rm}.
2). Test statistic
• for each j, compute
$$ \hat{T}_j = \sum_{i=1}^{n} [\hat{\rho}_j(x_i) - \rho_0]^{2} $$
• compute the variance of D(rk) for each k = 1, 2, …, m, then get
$$ \hat{D}_j = \sum_{k=1}^{m} \frac{\hat{D}_j(r_k)}{\sqrt{\operatorname{var}[\hat{D}(r_k)]}} $$
3). p-value
$$ p_1 = \frac{\sum_{j=1}^{99} I\{\hat{T}_j \ge \hat{T}\} + 1}{99 + 1} $$
$$ p_2 = \frac{\sum_{j=1}^{99} I\{\hat{D}_j \ge \hat{D}\} + 1}{99 + 1} $$
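The random-labelling procedure can be sketched generically for any statistic computed from a (cases, controls) pair. The function names are hypothetical, and the toy statistic (squared distance between centroids) is a stand-in chosen to keep the example short; it is not one of the statistics used in the talk. The comparison is taken to be "greater than or equal", an assumption about the upper-tail test.

```python
import numpy as np

def random_labelling_pvalue(cases, controls, statistic, n_sim=99, seed=0):
    """Monte Carlo p-value under random relabelling of cases and controls."""
    rng = np.random.default_rng(seed)
    data = np.vstack([cases, controls])
    n1 = len(cases)
    observed = statistic(cases, controls)
    exceed = 0
    for _ in range(n_sim):
        perm = rng.permutation(len(data))
        # First n1 shuffled points become "cases", the remaining n0 "controls".
        simulated = statistic(data[perm[:n1]], data[perm[n1:]])
        exceed += int(simulated >= observed)
    return (exceed + 1) / (n_sim + 1)

def centroid_gap(cases, controls):
    """Toy statistic: squared distance between the two centroids."""
    return float(np.sum((cases.mean(axis=0) - controls.mean(axis=0)) ** 2))

# Example: cases tightly clustered near (0.9, 0.9), controls uniform.
rng = np.random.default_rng(4)
cases = 0.9 + 0.02 * rng.uniform(-1, 1, size=(20, 2))
controls = rng.uniform(size=(200, 2))
p_value = random_labelling_pvalue(cases, controls, centroid_gap)
```

With 99 simulations the smallest attainable p-value is 1/100, which is why 99 (or 999) relabellings are a convenient choice.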
Case study: the Chorley data
58 cases, 978 controls
Lots of graphs
Monte Carlo test gives p-value = 0.64 → there is no significant spatial variation in the relative risk.
graph
p-value = 0.91 → no significant relative spatial clustering
Summary
Spatial point patterns
Spatial point processes
HPP
IPP
λ(s)
K(r)
CSR
Application in epidemiology
Thank you for your attention!
Time for Q&A