forked from edzer/sdsr
-
Notifications
You must be signed in to change notification settings - Fork 0
/
11-PointPattern.qmd
386 lines (342 loc) · 14.1 KB
/
11-PointPattern.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
# Point Pattern Analysis {#sec-pointpatterns}
\index{point pattern analysis}
\index{spatstat}
Point pattern analysis is concerned with describing patterns of
points over space and making inference about the process that
could have generated an observed pattern. The main focus here lies on
the information carried in the locations of the points, and typically
these locations are not controlled by sampling but a result of a
process we are interested in, such as animal sightings, accidents,
disease cases, or tree locations. This is opposed to geostatistical
processes (@sec-interpolation) where we have values of
some phenomenon everywhere but observations limited to a set of
locations _that we can control_, at least in principle. Hence, in
geostatistical problems the prime interest is not in the observation
locations but in estimating the value of the observed phenomenon
at unobserved locations. Point pattern analysis typically assumes
that for an observed area, all points are available, meaning that
locations without a point are not unobserved as in a geostatistical
process, but are observed and contain no point. In terms of random
processes, in point processes locations are random variables, where,
in geostatistical processes the measured variable is a random field
with locations fixed.
This chapter is confined to describing the very basics of point
pattern analysis, using package **spatstat** [@R-spatstat], and
related packages by the same authors. The **spatstat** book of
@baddeley2015spatial gives a comprehensive introduction to point
pattern theory and the use of the **spatstat** package family,
which we will not try to copy. Inclusion of particular topics in
this chapter should not be seen as an expression that these are
more relevant than others. In particular, this chapter tries to
illustrate interfaces existing between **spatstat** and the more
spatial data science oriented packages **sf** and **stars**.
A further book that introduces point patterns analysis is
@STOYAN2017125. R package **stpp** [@R-stpp] for analysing
spatiotemporal point processes is discussed in @stpp.
Important concepts of point patterns analysis are the distinction
between a point _pattern_ and a point _process_: the latter is the
stochastic process that, when sampled, generates a point pattern.
A dataset is always a point pattern, and inference involves figuring
out the properties of a process that could have generated a pattern
like the one we observed. Properties of a spatial point process
include
* first order properties: the intensity function measures the
number of points per area unit; this function is spatially varying
for a _inhomogeneous_ point process
* second order properties: given a constant or varying intensity function,
describe whether points are distributed
independently _from one another_, tend to attract
each other (clustering), or repulse each other (more regularly
distributed than under complete spatial randomness)
## Observation window
\index{observation window}
Point patterns have an observation window. Consider the points generated randomly by
```{r}
library(sf)
n <- 30
set.seed(13531) # remove this to create another random sequence
xy <- data.frame(x = runif(n), y = runif(n)) |>
st_as_sf(coords = c("x", "y"))
```
then these points are (by construction) uniformly distributed,
or completely spatially random, over the domain $[0,1] \times [0,1]$. For
a larger domain, they are not uniform, for the two square windows
`w1` and `w2` created by
```{r}
w1 <- st_bbox(c(xmin = 0, ymin = 0, xmax = 1, ymax = 1)) |>
st_as_sfc()
w2 <- st_sfc(st_point(c(1, 0.5))) |> st_buffer(1.2)
```
this is shown in @fig-pp1.
```{r fig-pp1, echo=!knitr::is_latex_output()}
#| fig.height: 3.3
#| fig.cap: "Depending on the observation window (grey), the same point pattern can appear completely spatially random (left), or clustered (right)"
#| code-fold: true
par(mfrow = c(1, 2), mar = c(2.1, 2.1, 0.1, 0.5), xaxs = "i", yaxs = "i")
plot(w1, axes = TRUE, col = 'grey')
plot(xy, add = TRUE)
plot(w2, axes = TRUE, col = 'grey')
plot(xy, add = TRUE, cex = .5)
```
\index{ppp}
\index{sf!as.ppp}
Point patterns in **spatstat** are objects of class `ppp` that contain
points and an observation window (an object of class `owin`).
We can create a `ppp` from points by
```{r}
library(spatstat) |> suppressPackageStartupMessages()
as.ppp(xy)
```
\newpage
where we see that the bounding box of the points is used as observation
window when no window is specified. If we add a polygonal geometry as the
first feature of the dataset, then this is used as observation window:
```{r}
(pp1 <- c(w1, st_geometry(xy)) |> as.ppp())
c1 <- st_buffer(st_centroid(w2), 1.2)
(pp2 <- c(c1, st_geometry(xy)) |> as.ppp())
```
\index{point pattern!homogeneity}
To test for homogeneity, one could carry out a quadrat count, using
an appropriate quadrat layout (a 3 $\times$ 3 layout is shown in
@fig-quadrat)
```{r fig-quadrat, echo = !knitr::is_latex_output()}
#| fig.height: 3.5
#| fig.cap: "3 $\\times$ 3 quadrat counts for the two point patterns"
par(mfrow = c(1, 2), mar = rep(0, 4))
q1 <- quadratcount(pp1, nx=3, ny=3)
q2 <- quadratcount(pp2, nx=3, ny=3)
plot(q1, main = "")
plot(xy, add = TRUE)
plot(q2, main = "")
plot(xy, add = TRUE)
```
\index{quadrat.test}
and carry out a $\chi^2$ test on these counts:
```{r}
quadrat.test(pp1, nx=3, ny=3)
quadrat.test(pp2, nx=3, ny=3)
```
which indicates that for the second case we have an indication
that this is not a CSR (completely spatially random) pattern.
As indicated by the warning, we should take the p-values with a
large grain of salt because we have too small expected counts.
\index{kernel density}
Kernel densities can be computed using `density`, where kernel shape and
bandwidth can be controlled. Here, cross-validation is used by function
`bw.diggle` to specify the bandwidth parameter `sigma`; plots are shown in
@fig-bwdiggle.
```{r}
den1 <- density(pp1, sigma = bw.diggle)
den2 <- density(pp2, sigma = bw.diggle)
```
```{r fig-bwdiggle, echo=!knitr::is_latex_output()}
#| fig.height: 3.0
#| fig.cap: "Kernel densities for both point patterns"
#| code-fold: true
par(mfrow = c(1, 2), mar = c(0,0,1.1,2))
plot(den1)
plot(pp1, add=TRUE)
plot(den2)
plot(pp1, add=TRUE)
```
The density maps created this way are obviously raster images, and we can
convert them into stars object by
```{r}
library(stars)
s1 <- st_as_stars(den1)
(s2 <- st_as_stars(den2))
```
and we can verify that the area under the density surface is similar
to the sample size (`r n`), by
```{r}
s1$a <- st_area(s1) |> suppressMessages()
s2$a <- st_area(s2) |> suppressMessages()
with(s1, sum(v * a, na.rm = TRUE))
with(s2, sum(v * a, na.rm = TRUE))
```
\index{ppm}
\index{point pattern!density model}
More exciting applications involve modelling the density
surface as a function of external variables. Suppose we want
to model the density of `pp2` as a Poisson point process (meaning
that points do not interact with each other), where
the intensity is a function of distance to the centre of the
"cluster", and these distance are available in a `stars` object:
```{r}
pt <- st_sfc(st_point(c(0.5, 0.5)))
st_as_sf(s2, as_points = TRUE, na.rm = FALSE) |>
st_distance(pt) -> s2$dist
```
we can then model the densities using `ppm`, where the _name_ of the
point pattern object is used on the left-hand side of the `formula`:
```{r}
(m <- ppm(pp2 ~ dist, data = list(dist = as.im(s2["dist"]))))
```
The returned object is of class `ppm`, and can be plotted: @fig-ppm
shows the predicted surface. The prediction standard error can also be plotted.
```{r fig-ppm, echo=!knitr::is_latex_output()}
#| fig.cap: "Predicted densities of a ppm model"
#| out.width: 50%
#| code-fold: true
plot(m, se = FALSE)
```
The model also has a `predict` method, which returns an `im` object that
can be converted into a `stars` object by
```{r}
predict(m, covariates = list(dist = as.im(s2["dist"]))) |>
st_as_stars()
```
## Coordinate reference systems
All routines in **spatstat** are layed out for two-dimensional data with
Cartesian coordinates. If we try to convert an object with ellipsoidal
coordinates, we get an error:
```{r error=TRUE}
system.file("gpkg/nc.gpkg", package = "sf") |>
read_sf() |>
st_geometry() |>
st_centroid() |>
as.ppp()
```
When converting to a **spatstat** data structure we loose
the coordinate reference system we started with. It can be set
back to `sf` or `stars` objects by using `st_set_crs`.
## Marked point patterns, points on linear networks
\index{point pattern!marked}
A few more data types can be converted to and from **spatstat**.
Marked point patterns are point patterns that have a "mark", which
is either a categorical label or a numeric label for each point.
A dataset available in **spatstat** with marks is the `longleaf` pines
dataset, containing diameter at breast height as a numeric mark:
```{r}
longleaf
ll <- st_as_sf(longleaf)
print(ll, n = 3)
```
Values can be converted back to `ppp` with
```{r}
as.ppp(ll)
```
\index{point pattern!line segments}
Line segments, in **spatstat** objects of class `psp` can be converted
back and forth to simple feature with `LINESTRING` geometries
following a `POLYGON` feature with the observation window, as in
```{r}
print(st_as_sf(copper$SouthLines), n = 5)
```
Finally, point patterns on linear networks, in **spatstat**
represented by `lpp` objects, can be converted to `sf` by
```{r}
print(st_as_sf(chicago), n = 5)
```
where we only see the first five features; the points are also
in this object, as variable `label` indicates
```{r}
table(st_as_sf(chicago)$label)
```
Potential information about network _structure_, how
`LINESTRING` geometries are connected, is not present in the `sf`
object. Package **sfnetworks** [@R-sfnetworks] would be a candidate
package to hold such information in R or to pass on network data
imported from OpenStreetMaps to **spatstat**.
## Spatial sampling and simulating a point process
\index{point pattern!simulation}
Package **sf** contains an `st_sample` method that samples points from
`MULTIPOINT`, linear or polygonal geometries, using different spatial
sampling strategies. It natively supports strategies "random",
"hexagonal", "Fibonacci" (@sec-simsphere), and "regular", where "regular" refers to
sampling on a square regular grid and "hexagonal" essentially gives
a triangular grid. For type "random", it can return exactly the
number of requested points, for other types this is approximate.
`st_sample` also interfaces point process simulation functions of
**spatstat**, when other values for sampling type are chosen. For instance
the **spatstat** function `rThomas` is invoked when setting `type = Thomas`
(@fig-rThomas):
```{r}
kappa <- 30 / st_area(w2) # intensity
th <- st_sample(w2, kappa = kappa, mu = 3, scale = 0.05,
type = "Thomas")
nrow(th)
```
```{r fig-rThomas, echo=!knitr::is_latex_output()}
#| fig.height: 4
#| out.width: 60%
#| fig.cap: "Thomas process with mu = 3 and scale = 0.05"
#| code-fold: true
par(mar = rep(0, 4))
plot(w2)
plot(th, add = TRUE)
```
The help function obtained by `?rThomas` details the meaning of the
parameters `kappa`, `mu`, and `scale`. Simulating point processes
means that the intensity is given, not the sample size. The sample
size within the observation window obtained this way is a random
variable.
## Simulating points on the sphere {#sec-simsphere}
\index{point pattern!simulation on the sphere}
Another spatial random sampling type supported by `sf` natively
(in `st_sample`) is simulation of random points on the sphere. An
example of this is shown in @fig-srsglobe, where points
were constrained to those in oceans. Points approximately regularly
distributed over a sphere are obtained by `st_sample` with
`type = "Fibonacci"` [@fibo].
```{r fig-srsglobe, echo=!knitr::is_latex_output(), messages = FALSE}
#| fig.cap: "Points sampled on the globe over the oceans: randomly (left) and approximately regular (Fibonacci; right), shown in an orthographic projection"
#| out.width: 100%
#| code-fold: true
par(mar = rep(0, 4), mfrow = c(1, 2))
library(s2)
g <- as_s2_geography(TRUE) # Earth
co <- s2_data_countries()
oc <- s2_difference(g, s2_union_agg(co)) # oceans
b <- s2_buffer_cells(as_s2_geography("POINT(-30 -10)"), 9800000) # visible half
i <- s2_intersection(b, oc) # visible ocean
co <- s2_intersection(b, co)
ortho = st_crs("+proj=ortho +lat_0=-10 +lon_0=-30")
# background:
st_transform(st_as_sfc(i), ortho) |> plot(col = 'lightblue')
st_transform(st_as_sfc(co), ortho) |> plot(col = NA, add = TRUE, border = 'grey')
# sampling randomly from globe:
sf_use_s2(FALSE) |> suppressMessages()
st_as_stars() |> st_bbox() |> st_as_sfc() |> st_sample(1000, exact = FALSE) |>
suppressMessages() -> pts
sf_use_s2(TRUE) |> suppressMessages()
pts |> s2_intersection(i) |> st_as_sfc() -> pts
# add:
st_transform(pts, ortho) |> plot(add = TRUE, pch = 3, cex = .5)
# right: background:
st_transform(st_as_sfc(i), ortho) |> plot(col = 'lightblue')
st_transform(st_as_sfc(co), ortho) |> plot(col = NA, add = TRUE, border = 'grey')
# Fibonacci:
sf_use_s2(FALSE) |> suppressMessages()
st_as_stars() |> st_bbox() |> st_as_sfc() |>
st_sample(1000, type = "Fibonacci", exact = FALSE) |> suppressMessages() -> pts
sf_use_s2(TRUE) |> suppressMessages()
pts |> s2_intersection(i) |> st_as_sfc() -> pts
st_transform(pts, ortho) |> plot(add = TRUE, pch = 3, cex = .5)
```
## Exercises
1. After loading **spatstat**, recreate the plot obtained by `plot(longleaf)`
by using **ggplot2** and `geom_sf()`, and by `sf::plot()`.
2. Convert the sample locations of the NO$_2$ data used in @sec-interpolation
to a `ppp` object, with a proper window.
3. Compute and plot the density of the NO$_2$ dataset, import the density as a `stars`
object, and compute the volume under the surface.
```{r echo=FALSE}
# we need to detach, to avoid name clashes of using idw() and diagnose() later on
library(spatstat)
w <- function(x) which(x == search())
detach(pos = w("package:spatstat"))
detach(pos = w("package:spatstat.linnet"))
pc <- w("package:spatstat.core")
if (length(pc)) {
detach(pos = w("package:spatstat.core"))
} else { # > 3.0.0
detach(pos = w("package:spatstat.model"))
detach(pos = w("package:spatstat.explore"))
}
detach(pos = w("package:spatstat.random"))
detach(pos = w("package:spatstat.geom"))
detach(pos = w("package:spatstat.data"))
```
{{< include ga.qmd >}}