11-PointPattern.qmd

# Point Pattern Analysis {#sec-pointpatterns}

\index{point pattern analysis}
\index{spatstat}

Point pattern analysis is concerned with describing patterns of
points over space and making inference about the process that
could have generated an observed pattern. The main focus here lies on
the information carried in the locations of the points, and typically
these locations are not controlled by sampling but a result of a
process we are interested in, such as animal sightings, accidents,
disease cases, or tree locations. This is opposed to geostatistical
processes (@sec-interpolation) where we have values of
some phenomenon everywhere but observations limited to a set of
locations _that we can control_, at least in principle. Hence, in
geostatistical problems the prime interest is not in the observation
locations but in estimating the value of the observed phenomenon
at unobserved locations. Point pattern analysis typically assumes
that for an observed area, all points are available, meaning that
locations without a point are not unobserved as in a geostatistical
process, but are observed and contain no point. In terms of random
processes, in point processes locations are random variables, where,
in geostatistical processes the measured variable is a random field
with locations fixed.

This chapter is confined to describing the very basics of point
pattern analysis, using package **spatstat** [@R-spatstat], and
related packages by the same authors. The **spatstat** book of
@baddeley2015spatial gives a comprehensive introduction to point
pattern theory and the use of the **spatstat** package family,
which we will not try to copy.  Inclusion of particular topics in
this chapter should not be seen as an expression that these are
more relevant than others.  In particular, this chapter tries to
illustrate interfaces existing between **spatstat** and the more
spatial data science oriented packages **sf** and **stars**.
A further book that introduces point patterns analysis is
@STOYAN2017125. R package **stpp** [@R-stpp] for analysing
spatiotemporal point processes is discussed in @stpp.

Important concepts of point patterns analysis are the distinction
between a point _pattern_ and a point _process_: the latter is the
stochastic process that, when sampled, generates a point pattern.
A dataset is always a point pattern, and inference involves figuring
out the properties of a process that could have generated a pattern
like the one we observed. Properties of a spatial point process
include

* first order properties: the intensity function measures the
  number of points per area unit; this function is spatially varying
  for a _inhomogeneous_ point process
* second order properties: given a constant or varying intensity function, 
  describe whether points are distributed
  independently _from one another_, tend to attract
  each other (clustering), or repulse each other (more regularly
  distributed than under complete spatial randomness)

## Observation window

\index{observation window}

Point patterns have an observation window. Consider the points generated randomly by 

```{r}
library(sf)
n <- 30
set.seed(13531) # remove this to create another random sequence
xy <- data.frame(x = runif(n), y = runif(n)) |> 
    st_as_sf(coords = c("x", "y"))
```

then these points are (by construction) uniformly distributed,
or completely spatially random, over the domain $[0,1] \times [0,1]$. For
a larger domain, they are not uniform, for the two square windows
`w1` and `w2` created by 
```{r}
w1 <- st_bbox(c(xmin = 0, ymin = 0, xmax = 1, ymax = 1)) |> 
		st_as_sfc() 
w2 <- st_sfc(st_point(c(1, 0.5))) |> st_buffer(1.2)
```
this is shown in @fig-pp1.

```{r fig-pp1, echo=!knitr::is_latex_output()}
#| fig.height: 3.3
#| fig.cap: "Depending on the observation window (grey), the same point pattern can appear completely spatially random (left), or clustered (right)"
#| code-fold: true
par(mfrow = c(1, 2), mar = c(2.1, 2.1, 0.1, 0.5), xaxs = "i", yaxs = "i")
plot(w1, axes = TRUE, col = 'grey')
plot(xy, add = TRUE)
plot(w2, axes = TRUE, col = 'grey')
plot(xy, add = TRUE, cex = .5)
```

\index{ppp}
\index{sf!as.ppp}

Point patterns in **spatstat** are objects of class `ppp` that contain
points and an observation window (an object of class `owin`).
We can create a `ppp` from points by
```{r}
library(spatstat) |> suppressPackageStartupMessages()
as.ppp(xy)
```

\newpage
where we see that the bounding box of the points is used as observation
window when no window is specified. If we add a polygonal geometry as the
first feature of the dataset, then this is used as observation window:
```{r}
(pp1 <- c(w1, st_geometry(xy)) |> as.ppp())
c1 <- st_buffer(st_centroid(w2), 1.2)
(pp2 <- c(c1, st_geometry(xy)) |> as.ppp())
```

\index{point pattern!homogeneity}

To test for homogeneity, one could carry out a quadrat count, using
an appropriate quadrat layout (a 3 $\times$ 3 layout is shown in 
@fig-quadrat)
```{r fig-quadrat, echo = !knitr::is_latex_output()}
#| fig.height: 3.5
#| fig.cap: "3 $\\times$ 3 quadrat counts for the two point patterns"
par(mfrow = c(1, 2), mar = rep(0, 4))
q1 <- quadratcount(pp1, nx=3, ny=3)
q2 <- quadratcount(pp2, nx=3, ny=3)
plot(q1, main = "")
plot(xy, add = TRUE)
plot(q2, main = "")
plot(xy, add = TRUE)
```

\index{quadrat.test}
and carry out a $\chi^2$ test on these counts:
```{r}
quadrat.test(pp1, nx=3, ny=3)
quadrat.test(pp2, nx=3, ny=3)
```
which indicates that for the second case we have an indication
that this is not a CSR (completely spatially random) pattern.
As indicated by the warning, we should take the p-values with a
large grain of salt because we have too small expected counts.

\index{kernel density}
Kernel densities can be computed using `density`, where kernel shape and
bandwidth can be controlled. Here, cross-validation is used by function
`bw.diggle` to specify the bandwidth parameter `sigma`; plots are shown in
@fig-bwdiggle.

```{r}
den1 <- density(pp1, sigma = bw.diggle)
den2 <- density(pp2, sigma = bw.diggle)
```
```{r fig-bwdiggle, echo=!knitr::is_latex_output()}
#| fig.height: 3.0
#| fig.cap: "Kernel densities for both point patterns"
#| code-fold: true
par(mfrow = c(1, 2), mar = c(0,0,1.1,2))
plot(den1)
plot(pp1, add=TRUE)
plot(den2)
plot(pp1, add=TRUE)
```

The density maps created this way are obviously raster images, and we can
convert them into stars object by

```{r}
library(stars)
s1 <- st_as_stars(den1)
(s2 <- st_as_stars(den2))
```

and we can verify that the area under the density surface is similar
to the sample size (`r n`),  by
```{r}
s1$a <- st_area(s1) |> suppressMessages()
s2$a <- st_area(s2) |> suppressMessages()
with(s1, sum(v * a, na.rm = TRUE))
with(s2, sum(v * a, na.rm = TRUE))
```

\index{ppm}
\index{point pattern!density model}

More exciting applications involve modelling the density
surface as a function of external variables. Suppose we want
to model the density of `pp2` as a Poisson point process (meaning
that points do not interact with each other), where 
the intensity is a function of distance to the centre of the 
"cluster", and these distance are available in a `stars` object:
```{r}
pt <- st_sfc(st_point(c(0.5, 0.5)))
st_as_sf(s2, as_points = TRUE, na.rm = FALSE) |>
  st_distance(pt) -> s2$dist
```
we can then model the densities using `ppm`, where the _name_ of the
point pattern object is used on the left-hand side of the `formula`:
```{r}
(m <- ppm(pp2 ~ dist, data = list(dist = as.im(s2["dist"]))))
```

The returned object is of class `ppm`, and can be plotted: @fig-ppm 
shows the predicted surface. The prediction standard error can also be plotted.

```{r fig-ppm, echo=!knitr::is_latex_output()}
#| fig.cap: "Predicted densities of a ppm model"
#| out.width: 50%
#| code-fold: true
plot(m, se = FALSE)
```
The model also has a `predict` method, which returns an `im` object that
can be converted into a `stars` object by
```{r}
predict(m, covariates = list(dist = as.im(s2["dist"]))) |>
    st_as_stars()
```

## Coordinate reference systems

All routines in **spatstat** are layed out for two-dimensional data with
Cartesian coordinates. If we try to convert an object with ellipsoidal
coordinates, we get an error:
```{r error=TRUE}
system.file("gpkg/nc.gpkg", package = "sf") |> 
    read_sf() |>
	st_geometry() |>
    st_centroid() |>
    as.ppp()
```

When converting to a **spatstat** data structure we loose
the coordinate reference system we started with. It can be set
back to `sf` or `stars` objects by using `st_set_crs`.

## Marked point patterns, points on linear networks

\index{point pattern!marked}

A few more data types can be converted to and from **spatstat**.
Marked point patterns are point patterns that have a "mark", which
is either a categorical label or a numeric label for each point.
A dataset available in **spatstat** with marks is the `longleaf` pines
dataset, containing diameter at breast height as a numeric mark:
```{r}
longleaf
ll <- st_as_sf(longleaf)
print(ll, n = 3)
```
Values can be converted back to `ppp` with
```{r}
as.ppp(ll)
```

\index{point pattern!line segments}
Line segments, in **spatstat** objects of class `psp` can be converted
back and forth to simple feature with `LINESTRING` geometries
following a `POLYGON` feature with the observation window, as in
```{r}
print(st_as_sf(copper$SouthLines), n = 5)
```

Finally, point patterns on linear networks, in **spatstat** 
represented by `lpp` objects, can be converted to `sf` by
```{r}
print(st_as_sf(chicago), n = 5)
```
where we only see the first five features; the points are also
in this object, as variable `label` indicates
```{r}
table(st_as_sf(chicago)$label)
```
Potential information about network _structure_, how
`LINESTRING` geometries are connected, is not present in the `sf`
object. Package **sfnetworks** [@R-sfnetworks] would be a candidate
package to hold such information in R or to pass on network data
imported from OpenStreetMaps to **spatstat**.

## Spatial sampling and simulating a point process 

\index{point pattern!simulation}
Package **sf** contains an `st_sample` method that samples points from
`MULTIPOINT`, linear or polygonal geometries, using different spatial
sampling strategies. It natively supports strategies "random",
"hexagonal", "Fibonacci" (@sec-simsphere), and "regular", where "regular" refers to
sampling on a square regular grid and "hexagonal" essentially gives
a triangular grid. For type "random", it can return exactly the
number of requested points, for other types this is approximate.

`st_sample` also interfaces point process simulation functions of
**spatstat**, when other values for sampling type are chosen. For instance
the **spatstat** function `rThomas` is invoked when setting `type = Thomas`
(@fig-rThomas):
```{r}
kappa <- 30 / st_area(w2) # intensity
th <- st_sample(w2, kappa = kappa, mu = 3, scale = 0.05, 
    type = "Thomas")
nrow(th)
```
```{r fig-rThomas, echo=!knitr::is_latex_output()}
#| fig.height: 4
#| out.width: 60%
#| fig.cap: "Thomas process with mu = 3 and scale = 0.05"
#| code-fold: true
par(mar = rep(0, 4))
plot(w2)
plot(th, add = TRUE)
```
The help function obtained by `?rThomas` details the meaning of the
parameters `kappa`, `mu`, and `scale`. Simulating point processes
means that the intensity is given, not the sample size. The sample
size within the observation window obtained this way is a random
variable.

## Simulating points on the sphere {#sec-simsphere}

\index{point pattern!simulation on the sphere}
Another spatial random sampling type supported by `sf` natively
(in `st_sample`) is simulation of random points on the sphere. An
example of this is shown in @fig-srsglobe, where points
were constrained to those in oceans. Points approximately regularly
distributed over a sphere are obtained by `st_sample` with
`type = "Fibonacci"` [@fibo].

```{r fig-srsglobe, echo=!knitr::is_latex_output(), messages = FALSE}
#| fig.cap: "Points sampled on the globe over the oceans: randomly (left) and approximately regular (Fibonacci; right), shown in an orthographic projection"
#| out.width: 100%
#| code-fold: true
par(mar = rep(0, 4), mfrow = c(1, 2))
library(s2)
g <- as_s2_geography(TRUE) # Earth
co <- s2_data_countries()
oc <- s2_difference(g, s2_union_agg(co)) # oceans
b <- s2_buffer_cells(as_s2_geography("POINT(-30 -10)"), 9800000) # visible half
i <- s2_intersection(b, oc) # visible ocean
co <- s2_intersection(b, co)
ortho = st_crs("+proj=ortho +lat_0=-10 +lon_0=-30")
# background:
st_transform(st_as_sfc(i), ortho) |> plot(col = 'lightblue')
st_transform(st_as_sfc(co), ortho) |> plot(col = NA, add = TRUE, border = 'grey')
# sampling randomly from globe:
sf_use_s2(FALSE) |> suppressMessages()
st_as_stars() |> st_bbox() |> st_as_sfc() |> st_sample(1000, exact = FALSE) |>
  suppressMessages() -> pts
sf_use_s2(TRUE) |> suppressMessages()
pts |> s2_intersection(i) |> st_as_sfc() -> pts
# add:
st_transform(pts, ortho) |> plot(add = TRUE, pch = 3, cex = .5)
# right: background:
st_transform(st_as_sfc(i), ortho) |> plot(col = 'lightblue')
st_transform(st_as_sfc(co), ortho) |> plot(col = NA, add = TRUE, border = 'grey')
# Fibonacci:
sf_use_s2(FALSE) |> suppressMessages()
st_as_stars() |> st_bbox() |> st_as_sfc() |> 
  st_sample(1000, type = "Fibonacci", exact = FALSE) |> suppressMessages() -> pts
sf_use_s2(TRUE) |> suppressMessages()
pts |> s2_intersection(i) |> st_as_sfc() -> pts
st_transform(pts, ortho) |> plot(add = TRUE, pch = 3, cex = .5)
```

## Exercises

1. After loading **spatstat**, recreate the plot obtained by `plot(longleaf)` 
by using **ggplot2** and `geom_sf()`, and by `sf::plot()`.
2. Convert the sample locations of the NO$_2$ data used in @sec-interpolation
   to a `ppp` object, with a proper window.
3. Compute and plot the density of the NO$_2$ dataset, import the density as a `stars`
   object, and compute the volume under the surface. 

```{r echo=FALSE}
# we need to detach, to avoid name clashes of using idw() and diagnose() later on
library(spatstat)
w <- function(x) which(x == search())
detach(pos = w("package:spatstat"))
detach(pos = w("package:spatstat.linnet"))
pc <- w("package:spatstat.core")
if (length(pc)) {
	detach(pos = w("package:spatstat.core"))
} else { # > 3.0.0
	detach(pos = w("package:spatstat.model"))
	detach(pos = w("package:spatstat.explore"))
}
detach(pos = w("package:spatstat.random"))
detach(pos = w("package:spatstat.geom"))
detach(pos = w("package:spatstat.data"))
```

{{< include ga.qmd >}}