applying proofreading changes to c12 #1136

Merged · 22 commits · Oct 2, 2024
18 changes: 9 additions & 9 deletions 12-spatial-cv.Rmd
@@ -27,7 +27,7 @@ library(mlr3) # unified interface to machine learning algorithms
library(mlr3learners) # most important machine learning algorithms
library(mlr3extralearners) # access to even more learning algorithms
library(mlr3proba) # make probabilistic predictions, here only needed for mlr3extralearners::list_learners()
-library(mlr3spatiotempcv) # spatio-temporal resampling strategies
+library(mlr3spatiotempcv) # spatiotemporal resampling strategies
library(mlr3tuning) # hyperparameter tuning
library(mlr3viz) # plotting functions for mlr3 objects
library(progressr) # report progress updates
@@ -274,7 +274,7 @@ It acts as a 'meta-package', providing a unified interface to popular supervised
The standardized **mlr3** interface is based on eight 'building blocks'.
As illustrated in Figure \@ref(fig:building-blocks), these have a clear order.
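
A minimal, non-spatial sketch of how these blocks chain together may help here; the toy `sonar` task and the logistic-regression learner below are illustrative stand-ins, not objects used in this chapter:

```r
# Minimal sketch of the mlr3 building blocks on a built-in toy task;
# the sonar task and logistic regression learner are illustrative only
library(mlr3)
library(mlr3learners)
task = tsk("sonar")                        # task: data plus target variable
learner = lrn("classif.log_reg",
              predict_type = "prob")       # learner: the algorithm to fit
resampling = rsmp("cv", folds = 5)         # resampling: 5-fold CV
rr = resample(task, learner, resampling)   # run the resampling
rr$aggregate(msr("classif.auc"))           # measure: mean AUROC over folds
```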

-(ref:building-blocks) Basic building blocks of the mlr3 package. Source: @bischl_applied_2024. (Permission to reuse this figure was kindly granted.)
+(ref:building-blocks) Basic building blocks of the mlr3 package. @bischl_applied_2024. Permission to reuse this figure was kindly granted.

```{r building-blocks, echo=FALSE, fig.height=4, fig.width=4, fig.cap="(ref:building-blocks)", fig.scap="Basic building blocks of the mlr3 package."}
knitr::include_graphics("images/12_ml_abstraction_crop.png")
@@ -399,7 +399,7 @@ We will use a 100-repeated 5-fold spatial CV\index{cross-validation!spatial CV}:
[^13]:

Note that package **sperrorest** initially implemented spatial cross-validation in R [@brenning_spatial_2012].
-In the meantime, its functionality was integrated into the **mlr3** ecosystem which is the reason why we are using **mlr3** [@schratz_hyperparameter_2019]. The **tidymodels** framework is another umbrella-package for streamlined modeling in R; however, it only recently integrated support for spatial cross validation via **spatialsample** which so far only supports one spatial resampling method.
+In the meantime, its functionality was integrated into the **mlr3** ecosystem which is the reason why we are using **mlr3** [@schratz_hyperparameter_2019]. The **tidymodels** framework is another umbrella package for streamlined modeling in R; however, it only recently integrated support for spatial cross-validation via **spatialsample**, which so far only supports one spatial resampling method.


```{r 12-spatial-cv-18, eval=TRUE}
@@ -409,7 +409,7 @@ resampling = mlr3::rsmp("repeated_spcv_coords", folds = 5, repeats = 100)

To execute the spatial resampling, we run `resample()` using the previously specified task, learner, and resampling strategy.
This takes some time (around 15 seconds on a modern laptop) because it computes 500 resampling partitions and 500 models.
-As performance measure, we again choose the AUROC.
+Again, we choose the AUROC as performance measure.
To retrieve it, we use the `score()` method of the resampling result output object (`score_spcv_glm`).
This returns a `data.table` object with 500 rows -- one for each model.
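
Put together, the call sequence might look like the following sketch; `task` and `learner` stand for the objects defined earlier in the chapter, and the result variable names are illustrative:

```r
# Sketch of the resampling run described above; `task` and `learner`
# are assumed to be the objects created earlier in the chapter
library(mlr3)
library(mlr3spatiotempcv)
resampling = rsmp("repeated_spcv_coords", folds = 5, repeats = 100)
rr_spcv_glm = resample(task = task,
                       learner = learner,
                       resampling = resampling)
# one row per model: a data.table with 500 AUROC values
score_spcv_glm = rr_spcv_glm$score(measures = msr("classif.auc"))
```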

@@ -471,22 +471,22 @@ To recap, we adhere to the following definition of machine learning by [Jason Br
In applied machine learning we will borrow, reuse and steal algorithms from many different fields, including statistics and use them towards these ends.

In Section \@ref(glm) a GLM was used to predict landslide susceptibility.
-This section introduces support vector machines (SVM)\index{SVM} for the same purpose.
+This section introduces support vector machines (SVMs)\index{SVM} for the same purpose.
Random forest\index{random forest} models might be more popular than SVMs; however, the positive effect of tuning hyperparameters\index{hyperparameter} on model performance is much more pronounced in the case of SVMs [@probst_hyperparameters_2018].
Since (spatial) hyperparameter tuning is the major aim of this section, we will use an SVM.
For those wishing to apply a random forest model, we recommend reading this chapter and then proceeding to Chapter \@ref(eco) in which we will apply the currently covered concepts and techniques to make spatial distribution maps based on a random forest model.

SVMs\index{SVM} search for the best possible 'hyperplanes' to separate classes (in a classification\index{classification} case) and estimate 'kernels' with specific hyperparameters\index{hyperparameter} to create non-linear boundaries between classes [@james_introduction_2013].
Machine learning algorithms often feature hyperparameters\index{hyperparameter} and parameters.
-Parameters can be estimated from the data while hyperparameters\index{hyperparameter} are set before the learning begins (see also the [machine mastery blog](https://machinelearningmastery.com/difference-between-a-parameter-and-a-hyperparameter/) and the [hyperparameter optimization chapter](https://mlr3book.mlr-org.com/chapters/chapter4/hyperparameter_optimization.html) of the mlr3 book).
+Parameters can be estimated from the data, while hyperparameters\index{hyperparameter} are set before the learning begins (see also the [machine mastery blog](https://machinelearningmastery.com/difference-between-a-parameter-and-a-hyperparameter/) and the [hyperparameter optimization chapter](https://mlr3book.mlr-org.com/chapters/chapter4/hyperparameter_optimization.html) of the mlr3 book).
The optimal hyperparameter\index{hyperparameter} configuration is usually found within a specific search space and determined with the help of cross-validation methods.
This is called hyperparameter\index{hyperparameter} tuning and is the main topic of this section.
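
To make the search-space idea concrete, a tuning space for the two hyperparameters of a radial-basis SVM (the cost `C` and the kernel width `sigma`) could be declared with **paradox** roughly as follows; the bounds shown are example values, not a recommendation:

```r
# Illustrative search space for an RBF SVM; the search runs on a log2
# scale, so the trafo maps searched values back with 2^x.
# The bounds are example values only.
library(paradox)
search_space = ps(
  C     = p_dbl(lower = -12, upper = 15, trafo = function(x) 2^x),
  sigma = p_dbl(lower = -15, upper = 6,  trafo = function(x) 2^x)
)
```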

Some SVM implementations such as that provided by **kernlab** allow hyperparameters to be tuned automatically, usually based on random sampling (see upper row of Figure \@ref(fig:partitioning)).
This works for non-spatial data but is of less use for spatial data where 'spatial tuning' should be undertaken.

Before defining spatial tuning, we will set up the **mlr3**\index{mlr3 (package)} building blocks, introduced in Section \@ref(glm), for the SVM.
-The classification\index{classification} task remains the same, hence we can simply reuse the `task` object created in Section \@ref(glm).
+The classification\index{classification} task remains the same, hence, we can simply reuse the `task` object created in Section \@ref(glm).
Learners implementing SVM can be found using the `list_mlr3learners()` command of the **mlr3extralearners** package.

```{r 12-spatial-cv-23, eval=TRUE, echo=TRUE}
@@ -568,7 +568,7 @@ To make the performance estimation processing chain even clearer, let us write d
1. Performance level (upper left part of Figure \@ref(fig:inner-outer)) - split the dataset into five spatially disjoint (outer) subfolds
1. Tuning level (lower left part of Figure \@ref(fig:inner-outer)) - use the first fold of the performance level and split it again spatially into five (inner) subfolds for the hyperparameter tuning.
Use the 50 randomly selected hyperparameters\index{hyperparameter} in each of these inner subfolds, i.e., fit 250 models
-1. Performance estimation - Use the best hyperparameter combination from the previous step (tuning level) and apply it to the first outer fold in the performance level to estimate the performance (AUROC\index{AUROC})
+1. Performance estimation - use the best hyperparameter combination from the previous step (tuning level) and apply it to the first outer fold in the performance level to estimate the performance (AUROC\index{AUROC})
1. Repeat steps 2 and 3 for the remaining four outer folds
1. Repeat steps 2 to 4, 100 times
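
Steps 2 and 3 above are what **mlr3tuning** encapsulates in an `AutoTuner`. A hedged sketch, assuming the illustrative `search_space` from earlier and the **kernlab**-based SVM learner, might look like this:

```r
# Sketch of the tuning level (steps 2 and 3) as an AutoTuner; assumes
# the `search_space` object sketched earlier in this section
library(mlr3)
library(mlr3extralearners)   # provides the kernlab-based classif.ksvm
library(mlr3spatiotempcv)
library(mlr3tuning)
at_ksvm = auto_tuner(
  tuner = tnr("random_search"),
  learner = lrn("classif.ksvm", predict_type = "prob",
                kernel = "rbfdot", type = "C-svc"),
  resampling = rsmp("spcv_coords", folds = 5),  # inner (tuning) level
  measure = msr("classif.auc"),
  search_space = search_space,
  terminator = trm("evals", n_evals = 50)       # 50 random draws per fold
)
# passing at_ksvm to resample() with an outer spatial CV then yields the
# performance level (steps 1 and 4)
```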

@@ -658,7 +658,7 @@ Machine learning algorithms often require hyperparameter\index{hyperparameter} i
Machine learning overall, and its use to understand spatial data, is a large field and this chapter has provided the basics, but there is more to learn.
We recommend the following resources in this direction:

-- The **mlr3 book** (@bischl_applied_2024; https://mlr3book.mlr-org.com/) and especially the [chapter on the handling of spatiotemporal data](https://mlr3book.mlr-org.com/chapters/chapter13/beyond_regression_and_classification.html#sec-spatiotemporal)
+- The **mlr3 book** (@bischl_applied_2024; https://mlr3book.mlr-org.com/) and especially the [chapter on the handling of spatiotemporal data](https://mlr3book.mlr-org.com/chapters/chapter13/beyond_regression_and_classification.html#spatiotemp-cv)
- An academic paper on hyperparameter\index{hyperparameter} tuning [@schratz_hyperparameter_2019]
- An academic paper on how to use **mlr3spatiotempcv** [@schratz_mlr3spatiotempcv_2021]
- In case of spatiotemporal data, one should account for spatial\index{autocorrelation!spatial} and temporal\index{autocorrelation!temporal} autocorrelation when doing CV\index{cross-validation} [@meyer_improving_2018]
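
For the last point, **mlr3spatiotempcv** ships a resampling in this spirit; a minimal sketch, assuming a task whose space and time columns have been declared via column roles:

```r
# Sketch only: "sptcv_cstf" wraps CAST::CreateSpacetimeFolds() and
# assumes the task declares its space/time columns via column roles
library(mlr3spatiotempcv)
resampling_sptcv = rsmp("sptcv_cstf", folds = 5)
```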