Commit 87af1ce

Fix last fig refs
1 parent 0a4e98b commit 87af1ce

2 files changed: +4 -4 lines changed

08-feature-engineering.Rmd

Lines changed: 1 addition & 1 deletion
@@ -317,7 +317,7 @@ _Order matters_. The gross living area is log transformed prior to the interact

 When a predictor has a nonlinear relationship with the outcome, some types of predictive models can adaptively approximate this relationship during training. However, simpler is usually better and it is not uncommon to try to use a simple model, such as a linear fit, and add in specific non-linear features for predictors that may need them. One common method for doing this is to use _spline_ functions to represent the data. Splines replace the existing numeric predictor with a set of columns that allow a model to emulate a flexible, non-linear relationship. As more spline terms are added to the data, the capacity to non-linearly represent the relationship increases. Unfortunately, it may also increase the likelihood of picking up on data trends that occur by chance (i.e., over-fitting).

-If you have ever used `geom_smooth()` within a `ggplot`, you have probably used a spline representation of the data. For example, each panel in Figure \@ref(ames-latitude-splines) uses a different number of smooth splines for the latitude predictor:
+If you have ever used `geom_smooth()` within a `ggplot`, you have probably used a spline representation of the data. For example, each panel in Figure \@ref(fig:ames-latitude-splines) uses a different number of smooth splines for the latitude predictor:

 ```{r engineering-ames-splines, eval=FALSE}
 library(patchwork)

12-tuning-parameters.Rmd

Lines changed: 3 additions & 3 deletions
@@ -108,7 +108,7 @@ For cases where the statistical properties of the tuning parameter are tractable

 To demonstrate, consider the classification data shown in Figure \@ref(fig:two-class-dat) with two predictors, two classes, and a training set of `r nrow(training_set)` data points.

-```{r tuning-two-class-dat}
+```{r two-class-dat}
 #| echo = FALSE,
 #| fig.cap = "An example two-class classification data set with two predictors.",
 #| fig.alt = "An example two-class classification data set with two predictors. The two predictors have a moderate correlation and there is some locations of separation between the classes."
@@ -215,7 +215,7 @@ These results show that there is considerable evidence that the choice of the li

 What about a different metric? We also calculated the area under the ROC curve for each resample. These results, which reflect the discriminative ability of the models across numerous probability thresholds, show a lack of difference in Figure \@ref(fig:resampled-roc).

-```{r tuning-resampled-roc}
+```{r resampled-roc}
 #| echo = FALSE,
 #| fig.height = 3,
 #| fig.cap = "Means and approximate 90% confidence intervals for the resampled area under the ROC curve with three different link functions.",
@@ -231,7 +231,7 @@ resampled_res %>%

 Given the overlap of the intervals, as well as the scale of the x-axis, any of these options could be used. We see this again when the class boundaries for the three models are overlaid on the _test set_ of `r nrow(testing_set)` data points in Figure \@ref(fig:three-link-fits).

-```{r tuning-glm-fits}
+```{r three-link-fits}
 #| echo = FALSE,
 #| fig.cap = "The linear class boundary fits for three link functions.",
 #| fig.alt = "The linear class boundary fits for three link functions. The lines have very similar slopes with the complementary log log having a slightly different intercept than the other two links."
