Changed file: 16-dimensionality-reduction.Rmd (6 additions, 5 deletions)
@@ -111,7 +111,7 @@ bean_val$splits[[1]]

To visually assess how well different methods perform, we can estimate the methods on the training set (n = `r analysis(bean_val$splits[[1]]) %>% nrow()` beans) and display the results using the validation set (n = `r assessment(bean_val$splits[[1]]) %>% nrow()`).

-Before beginning, we can spend some time investigating our data. Since we know that many of these shape features are probably measuring similar concepts, let's take a look at the correlation structure of the data in Figure \@ref(fig:corr-plot) using this code.
+Before beginning, we can spend some time investigating our data. Since we know that many of these shape features are probably measuring similar concepts, let's take a look at the correlation structure of the data in Figure \@ref(fig:beans-corr-plot) using this code.
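For reference, the training/validation counts quoted in that paragraph come from a single rsample validation split; a minimal sketch of the pattern, assuming a `bean_train` data frame with a `class` column (the seed and proportion here are illustrative, not taken from the diff):

```r
library(tidymodels)

# One validation split: an analysis (training) set and an assessment
# (validation) set, stratified by the bean class.
set.seed(1601)  # arbitrary seed, for illustration only
bean_val <- validation_split(bean_train, strata = class, prop = 4/5)

# The counts referenced in the prose come from these two accessors:
analysis(bean_val$splits[[1]]) %>% nrow()    # training-set size
assessment(bean_val$splits[[1]]) %>% nrow()  # validation-set size
```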
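The correlation plot referred to as "this code" could be produced with something like the following sketch using the corrr package; the package choice and column handling are assumptions, and the book's actual plotting code may differ:

```r
library(corrr)
library(dplyr)

# Pairwise correlations among the numeric shape predictors, reordered so
# that highly correlated features sit next to each other in the plot.
bean_train %>%
  select(-class) %>%
  correlate() %>%
  rearrange() %>%
  rplot(print_cor = TRUE)
```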
@@ -372,7 +372,7 @@ Solidity (i.e., the density of the bean) drives the third PLS component, along w

ICA is slightly different than PCA in that it finds components that are as statistically independent from one another as possible (as opposed to being uncorrelated). It can be thought of as maximizing the "non-Gaussianity" of the ICA components. Let's use `step_ica()` to produce Figure \@ref(fig:bean-ica).

It is clear from these results that most models give very good performance; there are few bad choices here. For demonstration, we'll use the RDA model with PLS features as the final model. We will finalize the workflow with the numerically best parameters, fit it to the training set, then evaluate with the test set:
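A minimal `step_ica()` recipe in the spirit of that figure might look like the sketch below; `bean_train` and the number of components are assumptions, and the chapter's exact preprocessing steps may differ:

```r
library(recipes)

# step_ica() extracts components that are as statistically independent
# (maximally non-Gaussian) as possible; it relies on extra packages
# such as fastICA being installed.
bean_rec_ica <-
  recipe(class ~ ., data = bean_train) %>%
  step_zv(all_numeric_predictors()) %>%
  step_normalize(all_numeric_predictors()) %>%
  step_ica(all_numeric_predictors(), num_comp = 2)

# prep() and bake() on this recipe would yield the two ICA scores to plot.
```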
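The "finalize, fit on training, evaluate on test" step described there typically follows this tidymodels pattern; `rda_pls_wflow`, `rda_pls_res`, and `bean_split` are placeholder names for illustration, not objects from the diff:

```r
library(tidymodels)

# Plug the numerically best tuning parameters into the workflow, refit on
# the full training set, and score once on the held-out test set.
best_params <- select_best(rda_pls_res, metric = "roc_auc")

final_fit <-
  rda_pls_wflow %>%
  finalize_workflow(best_params) %>%
  last_fit(split = bean_split)

collect_metrics(final_fit)
```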
Changed file: 18-explaining-models-and-predictions.Rmd (2 additions, 2 deletions)
@@ -49,6 +49,7 @@ In Chapters \@ref(resampling) and \@ref(compare), we trained and compared severa

```{r explain-obs-pred}
#| echo = FALSE,
+#| fig.height = 4,
#| fig.cap = "Comparing predicted prices for a linear model with interactions and a random forest model.",
#| fig.alt = "Comparing predicted prices for a linear model with interactions and a random forest model. The random forest results in more accurate predictions."
bind_rows(
@@ -59,7 +60,7 @@ bind_rows(
geom_abline(col = "gray50", lty = 2) +
geom_point(alpha = 0.3, show.legend = FALSE) +
facet_wrap(vars(model)) +
-scale_color_viridis_d(end = 0.7) +
+scale_color_brewer(palette = "Paired") +
labs(x = "true price", y = "predicted price")
```
@@ -427,7 +428,6 @@ Using our previously defined importance plotting function, `ggplot_imp(vip_beans

```{r bean-explainer}
#| echo = FALSE,
#| fig.width = 8,
-#| fig.height = 4.5,
#| fig.cap = "Global explainer for the regularized discriminant analysis model on the beans data.",
#| fig.alt = "Global explainer for the regularized discriminant analysis model on the beans data. Almost all predictors have a significant contribution with shape factors one and four contributing the most."
Changed file: 19-when-should-you-trust-predictions.Rmd (4 additions, 4 deletions)
@@ -300,7 +300,7 @@ res_test %>%
scale_color_brewer(palette = "Set2") +
scale_shape_manual(values = 15:22) +
scale_x_date(labels = date_format("%B %d, %Y")) +
-labs(x = NULL, y = "Daily Ridership (x1000)", color = NULL)
+labs(x = NULL, y = "Daily Ridership (x1000)", color = NULL, pch = NULL)
```

Given the scale of the ridership numbers, these results look particularly good for such a simple model. If this model were deployed, how well would it have done a few years later in June of 2020? The model successfully makes a prediction, as a predictive model will when given input data:
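That "the model will predict regardless" behavior is just an ordinary `predict()` call on new data; a minimal sketch, where `lm_fit` (a fitted workflow) and `Chicago_2020` (a data frame of June 2020 predictors) are illustrative names rather than objects from the diff:

```r
library(tidymodels)

# A fitted model returns predictions for any syntactically valid new data,
# even data far outside the range it was trained on.
res_2020 <-
  predict(lm_fit, Chicago_2020) %>%
  bind_cols(
    predict(lm_fit, Chicago_2020, type = "pred_int"),  # 95% prediction intervals
    Chicago_2020
  )
res_2020
```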
@@ -342,7 +342,7 @@ res_2020 %>%
scale_shape_manual(values = 15:22) +
scale_color_brewer(palette = "Set2") +
scale_x_date(labels = date_format("%B %d, %Y")) +
-labs(x = NULL, y = "Daily Ridership (x1000)", color = NULL)
+labs(x = NULL, y = "Daily Ridership (x1000)", color = NULL, pch = NULL)
```

Confidence and prediction intervals for linear regression expand as the data become more and more removed from the center of the training set. However, that effect is not dramatic enough to flag these predictions as being poor.
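The widening follows directly from the standard linear-regression interval formula: for a new predictor vector $x_0$, the prediction-interval half-width grows with the leverage term $x_0^\top (X^\top X)^{-1} x_0$, which increases as $x_0$ moves away from the centroid of the training predictors:

$$
\hat{y}_0 \;\pm\; t_{n-p,\,1-\alpha/2}\,\hat{\sigma}\sqrt{1 + x_0^\top (X^\top X)^{-1} x_0}
$$

(The confidence interval for the mean drops the leading 1 under the square root, so both intervals widen with distance from the training data, but only gradually.)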