Commit f681d9f

Merge pull request #232 from tidymodels/more-fig-updates
More figure updates
2 parents 14ec16d + 08afdbc commit f681d9f

15 files changed, 71 insertions(+), 66 deletions(-)

01-software-modeling.Rmd

Lines changed: 2 additions & 2 deletions
@@ -89,7 +89,7 @@ plm_plot <-
 ames_plot <-
   ggplot(ames, aes(x = Latitude, y = Sale_Price)) +
   geom_point(alpha = .2) +
-  geom_smooth(se = FALSE, method = stats::loess, method.args = list(span = .3), col = "red") +
+  geom_smooth(se = FALSE, method = stats::loess, method.args = list(span = .3), color = "red") +
   scale_y_log10() +
   ylab("House Sale Price ($US)") +
   ggtitle("(b) Using a model-based smoother to discover trends.")
@@ -200,7 +200,7 @@ This iterative process is especially true for modeling. Figure \@ref(fig:softwar
 * **Feature engineering:** The understanding gained from EDA results in the creation of specific model terms that make it easier to accurately model the observed data. This can include complex methodologies (e.g., PCA) or simpler features (using the ratio of two predictors). Chapter \@ref(recipes) focuses entirely on this important step.

-* **Model tuning and selection (circles with blue and yellow segments):** A variety of models are generated and their performance is compared. Some models require _parameter tuning_ where some structural parameters are required to be specified or optimized. The colored segments within the circles signify the repeated data splitting used during resampling (see Chapter \@ref(resampling)).
+* **Model tuning and selection (circles with alternating segments):** A variety of models are generated and their performance is compared. Some models require _parameter tuning_ where some structural parameters are required to be specified or optimized. The colored segments within the circles signify the repeated data splitting used during resampling (see Chapter \@ref(resampling)).

 * **Model evaluation:** During this phase of model development, we assess the model's performance metrics, examine residual plots, and conduct other EDA-like analyses to understand how well the models work. In some cases, formal between-model comparisons (Chapter \@ref(compare)) help you to understand whether any differences in models are within the experimental noise.

03-base-r.Rmd

Lines changed: 5 additions & 3 deletions
@@ -23,11 +23,13 @@ names(crickets)
 # Plot the temperature on the x-axis, the chirp rate on the y-axis. The plot
 # elements will be colored differently for each species:
-ggplot(crickets, aes(x = temp, y = rate, col = species)) +
+ggplot(crickets,
+       aes(x = temp, y = rate, color = species, pch = species, lty = species)) +
   # Plot points for each data point and color by species
-  geom_point() +
+  geom_point(size = 2) +
   # Show a simple linear model fit created separately for each species:
-  geom_smooth(method = lm, se = FALSE) +
+  geom_smooth(method = lm, se = FALSE, alpha = 0.5) +
+  scale_color_brewer(palette = "Paired") +
   labs(x = "Temperature (C)", y = "Chirp Rate (per minute)")
 ```
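
Review note on the hunk above: mapping `species` to color, point shape (`pch`), and line type (`lty`) at the same time encodes the grouping redundantly, so the groups remain distinguishable without relying on color alone. The following is a minimal sketch of that pattern, assuming ggplot2 is installed. The `toy` data frame and its values are hypothetical stand-ins, not the `crickets` data.

```r
library(ggplot2)

# Hypothetical stand-in for the crickets data; the values are made up.
toy <- data.frame(
  temp    = c(20, 22, 24, 26, 21, 23, 25, 27),
  rate    = c(60, 66, 73, 79, 50, 55, 61, 66),
  species = rep(c("A", "B"), each = 4)
)

# One grouping variable mapped to three aesthetics at once.
mapping <- aes(x = temp, y = rate, color = species, pch = species, lty = species)

p <-
  ggplot(toy, mapping) +
  geom_point(size = 2) +
  geom_smooth(method = lm, formula = y ~ x, se = FALSE) +
  scale_color_brewer(palette = "Paired")

# aes() stores base-R style names (pch, lty) under ggplot2's own names.
mapped_names <- names(mapping)
```

Inspecting `mapped_names` shows that `pch` and `lty` end up as `shape` and `linetype`, which is why the base-R abbreviations work inside `aes()`.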

08-feature-engineering.Rmd

Lines changed: 2 additions & 2 deletions
@@ -255,7 +255,7 @@ After exploring the Ames training set, we might find that the regression slopes
 ggplot(ames_train, aes(x = Gr_Liv_Area, y = 10^Sale_Price)) +
   geom_point(alpha = .2) +
   facet_wrap(~ Bldg_Type) +
-  geom_smooth(method = lm, formula = y ~ x, se = FALSE, col = "red") +
+  geom_smooth(method = lm, formula = y ~ x, se = FALSE, color = "lightblue") +
   scale_x_log10() +
   scale_y_log10() +
   labs(x = "Gross Living Area", y = "Sale Price (USD)")
@@ -330,7 +330,7 @@ plot_smoother <- function(deg_free) {
     geom_smooth(
       method = lm,
       formula = y ~ ns(x, df = deg_free),
-      col = "red",
+      color = "lightblue",
       se = FALSE
     ) +
     labs(title = paste(deg_free, "Spline Terms"),

10-resampling.Rmd

Lines changed: 1 addition & 1 deletion
@@ -468,7 +468,7 @@ Since this analysis used 10-fold cross-validation, there is one unique predictio
 assess_res %>%
   ggplot(aes(x = Sale_Price, y = .pred)) +
   geom_point(alpha = .15) +
-  geom_abline(col = "red") +
+  geom_abline(color = "red") +
   coord_obs_pred() +
   ylab("Predicted")
 ```

11-comparing-models.Rmd

Lines changed: 7 additions & 7 deletions
@@ -136,7 +136,7 @@ These high correlations indicate that, across models, there are large within-res
 ```{r compare-rsq-plot, eval=FALSE}
 rsq_indiv_estimates %>%
   mutate(wflow_id = reorder(wflow_id, .estimate)) %>%
-  ggplot(aes(x = wflow_id, y = .estimate, group = id, col = id)) +
+  ggplot(aes(x = wflow_id, y = .estimate, group = id, color = id)) +
   geom_line(alpha = .5, lwd = 1.25) +
   theme(legend.position = "none")
 ```
@@ -151,7 +151,7 @@ y_lab <- expression(R^2 ~ statistics)
 rsq_indiv_estimates %>%
   mutate(wflow_id = reorder(wflow_id, .estimate)) %>%
-  ggplot(aes(x = wflow_id, y = .estimate, group = id, col = id)) +
+  ggplot(aes(x = wflow_id, y = .estimate, group = id, color = id)) +
   geom_line(alpha = .5, lwd = 1.25) +
   theme(legend.position = "none") +
   labs(x = NULL, y = y_lab)
@@ -350,7 +350,7 @@ The four posterior distributions are visualized in Figure \@ref(fig:four-posteri
 model_post %>%
   mutate(model = forcats::fct_inorder(model)) %>%
   ggplot(aes(x = posterior)) +
-  geom_histogram(bins = 50, col = "white", fill = "blue", alpha = 0.4) +
+  geom_histogram(bins = 50, color = "white", fill = "blue", alpha = 0.4) +
   facet_wrap(~ model, ncol = 1)
 ```

@@ -364,8 +364,8 @@ x_lab <- expression(Posterior ~ "for" ~ mean ~ R^2)
 model_post %>%
   mutate(model = forcats::fct_inorder(model)) %>%
   ggplot(aes(x = posterior)) +
-  geom_histogram(bins = 50, col = "white", fill = "blue", alpha = 0.4) +
+  geom_histogram(bins = 50, color = "white", fill = "blue", alpha = 0.4) +
   facet_wrap(~ model, ncol = 1) +
   labs(x = x_lab)
 ```

@@ -398,7 +398,7 @@ rqs_diff %>%
   as_tibble() %>%
   ggplot(aes(x = difference)) +
   geom_vline(xintercept = 0, lty = 2) +
-  geom_histogram(bins = 50, col = "white", fill = "red", alpha = 0.4)
+  geom_histogram(bins = 50, color = "white", fill = "red", alpha = 0.4)
 ```

 ```{r posterior-difference}
@@ -419,7 +419,7 @@ rqs_diff %>%
   as_tibble() %>%
   ggplot(aes(x = difference)) +
   geom_vline(xintercept = 0, lty = 2) +
-  geom_histogram(bins = 50, col = "white", fill = "red", alpha = 0.4) +
+  geom_histogram(bins = 50, color = "white", fill = "red", alpha = 0.4) +
   labs(x = x_lab)
 ```

12-tuning-parameters.Rmd

Lines changed: 12 additions & 12 deletions
@@ -112,10 +112,10 @@ To demonstrate, consider the classification data shown in Figure \@ref(fig:two-c
 #| echo = FALSE,
 #| fig.cap = "An example two-class classification data set with two predictors.",
 #| fig.alt = "An example two-class classification data set with two predictors. The two predictors have a moderate correlation and there are some locations of separation between the classes."
-ggplot(training_set, aes(x = A, y = B, col = Class)) +
-  geom_point(alpha = .5) +
+ggplot(training_set, aes(x = A, y = B, color = Class, pch = Class)) +
+  geom_point(alpha = 0.7) +
   coord_equal() +
-  labs(x = "Predictor A", y = "Predictor B", col = NULL) +
+  labs(x = "Predictor A", y = "Predictor B", color = NULL, pch = NULL) +
   scale_color_manual(values = c("#CC6677", "#88CCEE"))
 ```

@@ -266,9 +266,9 @@ link_grids <-
 link_grids %>%
   ggplot(aes(x = A, y = B)) +
-  geom_point(data = testing_set, aes(col = Class, pch = Class),
-             alpha = .5, show.legend = FALSE) +
-  geom_contour(aes( z = .pred_Class1, lty = link), breaks = 0.5, col = "black") +
+  geom_point(data = testing_set, aes(color = Class, pch = Class),
+             alpha = 0.7, show.legend = FALSE) +
+  geom_contour(aes( z = .pred_Class1, lty = link), breaks = 0.5, color = "black") +
   scale_color_manual(values = c("#CC6677", "#88CCEE")) +
   coord_equal() +
   labs(x = "Predictor A", y = "Predictor B")
@@ -352,9 +352,9 @@ te_plot <-
     label = ifelse(label == " 1 units", " 1 unit", label)
   ) %>%
   ggplot(aes(x = A, y = B)) +
-  geom_point(data = testing_set, aes(col = Class, pch = Class),
-             alpha = .5, show.legend = FALSE) +
-  geom_contour(aes( z = .pred_Class1), breaks = 0.5, col = "black") +
+  geom_point(data = testing_set, aes(color = Class, pch = Class),
+             alpha = 0.5, show.legend = FALSE) +
+  geom_contour(aes( z = .pred_Class1), breaks = 0.5, color = "black") +
   scale_color_manual(values = c("#CC6677", "#88CCEE")) +
   facet_wrap(~ label, nrow = 1) +
   coord_equal() +
@@ -374,9 +374,9 @@ tr_plot <-
     label = ifelse(label == " 1 units", " 1 unit", label)
   ) %>%
   ggplot(aes(x = A, y = B)) +
-  geom_point(data = training_set, aes(col = Class, pch = Class),
-             alpha = .5, show.legend = FALSE) +
-  geom_contour(aes( z = .pred_Class1), breaks = 0.5, col = "black") +
+  geom_point(data = training_set, aes(color = Class, pch = Class),
+             alpha = 0.5, show.legend = FALSE) +
+  geom_contour(aes( z = .pred_Class1), breaks = 0.5, color = "black") +
   scale_color_manual(values = c("#CC6677", "#88CCEE")) +
   facet_wrap(~ label, nrow = 1) +
   coord_equal() +

13-grid-search.Rmd

Lines changed: 11 additions & 11 deletions
@@ -501,7 +501,7 @@ load("extras/parallel_times/resamples_times.RData")
 resamples_times %>%
   dplyr::rename(operation = label) %>%
   ggplot(aes(y = id_alt, x = duration, fill = operation)) +
-  geom_bar(stat = "identity", col = "black") +
+  geom_bar(stat = "identity", color = "black") +
   labs(y = NULL, x = "Elapsed Time") +
   scale_fill_brewer(palette = "Paired") +
   theme(legend.position = "top")
@@ -615,7 +615,7 @@ start_stop_dat %>%
       ymax = id_stop,
       fill = operation
     ),
-    col = "black"
+    color = "black"
   ) +
   facet_wrap(~ pid, nrow = 2) +
   labs(y = NULL, x = "Elapsed Time") +
@@ -653,8 +653,8 @@ First, let's consider the raw execution times in Figure \@ref(fig:parallel-times
 #| fig.alt = "Execution times for model tuning versus the number of workers using different delegation schemes. The diagonal black line indicates a linear speedup where the addition of a new worker process has maximal effect. The 'everything' scheme shows that the benefits decrease after three or four workers, especially when there is expensive preprocessing. The 'resamples' scheme has almost linear speedups across all tasks."

 load("extras/parallel_times/xgb_times.RData")
-ggplot(times, aes(x = num_cores, y = elapsed, col = parallel_over, shape = parallel_over)) +
-  geom_point() +
+ggplot(times, aes(x = num_cores, y = elapsed, color = parallel_over, shape = parallel_over)) +
+  geom_point(size = 2) +
   geom_line() +
   facet_wrap(~ preprocessing) +
   labs(x = "Number of Workers", y = "Execution Time (s)") +
@@ -684,9 +684,9 @@ We can also view these data in terms of speed-ups in Figure \@ref(fig:parallel-s
 #| fig.cap = "Speed-ups for model tuning versus the number of workers using different delegation schemes.",
 #| fig.alt = "Speed-ups for model tuning versus the number of workers using different delegation schemes."

-ggplot(times, aes(x = num_cores, y = speed_up, col = parallel_over, shape = parallel_over)) +
+ggplot(times, aes(x = num_cores, y = speed_up, color = parallel_over, shape = parallel_over)) +
   geom_abline(lty = 1) +
-  geom_point() +
+  geom_point(size = 2) +
   geom_line() +
   facet_wrap(~ preprocessing) +
   coord_obs_pred() +
@@ -778,9 +778,9 @@ iter_three <- race_details %>% dplyr::filter(iter == 3)
 iter_three %>%
   ggplot(aes(x = -estimate, y = .config)) +
-  geom_vline(xintercept = 0, lty = 2, col = "green") +
-  geom_point(size = 2, aes(col = decision)) +
-  geom_errorbarh(aes(xmin = -estimate, xmax = -upper, col = decision), height = .3, show.legend = FALSE) +
+  geom_vline(xintercept = 0, lty = 2, color = "green") +
+  geom_point(size = 2, aes(color = decision)) +
+  geom_errorbarh(aes(xmin = -estimate, xmax = -upper, color = decision), height = .3, show.legend = FALSE) +
   labs(x = "Loss of ROC AUC", y = NULL) +
   scale_colour_manual(values = race_cols)
 ```
@@ -801,8 +801,8 @@ race_ci_plots <- function(x, iters = max(x$iter)) {
     p <-
       x %>%
       dplyr::filter(iter == i) %>%
-      ggplot(aes(x = -estimate, y = .config, col = decision)) +
-      geom_vline(xintercept = 0, col = "green", lty = 2) +
+      ggplot(aes(x = -estimate, y = .config, color = decision)) +
+      geom_vline(xintercept = 0, color = "green", lty = 2) +
       geom_point(size = 2) +
       labs(title = ttl, y = "", x = "Loss of ROC AUC") +
       scale_color_manual(values = c(best = "blue", retain = "black", discard = "grey"),

14-iterative-search.Rmd

Lines changed: 3 additions & 3 deletions
@@ -241,7 +241,7 @@ To demonstrate, let's look at a toy example with a single parameter that has val
 y_lab <- expression(Estimated ~ R^2)

 ggplot(grid, aes(x = x, y = y)) +
-  geom_line(col = "red", alpha = .5, lwd = 1.25) +
+  geom_line(color = "red", alpha = .5, lwd = 1.25) +
   labs(y = y_lab, x = "Tuning Parameter") +
   geom_point(data = current_iter)
 ```
@@ -317,7 +317,7 @@ small_pred %>%
   group_by(value) %>%
   do(get_density(.)) %>%
   ungroup() %>%
-  ggplot(aes(x = x, y = density, col = `Parameter Value`, lty = `Parameter Value`)) +
+  ggplot(aes(x = x, y = density, color = `Parameter Value`, lty = `Parameter Value`)) +
   geom_line() +
   geom_vline(xintercept = max(current_iter$y), lty = 3) +
   labs(x = x_lab) +
@@ -736,7 +736,7 @@ The process starts with initial values of `penalty = 0.025` and `mixture = 0.050
 #| fig.alt = "An illustration of how simulated annealing determines what is the local neighborhood for two numeric tuning parameters. The clouds of points show possible next values where one would be selected at random. The candidate points are small circular clouds surrounding the current best point."

 ggplot(neighbors_values, aes(x = penalty, y = mixture)) +
-  geom_point(alpha = .3, size = 3/4, aes(col = factor(Iteration)), show.legend = FALSE) +
+  geom_point(alpha = .3, size = 3/4, aes(color = factor(Iteration)), show.legend = FALSE) +
   scale_x_continuous(trans = "log10", limits = pen_rng) +
   scale_y_continuous(limits = mix_rng) +
   geom_point(data = best_values) +

15-workflow-sets.Rmd

Lines changed: 1 addition & 1 deletion
@@ -483,7 +483,7 @@ collect_metrics(boosting_test_results)
 boosting_test_results %>%
   collect_predictions() %>%
   ggplot(aes(x = compressive_strength, y = .pred)) +
-  geom_abline(col = "green", lty = 2) +
+  geom_abline(color = "gray50", lty = 2) +
   geom_point(alpha = 0.5) +
   coord_obs_pred() +
   labs(x = "observed", y = "predicted")

16-dimensionality-reduction.Rmd

Lines changed: 6 additions & 4 deletions
@@ -231,13 +231,13 @@ library(patchwork)
 p1 <-
   bean_validation %>%
   ggplot(aes(x = area)) +
-  geom_histogram(bins = 30, col = "white", fill = "blue", alpha = 1/3) +
+  geom_histogram(bins = 30, color = "white", fill = "blue", alpha = 1/3) +
   ggtitle("Original validation set data")

 p2 <-
   bean_val_processed %>%
   ggplot(aes(x = area)) +
-  geom_histogram(bins = 30, col = "white", fill = "red", alpha = 1/3) +
+  geom_histogram(bins = 30, color = "white", fill = "red", alpha = 1/3) +
   ggtitle("Processed validation set data")

 p1 + p2
@@ -278,7 +278,7 @@ plot_validation_results <- function(recipe, dat = assessment(bean_val$splits[[1]
     # Process the data (the validation set by default)
     bake(new_data = dat) %>%
     # Create the scatterplot matrix
-    ggplot(aes(x = .panel_x, y = .panel_y, col = class, fill = class)) +
+    ggplot(aes(x = .panel_x, y = .panel_y, color = class, fill = class)) +
     geom_point(alpha = 0.4, size = 0.5) +
     geom_autodensity(alpha = .3) +
     facet_matrix(vars(-class), layer.diag = 2) +
@@ -319,6 +319,7 @@ bean_rec_trained %>%
   step_pca(all_numeric_predictors(), num_comp = 4) %>%
   prep() %>%
   plot_top_loadings(component_number <= 4, n = 5) +
+  scale_fill_brewer(palette = "Paired") +
   ggtitle("Principal Component Analysis")
 ```

@@ -357,6 +358,7 @@ bean_rec_trained %>%
   step_pls(all_numeric_predictors(), outcome = "class", num_comp = 4) %>%
   prep() %>%
   plot_top_loadings(component_number <= 4, n = 5, type = "pls") +
+  scale_fill_brewer(palette = "Paired") +
   ggtitle("Partial Least Squares")
 ```

@@ -530,7 +532,7 @@ Figure \@ref(fig:dimensionality-rankings) illustrates this ranking.
 #| fig.alt = "Area under the ROC curve from the validation set. The three best model configurations use PLS together with regularized discriminant analysis, a multi-layer perceptron, and a naive Bayes model."

 rankings %>%
-  ggplot(aes(x = rank, y = mean, pch = method, col = model)) +
+  ggplot(aes(x = rank, y = mean, pch = method, color = model)) +
   geom_point(cex = 3.5) +
   theme(legend.position = "right") +
   labs(y = "ROC AUC") +
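
Review note on the `col` to `color` renames throughout this commit: as far as ggplot2's aesthetic handling goes, the two spellings are interchangeable, so these edits are stylistic and should not change the figures. A blanket search-and-replace can, however, also touch argument names that merely contain `col`, such as `ncol` in `facet_wrap()`. The sketch below illustrates both points, assuming ggplot2 is installed. The variable `g` is a hypothetical placeholder that is never evaluated.

```r
library(ggplot2)

# `col` and `color` are both stored under ggplot2's canonical name `colour`.
from_col   <- names(aes(col = g))
from_color <- names(aes(color = g))

# facet_wrap() accepts `ncol`; it has no `ncolor` argument, so a name
# produced by an overeager replacement is rejected as an unused argument.
ok  <- facet_wrap(~ g, ncol = 1)
bad <- tryCatch(
  facet_wrap(~ g, ncolor = 1),
  error = function(e) conditionMessage(e)
)
```

Here `from_col` and `from_color` hold the same name, while `bad` captures the error message raised for the unknown argument, which makes a case for reviewing each replacement rather than applying the pattern globally.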
