# (PART\*) Visualize {-}
# Layers
**Learning objectives**
- We are going to learn about the layered grammar of graphics, including:
    - using aesthetics and geometries to build plots;
    - using facets for splitting the plot into subsets;
    - using statistics for understanding how geoms are calculated;
    - making position adjustments when geoms might otherwise overlap; and
    - how coordinate systems allow us to fundamentally change what x and y mean.
```{r include=FALSE}
library(tidyverse)
```
## Introduction
![](images/ge_all.png)
## Aesthetic mappings
- We will be working with the `mpg` data frame bundled with the ggplot2 package, which contains 234 observations collected by the US Environmental Protection Agency on 38 car models. Among the variables in `mpg` are:
    - `displ`: A car’s engine size, in liters. A numerical variable.
    - `hwy`: A car’s fuel efficiency on the highway, in miles per gallon (mpg). A car with a low fuel efficiency consumes more fuel than a car with a high fuel efficiency when they travel the same distance. A numerical variable.
    - `class`: Type of car. A categorical variable.
```{r include=FALSE}
mpg
```
## Mapping categorical variables to aesthetics
- Let’s start by visualizing the relationship between `displ` and `hwy` for various `class`es of cars.
- By default, ggplot2 uses only six shapes at a time, so additional groups go unplotted when you use the shape aesthetic. `mpg` has seven classes, so the 62 SUVs in the dataset are not plotted.
```{r, figures-side1, fig.show="hold", out.width="50%"}
par(mar = c(4, 4, .1, .1))
# Left
ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point()
# Right
ggplot(mpg, aes(x = displ, y = hwy, shape = class)) +
  geom_point()
```
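One way to keep every class visible is to supply the shapes by hand. The sketch below assumes `scale_shape_manual()` with the shape codes 0 to 6, an arbitrary choice made for illustration; any seven distinct shapes would do. The object names are also made up here.

```{r}
library(ggplot2)

# mpg has seven car classes, one more than the default shape palette covers
n_classes <- length(unique(mpg$class))
n_suv <- sum(mpg$class == "suv")

# Assumption for illustration: shape codes 0 to 6, one per class
p_all_shapes <- ggplot(mpg, aes(x = displ, y = hwy, shape = class)) +
  geom_point() +
  scale_shape_manual(values = 0:6)
p_all_shapes
```

With one shape per class, no rows are dropped, although seven shapes in one panel can still be hard to tell apart.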
## Mapping categorical variables to aesthetics cont.
- Similarly, we can map `class` to `size` or `alpha` (transparency) aesthetics as well.
- We get warnings because `class` is a non-ordinal discrete variable, while `size` and `alpha` are ordered aesthetics. Mapping one to the other is generally not a good idea, since it implies a ranking that does not in fact exist.
```{r, figures-side, fig.show="hold", out.width="50%"}
par(mar = c(4, 4, .1, .1))
# Left
ggplot(mpg, aes(x = displ, y = hwy, size = class)) +
  geom_point()
# Right
ggplot(mpg, aes(x = displ, y = hwy, alpha = class)) +
  geom_point()
```
##
- Once you map an aesthetic, ggplot2 takes care of the rest by
    - selecting a reasonable scale to use with the aesthetic, and
    - constructing a legend that explains the mapping between levels and values.
- For x and y aesthetics, ggplot2 does not create a legend, but it creates an axis line with tick marks and a label.
- The axis line acts as a legend; it explains the mapping between locations and values.
## Manually setting aesthetic properties
- We can also set the aesthetic properties of a geom manually. For example, we can make all of the points in our plot blue.
- Here the color doesn't convey information about a variable; it only changes the appearance of the plot.
```{r}
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(color = "blue")
```
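The difference between setting and mapping can be checked directly. In this sketch, `layer_data()` is used to inspect the colors ggplot2 actually draws; the object names `p_set` and `p_mapped` are made up for illustration.

```{r}
library(ggplot2)

# Setting: color is given outside aes(), so every point is literally blue
p_set <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(color = "blue")

# Mapping: inside aes(), "blue" is treated as a one-level variable,
# so the color scale picks the color and a legend appears
p_mapped <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = "blue"))

# Colors that end up on the points in each plot
unique(layer_data(p_set)$colour)
unique(layer_data(p_mapped)$colour)
```

If a plot shows an unexpected color together with a one-entry legend, a value placed inside `aes()` by mistake is a likely cause.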
##
- When manually setting aesthetic properties, we need to pick a value that makes sense for that aesthetic:
    - The name of a color as a character string, e.g., `color = "blue"`
    - The size of a point in mm, e.g., `size = 1`
    - The shape of a point as a number, e.g., `shape = 1`, as shown below.
- We can learn more about aesthetic mappings in the [**aesthetic specifications vignette**](https://ggplot2.tidyverse.org/articles/ggplot2-specs.html).
![](images/shape.png)
## 10.2 Exercises
### 10.2.1.1
Create a scatterplot of hwy vs. displ where the points are pink filled in triangles.
```{r}
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(color = "pink", shape = 17)
```
### 10.2.1.4
```{r}
ggplot(mpg, aes(x = displ, y = hwy, color = displ < 5)) +
  geom_point()
```
## Geometric objects
- The plots below contain the same x variable, the same y variable, and both describe the same data, but are not identical because each plot uses a different geometric object, geom, to represent the data.
```{r, figures-side3, fig.show="hold", out.width="50%"}
par(mar = c(4, 4, .1, .1))
# Left
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()
# Right
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_smooth()
```
## Geometric objects cont.
- Not every aesthetic works with every geom.
- For example, you could set the shape of a point, but you couldn’t set the “shape” of a line. If you try, ggplot2 will silently ignore that aesthetic mapping.
- On the other hand, you could set the linetype of a line. Here, `geom_smooth()` draws a different line, with a different linetype, for each unique value of the variable mapped to linetype.
```{r, figures-side4, fig.show="hold", out.width="50%"}
par(mar = c(4, 4, .1, .1))
# Left
ggplot(mpg, aes(x = displ, y = hwy, shape = drv)) +
  geom_smooth()
# Right
ggplot(mpg, aes(x = displ, y = hwy, linetype = drv)) +
  geom_smooth()
```
##
```{r, figures-side5, fig.show="hold", out.width="50%"}
#par(mar = c(4, 4, .1, .1))
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_smooth()
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_smooth(aes(group = drv))
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_smooth(aes(color = drv), show.legend = FALSE) +
  geom_smooth()
```
##
- We can also specify different data for each layer. Here, we use red points as well as open circles to highlight two-seater cars. The local data argument in `geom_point()` overrides the global data argument in `ggplot()` for that layer only.
```{r}
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_point(
    data = mpg |> filter(class == "2seater"),
    color = "red"
  ) +
  geom_point(
    data = mpg |> filter(class == "2seater"),
    shape = "circle open", size = 3, color = "red"
  )
```
- The histogram and density plot below reveal that the distribution of highway mileage is bimodal and right skewed while the boxplot reveals two potential outliers.
```{r, figures-side8, fig.show="hold", out.width="50%"}
par(mar = c(4, 4, .1, .1))
# Left
ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2)
# Middle
ggplot(mpg, aes(x = hwy)) +
  geom_density()
# Right
ggplot(mpg, aes(x = hwy)) +
  geom_boxplot()
```
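The points that the boxplot draws individually can be recovered by hand. This sketch assumes the usual 1.5 × IQR rule, which `geom_boxplot()` documents as its default; the names `lower`, `upper`, and `hwy_outliers` are introduced here for illustration.

```{r}
library(ggplot2)

# Quartiles and interquartile range of highway mileage
q <- quantile(mpg$hwy, c(0.25, 0.75))
iqr <- q[[2]] - q[[1]]

# Values more than 1.5 * IQR beyond the quartiles are drawn as separate points
lower <- q[[1]] - 1.5 * iqr
upper <- q[[2]] + 1.5 * iqr
hwy_outliers <- mpg$hwy[mpg$hwy < lower | mpg$hwy > upper]

# Distinct mileage values flagged as potential outliers
sort(unique(hwy_outliers))
```

Observations with the same mileage are drawn on top of each other, so the number of visible points can be smaller than the number of flagged rows.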
ggplot2 provides more than 40 geoms, but these don’t cover all possible plots one could make. If you need a different geom, we recommend looking into extension packages first to see if someone else has already implemented it: [**ggplot2 extensions gallery**](https://exts.ggplot2.tidyverse.org/gallery/).
The best place to get a comprehensive overview of all of the geoms ggplot2 offers, as well as all functions in the package, is the reference page: [**ggplot2 reference page**](https://ggplot2.tidyverse.org/reference).
## 10.3 Exercises
## Facets
- Facets split a plot into subplots that each display one subset of the data, based on a categorical variable.
- To facet your plot with the combination of two variables, switch from `facet_wrap()`, which we learned about in chapter 2, to `facet_grid()`, which uses a double-sided formula, `rows ~ cols`.
```{r, figures-side7, fig.show="hold", out.width="50%"}
par(mar = c(4, 4, .1, .1))
# Left
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~cyl)
# Right
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_grid(drv ~ cyl)
```
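The `rows ~ cols` formula can also leave one side empty with a `.` placeholder, which facets in a single direction. A minimal sketch, with `base`, `p_rows`, and `p_cols` as illustrative names:

```{r}
library(ggplot2)

base <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# drv on the left of ~ : one row of panels per drive train
p_rows <- base + facet_grid(drv ~ .)

# cyl on the right of ~ : one column of panels per cylinder count
p_cols <- base + facet_grid(. ~ cyl)

p_rows
p_cols
```

Faceting in rows makes it easier to compare x positions across panels, while faceting in columns makes it easier to compare y positions.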
## 10.4 Exercises
## Statistical transformations
```{r}
ggplot(diamonds, aes(x = cut)) +
  geom_bar()
```
![](images/visualization-stat-bar.png)
- We can learn which stat a geom uses by inspecting the default value for the `stat` argument. For example, `?geom_bar` shows that the default value for `stat` is `"count"`, which means that `geom_bar()` uses `stat_count()`. `stat_count()` is documented on the same page as `geom_bar()`.
- We might want to override the default mapping from transformed variables to aesthetics. For example, we might want to display a bar chart of proportions, rather than counts:
```{r}
ggplot(diamonds, aes(x = cut, y = after_stat(prop), group = 1)) +
  geom_bar()
```
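To see what `stat_count()` hands to the geom, the computed values can be pulled out of the plot. This sketch relies on `layer_data()`; `p_prop` and `computed` are illustrative names.

```{r}
library(ggplot2)

p_prop <- ggplot(diamonds, aes(x = cut, y = after_stat(prop), group = 1)) +
  geom_bar()

# One row per bar, including the count and prop variables computed by the stat
computed <- layer_data(p_prop)
computed[, c("x", "count", "prop")]
```

With `group = 1`, the proportions are computed across all cuts together, so they sum to one. Without it, each cut forms its own group and every bar has a proportion of one.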
- We might want to draw greater attention to the statistical transformation in our code. For example, we might use `stat_summary()`, which summarizes the y values for each unique x value, to draw attention to the summary being computed:
```{r}
ggplot(diamonds) +
  stat_summary(
    aes(x = cut, y = depth),
    fun.min = min,
    fun.max = max,
    fun = median
  )
```
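The same summary can be computed before plotting. The sketch below assumes dplyr is available (it is loaded with the tidyverse) and uses `geom_pointrange()` to draw the precomputed values; `depth_summary` is a name introduced for illustration.

```{r}
library(dplyr)
library(ggplot2)

# One row per cut with the median, minimum, and maximum depth
depth_summary <- diamonds |>
  group_by(cut) |>
  summarise(
    y = median(depth),
    ymin = min(depth),
    ymax = max(depth)
  )

# Draw the precomputed values directly, with no statistical transformation
ggplot(depth_summary, aes(x = cut, y = y, ymin = ymin, ymax = ymax)) +
  geom_pointrange()
```

Summarising first keeps the numbers available for a table or further checks, while `stat_summary()` keeps the whole computation inside the plotting code.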
- **ggplot2 provides more than 20 stats for you to use. Each stat is a function, so you can get help in the usual way, e.g., `?stat_bin`.**
## 10.5 Exercises
## Position adjustments
```{r, fig.show="hold", out.width="50%"}
par(mar = c(4, 4, .1, .1))
ggplot(diamonds, aes(x = cut, color = cut)) +
  geom_bar()
ggplot(diamonds, aes(x = cut, fill = cut)) +
  geom_bar()
```
```{r}
ggplot(diamonds, aes(x = cut, fill = clarity)) +
  geom_bar()
```
- The stacking is performed automatically using the position adjustment specified by the `position` argument. If you don’t want a stacked bar chart, you can use one of three other options: **"identity"**, **"dodge"** or **"fill"**.
- N.B.: `position = "identity"` places each object exactly where it falls in the context of the graph. This is not very useful for bars, because it overlaps them. To see that overlapping, we need to make the bars either slightly transparent by setting `alpha` to a small value, or completely transparent by setting `fill = NA`.
```{r, fig.show="hold", out.width="50%"}
par(mar = c(4, 4, .1, .1))
ggplot(diamonds, aes(x = cut, fill = clarity)) +
  geom_bar(alpha = 1/5, position = "identity")
ggplot(diamonds, aes(x = cut, color = clarity)) +
  geom_bar(fill = NA, position = "identity")
```
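The other two options, `"fill"` and `"dodge"`, can be sketched the same way. The object names `p_fill` and `p_dodge` are illustrative.

```{r}
library(ggplot2)

# "fill" stacks the bars and rescales each stack to the same height,
# which makes proportions easier to compare across cuts
p_fill <- ggplot(diamonds, aes(x = cut, fill = clarity)) +
  geom_bar(position = "fill")

# "dodge" places the bars for each clarity side by side,
# which makes individual counts easier to compare
p_dodge <- ggplot(diamonds, aes(x = cut, fill = clarity)) +
  geom_bar(position = "dodge")

p_fill
p_dodge
```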
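- To avoid overplotting in a scatterplot, set `position = "jitter"`, which adds a small amount of random noise to each point.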
```{r}
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(position = "jitter")
```
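Because the jitter is random, the plot changes slightly on every render. The sketch below uses `position_jitter()` with its `seed` argument to make the noise reproducible; the width, height, and seed values are arbitrary choices for illustration.

```{r}
library(ggplot2)

# Arbitrary jitter amounts and seed, chosen only for illustration
p_jitter <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(
    position = position_jitter(width = 0.05, height = 0.5, seed = 123)
  )
p_jitter
```

Smaller `width` and `height` values keep the points closer to their true positions, at the cost of more overlap.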
## 10.6 Exercises
## Coordinate systems
Coordinate systems are probably the most complicated part of ggplot2. The default coordinate system is the Cartesian coordinate system where the x and y positions act independently to determine the location of each point. There are two other coordinate systems that are occasionally helpful.
![](images/visualization-coordinate-systems.png)
- `coord_quickmap()` sets the aspect ratio correctly for maps. This is very important if you’re plotting spatial data with ggplot2. We don’t have the space to discuss maps in this book, but you can learn more in the [Maps chapter](https://ggplot2-book.org/maps.html) of *ggplot2: Elegant Graphics for Data Analysis*.
```{r, fig.show="hold", out.width="50%"}
par(mar = c(4, 4, .1, .1))
nz <- map_data("nz")
ggplot(nz, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "white", color = "black")
ggplot(nz, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "white", color = "black") +
  coord_quickmap()
```
```{r, fig.show="hold", out.width="50%"}
par(mar = c(4, 4, .1, .1))
bar <- ggplot(data = diamonds) +
  geom_bar(
    mapping = aes(x = cut, fill = cut),
    show.legend = FALSE,
    width = 1
  ) +
  theme(aspect.ratio = 1) +
  labs(x = NULL, y = NULL)
bar + coord_flip()
bar + coord_polar()
```
## 10.7 Exercises
## Resources
- Two very useful resources for getting an overview of the complete ggplot2 functionality are the [ggplot2 cheatsheet](https://posit.co/resources/cheatsheets) and the [ggplot2 package website](https://ggplot2.tidyverse.org).
- [ggplot2 Extension Gallery](https://exts.ggplot2.tidyverse.org/gallery/)
- [R Graph Gallery](https://www.r-graph-gallery.com/ggplot2-package.html)
- The [Graphs section](http://www.cookbook-r.com/Graphs/) of the Cookbook for R
- [dslc.io/join](https://dslc.io/join) for more book clubs!
## Meeting Videos
### Cohort 5
`r knitr::include_url("https://www.youtube.com/embed/ujOn-4esnDo")`
<details>
<summary> Meeting chat log </summary>
```
00:13:43 Njoki Njuki Lucy: Is it best to visualize the variation in a categorical variable with only two levels using a bar chart? If not, what's the chart to use if I may ask?
00:16:00 Ryan Metcalf: Great question Njoki, Categorical, by definition is a set that a variable can have. Say, Male / Female / Other. This example indicates a variable can have three states. It depends on your data set.
00:16:51 Eileen: bar or pie chart?
00:16:51 Ryan Metcalf: There are other forms of presentation other than a bar chart. I.E “quantifying” each category.
00:18:37 Eileen: box chart
00:18:46 Njoki Njuki Lucy: thank you so much everyone :)
00:24:31 lucus w: This website is excellent in determining geom to use: www.data-to-viz.com
00:25:22 Njoki Njuki Lucy: awesome, thanks
00:25:44 Eileen: Box charts are great for showing outliers
00:26:31 Federica Gazzelloni: other interesting resources:
00:26:34 Federica Gazzelloni: https://www.r-graph-gallery.com/ggplot2-package.html
00:26:51 Federica Gazzelloni: http://www.cookbook-r.com/Graphs/
00:34:19 Amitrajit: what is the difference in putting aes() inside geom_count() rather than main ggplot() call?
00:35:38 Ryan Metcalf: Like maybe Supply vs Demand curves?
00:41:16 Federica Gazzelloni: what about the factor() that we add to a variable when we apply a color?
00:42:33 Susie Neilson: I do aes your way Jon!
00:43:07 Federica Gazzelloni: and grouping inside the aes
00:49:27 Amitrajit: thanks!
00:49:32 Federica Gazzelloni: thanks
00:49:35 Njoki Njuki Lucy: thank you, bye
00:49:45 Eileen: Thank you!
```
</details>
### Cohort 6
`r knitr::include_url("https://www.youtube.com/embed/mYTD9DbM174")`
<details>
<summary> Meeting chat log </summary>
```
00:06:21 Matthew Efoli: good evening Daniel and Esmeralda
00:07:39 Matthew Efoli: hello everyone
00:08:08 Daniel Adereti: Hello Matthew!
00:08:44 Daniel Adereti: I guess we can start? so we can finish the 2 chapters as Exploratory Data Analysis is quite long and involved
00:09:04 Freya Watkins: Sounds good! Hi all :)
00:10:55 Freya Watkins: yes can see
00:23:14 Daniel Adereti: na > Not available
00:23:32 Maria Eleni Soilemezidi: rm = remove
00:25:49 Esmeralda Cruz: yes
00:26:29 Esmeralda Cruz: to remove the outliers maybe?
00:29:20 Adeyemi Olusola: No
00:29:22 Freya Watkins: we can't see it no
00:29:27 Maria Eleni Soilemezidi: no we can't see it!
00:29:38 Maria Eleni Soilemezidi: thank you! Yes
00:32:57 Daniel Adereti: Cedric's article is a nice one! Helpful to understand descriptive use case of different plot ideas
00:43:19 Daniel Adereti: we can do the exercises
00:43:27 Esmeralda Cruz: ok
00:43:28 Maria Eleni Soilemezidi: yes, sure!
00:45:20 Adeyemi Olusola: we can try reorder
00:45:28 Adeyemi Olusola: from the previous example
00:51:44 Maria Eleni Soilemezidi: that's a good idea
00:52:28 Daniel Adereti: Thanks!
00:52:42 Daniel Adereti: cut_in_color_graph <- diamonds %>%
group_by(color, cut) %>%
summarise(n = n()) %>%
mutate(proportion_cut_in_color = n/sum(n)) %>%
ggplot(aes(x = color, y = cut))+
geom_tile(aes(fill = proportion_cut_in_color))+
labs(fill = "proportion\ncut in color")
00:53:32 Esmeralda Cruz: 😮
00:53:47 Adeyemi Olusola: smiles
00:54:13 Adeyemi Olusola: but lets try reorder...I think we should be able to pull something from it, though not sure about the heatmap thingy
00:54:26 Adeyemi Olusola: on our own though*
01:05:38 Maria Eleni Soilemezidi: no worries! Thank you for the presentation, Matthew! :)
01:05:39 Freya Watkins: Thanks Matthew!
01:06:44 Maria Eleni Soilemezidi: bye everyone, see you next week!
```
</details>
### Cohort 7
`r knitr::include_url("https://www.youtube.com/embed/UW8cfioTAVc")`
<details>
<summary> Meeting chat log </summary>
```
00:10:31 Oluwafemi Oyedele: We will start the discussion in the next 5 minutes!!!
00:18:34 Tim Newby: Hi the audio is still bad at my end, can anyone else hear?
00:57:01 Oluwafemi Oyedele: https://exts.ggplot2.tidyverse.org/gallery/
01:19:03 Oluwafemi Oyedele: https://ggplot2-book.org/maps.html
01:23:07 Oluwafemi Oyedele: https://ggplot2.tidyverse.org/
01:24:34 Tim Newby: diamonds |>
group_by(cut) |>
mutate(y = median(depth), ymin = min(depth), ymax = max(depth)) |>
ggplot() +
geom_pointrange(aes(x = cut, y = y, ymin = ymin, ymax = ymax))
```
</details>
### Cohort 8
`r knitr::include_url("https://www.youtube.com/embed/cDO1JD_Qlkw")`