---
title: 'R Bootcamp day 1<br>What is R? A motivating example of data viz<br><img src="../fig/trojan-rlogo.svg" style="width:250px;">'
author:
- Sarah Piombo
- Natalia Zemlianskaia
- George G. Vega Yon
date: August 16th, 2021
output:
  slidy_presentation:
    footer: R Bootcamp (day1)
    highlight: pygments
    font_adjustment: -1
editor_options:
  chunk_output_type: console
---
# Overview
1. What are R and RStudio?
2. Getting help with R.
3. A live example with ggplot2.
# Part 1: What is R?
## First questions
### What is R?
<img src="https://www.r-project.org/logo/Rlogo.svg" width="150px" alt="R logo">
> R is a language and environment for statistical computing and graphics.
--- https://r-project.org
### What is RStudio?
<img src="https://rstudio.com/wp-content/uploads/2018/10/RStudio-Logo.svg" width="400px" alt="RStudio logo">
> RStudio is an integrated development environment (IDE) for R.
--- https://rstudio.org/products/rstudio
---
<figure>
<img src="moderndive-r-vs-rstudio.png" alt="ModernDive R vs RStudio tweet">
<figcaption>A nice way to see R vs RStudio by [ModernDive](https://moderndive.com/) (original tweet [here](https://twitter.com/ModernDive/status/1171456164938141697))</figcaption>
</figure>
---
## R in the terminal
<img src="r-command-line.png" width="600px">
---
## R + RStudio
<img src="rstudio-now.png" width="900px">
##
Let's see a live view of RStudio!...
# Part 2: Hands on with ggplot2
All the code for this section can be downloaded [here](slides.R). The entire presentation (which contains the code) was generated using RMarkdown and can be downloaded from [here](slides.Rmd).
(you will learn more about RMarkdown on day 3!)
---
## Set-up: Loading R packages and Data
```{r set-up}
library(ggplot2)
data("diamonds")
```
- The line `library(ggplot2)` loaded the package `ggplot2`.
- The line `data("diamonds")` loaded the dataset `diamonds` from the `ggplot2` package.
To get help with a function, we can use the `help("<FUNCTION NAME>")` command in R. For example, to learn more about `library()`, we could type:
```r
help("library")
```
Or, equivalently:
```r
?"library"
```
(let's check out what the help file looks like!)
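Beyond `help()` and `?`, base R ships a few other ways to look things up. The snippet below is a minimal sketch of some of them; the search terms used here (`"boxplot"`, `"^nrow"`) are arbitrary choices for illustration.

```r
# Search the installed help pages by keyword when the function name is unknown
# (the shorthand for this is ??"boxplot")
help.search("boxplot")

# List objects whose names match a pattern
apropos("^nrow")

# Show the arguments of a function without opening its help page
args(library)

# Run the examples that appear at the bottom of a help page
example("nrow")
```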
---
## Questions A:
1. What other arguments does the function `data()` accept?
2. What does the function `str` do?
---
## Looking at the Data
What does data look like in R? There are many ways to represent data in R. One of the most flexible (popular?) ways of doing so is through `data frames` (in the case of ["base R"](https://cran.r-project.org), the core component of R) and `tibbles` (in the case of the [tidyverse](https://tidyverse.org)). Tibbles and data frames share the same structure:
- Data entries (individuals/genes/countries/etc.) are organized by row.
- Features are organized by columns.
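
To make the row/column layout concrete, here is a small sketch that builds a data frame by hand and converts it to a tibble. The values are made up for illustration, and the conversion assumes the `tibble` package is installed (it normally comes along with `ggplot2`).

```r
# A tiny, made-up data frame: one row per diamond, one column per feature
toy <- data.frame(
  carat = c(0.23, 0.31, 1.01),
  cut   = c("Ideal", "Good", "Premium"),
  price = c(326, 335, 4500)
)

# Printing shows the entries by row and the features by column
toy

# The same data as a tibble (assumes the tibble package is available)
toy_tbl <- tibble::as_tibble(toy)
toy_tbl
```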
---
For example, here is how R prints a `tibble` and a `data.frame`:
```{r head-of-diamonds-tibble, echo=FALSE}
head(diamonds)
```
And a data frame version of the same data:
```{r head-of-diamonds-dataframe, echo=FALSE}
head(as.data.frame(diamonds))
```
---
R has functions to query how many rows and columns these objects have. We can use the `nrow` and `ncol` functions as follows:
```{r quick-data-look, results='hold'}
# How many rows and columns?
nrow(diamonds)
ncol(diamonds)
```
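
As a small complement, base R can also report both dimensions at once and list the column names. This is only a sketch of two related functions, `dim()` and `names()`, applied to the same dataset.

```r
library(ggplot2)
data("diamonds")

# Rows and columns in a single call: c(number of rows, number of columns)
dim(diamonds)

# The names of the features (columns)
names(diamonds)
```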
Now let's get our hands dirty and do some visualization!
---
## A Walk Through Example with ggplot2
The `ggplot2` R package is arguably the most popular way to build plots in R. Here we will be looking at a couple of examples using the `diamonds` dataset that we just loaded.
```{r viz-0a, echo=FALSE, cache=TRUE}
ggplot(data = diamonds) +
  geom_point(mapping = aes(x = carat, y = price, color = color)) +
  facet_wrap(~clarity) +
  labs(
    title = "Price of Diamonds (by clarity)",
    subtitle = "data from the ggplot2 R package",
    x = "Weight of the diamond (carat)",
    y = "Price in US dollars",
    color = "Color from \n J (worst) to D (best)"
  )
```
---
The overall structure of ggplot is as follows:
```
ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))
```
- The `ggplot()` function sets up the data that we will be using
- The `<GEOM_FUNCTION>()` tells ggplot what type of plot we are building (histogram, scatterplot, barplot, etc.)
- The `aes(<MAPPINGS>)` indicates how **features (columns)** of the data are to be included in the plot.
- The `+` sign at the end of the line binds things together (we can add many layers/components to a single plot!)
---
Let's see what happens if we run the following code:
```{r viz-0b}
ggplot(data = diamonds)
```
Nothing but an empty canvas! That is because we haven't told ggplot what we want to visualize. The function only knows that we would like to work with the `diamonds` dataset, but it has no idea what to plot!
---
Let's try again using the following code:
```{r viz-0c, eval = FALSE}
ggplot(data = diamonds) +
  geom_point()
```
```
Error: geom_point requires the following missing aesthetics: x and y
Run `rlang::last_error()` to see where the error occurred.
```
Oops! We got an error, and the error says `"geom_point requires the following missing aesthetics: x and y"`, which means that we still need to give ggplot a bit more information about what we would like to visualize. Asking for a scatter plot without indicating which variables to use is meaningless.
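
One detail worth noticing is when the error shows up: creating the plot object is fine, and the complaint only appears once ggplot tries to build the plot, which printing does automatically. The sketch below illustrates this by building the plot explicitly with `ggplot_build()` inside `tryCatch()`; the variable names `p` and `msg` are placeholders, and the exact wording of the message may differ between ggplot2 versions.

```r
library(ggplot2)
data("diamonds")

# Creating (and storing) the plot object does not raise the error...
p <- ggplot(data = diamonds) +
  geom_point()

# ...it appears when the plot is built. ggplot_build() runs that step
# without drawing anything, so we can capture the message as text.
msg <- tryCatch(
  {
    ggplot_build(p)
    "no error"
  },
  error = function(e) conditionMessage(e)
)

# Print the captured message
cat(msg, "\n")
```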
---
So let's try one more time and see what we get!
```{r viz-1, cache=TRUE}
ggplot(data = diamonds) +
  geom_point(mapping = aes(x = carat, y = price))
```
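
Because `+` returns a new plot object, the same plot can also be assembled step by step and stored in variables along the way. The following is a minimal sketch of that idea; the names `p`, `p_points`, and `p_final` and the short axis labels are placeholders, and inspecting `$layers` relies on ggplot2 internals that may change between versions.

```r
library(ggplot2)
data("diamonds")

# Step 1: the base object only records which data to use
p <- ggplot(data = diamonds)

# Step 2: `+` adds a layer and returns a new plot object
p_points <- p + geom_point(mapping = aes(x = carat, y = price))

# Step 3: further components can be added to the stored object
p_final <- p_points + labs(x = "Carat", y = "Price")

# Count the layers before and after adding geom_point()
length(p$layers)
length(p_points$layers)
```

Typing `p_final` at the console (or calling `print(p_final)`) draws the plot.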
---
How does the color affect the price?
```{r viz-2, cache=TRUE}
ggplot(data = diamonds) +
  geom_point(mapping = aes(x = carat, y = price, color = color))
```
---
Now, how about the clarity of the diamond?
```{r viz-3, cache=TRUE}
ggplot(data = diamonds) +
  geom_point(mapping = aes(x = carat, y = price, color = color)) +
  facet_wrap(~clarity)
```
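
`facet_wrap()` draws one panel per level of the faceting variable and chooses the grid layout on its own. As a hedged sketch, the layout can be controlled with the `ncol` argument; the value `4` below is an arbitrary choice, and `p_facets` is a placeholder name.

```r
library(ggplot2)
data("diamonds")

# One panel is drawn for each clarity level
nlevels(diamonds$clarity)

# Ask for 4 columns of panels instead of the default layout
p_facets <- ggplot(data = diamonds) +
  geom_point(mapping = aes(x = carat, y = price, color = color)) +
  facet_wrap(~clarity, ncol = 4)

p_facets
```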
---
Finally, let's add some titles to make it look nicer:
```{r viz-4, cache=TRUE}
ggplot(data = diamonds) +
  geom_point(mapping = aes(x = carat, y = price, color = color)) +
  facet_wrap(~clarity) +
  labs(
    title = "Price of Diamonds (by clarity)",
    subtitle = "data from the ggplot2 R package",
    x = "Weight of the diamond (carat)",
    y = "Price in US dollars",
    color = "Color from \n J (worst) to D (best)"
  )
```
---
## What else can we do?
- `ggplot2` is one of the most popular R packages, so besides the types of plots it includes, there are dozens of other R packages that extend it! https://exts.ggplot2.tidyverse.org/gallery/
---
### ggwordcloud
<img src="https://lepennec.github.io/ggwordcloud/reference/figures/README-unnamed-chunk-4-1.png" alt="ggwordcloud" width="600px">
Get it from CRAN here: https://cran.r-project.org/package=ggwordcloud
---
### gganimate
<img src="https://gganimate.com/reference/figures/README-unnamed-chunk-4-1.gif" alt="gganimate" width="600px">
Get it from CRAN here: https://cran.r-project.org/package=gganimate
---
### ggridges
<img src="https://exts.ggplot2.tidyverse.org/gallery/images/ggridges.png" alt="ggridges" width="600px">
Get it from CRAN here: https://cran.r-project.org/package=ggridges
---
## Questions B
1. Reproduce the last plot, but this time put `carat` on the `y` axis and `price` on the `x` axis.
2. Using the `"mpg"` dataset (which can be loaded using `data(mpg)`), draw a similar plot using the following mappings
`aes(x = displ, y = hwy, color = drv)`. Fill in the missing pieces to get the plot:
```
data(< DATA >)
ggplot(data = < DATA >) +
  geom_point(mapping = < MAPPINGS >) +
  labs(
    title = "Fuel economy data",
    subtitle = "(1999 - 2008)",
    x = "Engine displacement (liters)",
    y = "Highway MPG",
    color = "Drive train"
  )
```
---
## Question B 1: Solution
```{r viz-4-swapped, cache=TRUE}
ggplot(data = diamonds) +
  geom_point(mapping = aes(x = price, y = carat, color = color)) +
  facet_wrap(~clarity) +
  labs(
    title = "Price of Diamonds (by clarity)",
    subtitle = "data from the ggplot2 R package",
    y = "Weight of the diamond (carat)",
    x = "Price in US dollars",
    color = "Color from \n J (worst) to D (best)"
  )
```
---
## Question B 2: Solution
```{r mpg-sol, eval=TRUE, echo=TRUE}
data(mpg)
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = drv)) +
  labs(
    title = "Fuel economy data",
    subtitle = "(1999 - 2008)",
    x = "Engine displacement (liters)",
    y = "Highway MPG",
    color = "Drive train"
  )
```
---
## Bonus: An example using Boxplots
```{r example-w-boxplot, cache = TRUE}
ggplot(data = diamonds) +
  geom_boxplot(mapping = aes(x = clarity, y = price, fill = clarity))
```
---
## References
- "R for data science" (free online book) https://r4ds.had.co.nz/
- The R graph gallery https://www.r-graph-gallery.com/
- The `bookdown` website (tons of free books about R) https://bookdown.org/
- "R Markdown: The Definitive Guide" (free online book) https://bookdown.org/yihui/rmarkdown/
- "RStudio Primers" (online interactive tutorials with R) https://rstudio.cloud/learn/primers
- RStudio Webinars https://rstudio.com/resources/webinars/