---
title: 'Lab 3 : Data Transformation with dplyr'
---
## Learning objectives
- Data Transformation using dplyr
## Load libraries
```{r}
library(tidyverse)
library(nycflights13)
```
## Introduction to Data Transformation
### Tables
How tables are displayed in your qmd file differs from how they are rendered in HTML, PDF, and other output formats.
### Pipes
In the last few years, the `|>` pipe was introduced as a simpler alternative to the `%>%` pipe, which has been used in R and the tidyverse for the last 10 years. In many online examples you will see `%>%` used. For many uses in this class they are interchangeable.
In RStudio, Ctrl/Cmd + Shift + M inserts a pipe.
Ctrl + Alt + I (Cmd + Option + I on a Mac) inserts a new code chunk.
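To see that the two pipes are interchangeable for simple cases like the ones in this class, here is a small sketch that runs the same filter with each pipe and compares the results. The object names `native_result` and `magrittr_result` are made up for this illustration.

```{r}
library(dplyr)        # provides filter() and makes the %>% pipe available
library(nycflights13) # provides the flights data set

# The same filter written with each pipe
native_result   <- flights |> filter(dest == "IAH")
magrittr_result <- flights %>% filter(dest == "IAH")

# TRUE if both pipes produced the same table
identical(native_result, magrittr_result)
```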
### Checking each line of code as you write it
Today we will see the following code chunk in Chapter 3:
```{r}
flights |>
  filter(dest == "IAH") |>
  group_by(year, month) |>
  summarize(
    arr_delay = mean(arr_delay, na.rm = TRUE)
  )
```
If I were writing this code, I would check each line as I wrote it to make sure I was getting the right result and to make error messages easier to troubleshoot.
```{r}
flights |>
  filter(dest == "IAH")
```
```{r}
flights |>
  filter(dest == "IAH") |>
  group_by(year, month)
```
```{r}
flights |>
  filter(dest == "IAH") |>
  group_by(year, month) |>
  summarize(
    arr_delay = mean(arr_delay, na.rm = TRUE)
  )
```
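Beyond looking at the printed table, each intermediate step can also be checked with a quick summary. The chunk below is only one possible sketch of such checks, not part of the book's example; `dest_check` and `group_check` are names invented here.

```{r}
library(dplyr)
library(nycflights13)

# Check the filter step: only "IAH" should remain in the dest column
dest_check <- flights |>
  filter(dest == "IAH") |>
  distinct(dest)
dest_check

# Check the grouping variables: one row per year/month with its number of flights
group_check <- flights |>
  filter(dest == "IAH") |>
  count(year, month)
group_check
```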
### Assignment
In the first lab we went over assigning a number or a character string to a variable, for example
`x <- 2`
We can also assign the result of a pipeline to a new variable, `IAH_arr_delay_by_month`:
```{r}
IAH_arr_delay_by_month <- flights |>
  filter(dest == "IAH") |>
  group_by(year, month) |>
  summarize(
    arr_delay = mean(arr_delay, na.rm = TRUE)
  )
```
Notice that the table does not print out. Instead, the new table is stored in the data object `IAH_arr_delay_by_month`. You can now use this object repeatedly in your code without rerunning the larger code chunk above each time. You can view `IAH_arr_delay_by_month` by running `view(IAH_arr_delay_by_month)` or by clicking on the object in the `Environment` window.
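As a small sketch of reusing the stored object, the chunk below rebuilds `IAH_arr_delay_by_month` so that it runs on its own, then pipes it into two further steps. The 10-minute cutoff is an arbitrary value chosen only for illustration.

```{r}
library(dplyr)
library(nycflights13)

# Recreate the object from the chunk above so this sketch is self-contained
IAH_arr_delay_by_month <- flights |>
  filter(dest == "IAH") |>
  group_by(year, month) |>
  summarize(
    arr_delay = mean(arr_delay, na.rm = TRUE)
  )

# Reuse it: months ordered from the largest to the smallest mean arrival delay
IAH_arr_delay_by_month |>
  arrange(desc(arr_delay))

# Reuse it again: months with a mean arrival delay above 10 minutes (arbitrary cutoff)
IAH_arr_delay_by_month |>
  filter(arr_delay > 10)
```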
### Writing pseudo code
Was there a flight in every month of 2013?
Before writing any code, it is best to break the question down into the tasks we need to accomplish:
1. filter the flights data set to the year 2013
2. show only 1 row for each month
3. display the table to see if each month is present, or count the rows to see if they equal 12
This is actually the hard part of solving a coding challenge. Writing the code is relatively easy once you know the steps.
```{r}
flights |>
  filter(year == 2013) |>
  distinct(month)
```
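The chunk above displays the table for step 3. The counting alternative from the same step could be sketched as follows, where `months_with_flights` is a name invented for this example:

```{r}
library(dplyr)
library(nycflights13)

# Steps 1 and 2: keep 2013 and reduce to one row per month
months_with_flights <- flights |>
  filter(year == 2013) |>
  distinct(month)

# Step 3: count the rows; 12 means every month has at least one flight
nrow(months_with_flights)

# The same check as a single TRUE/FALSE answer
nrow(months_with_flights) == 12
```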
## Exercises
[R for Data Science Chapter 3](https://r4ds.hadley.nz/data-transform).
Today we will walk through Chapter 3, Data Transformation, in R for Data Science. As we did last week, by putting the examples and exercises in our own Quarto Markdown file, we can create our own personal path through the chapter.
### What to upload to Canvas
After you `Render` the qmd file to an html file, export the file to your computer and upload it to Canvas.