
Commit a490d49

working on ch 4 revisions

1 parent 413534f

8 files changed, +111479 -174 lines changed

04-summary.qmd

+169-170
Large diffs are not rendered by default.

09-wrangle.qmd

+2-2
@@ -8,7 +8,7 @@

 ## Walkthrough video {#sec-walkthrough-wrangle .unnumbered}

-There is a walkthrough video of this chapter available via [Echo360.](https://echo360.org.uk/media/dc1e2869-a6c2-45d8-ab40-cb85cdb67f43/public) Please note that there may have been minor edits to the book since the video was recorded. Where there are differences, the book should always take precedence.
+There is a walkthrough video of this chapter available via [Echo360](https://echo360.org.uk/media/dc1e2869-a6c2-45d8-ab40-cb85cdb67f43/public). Please note that there may have been minor edits to the book since the video was recorded. Where there are differences, the book should always take precedence.

 ## Set-up {#sec-setup-wrangle}

@@ -284,7 +284,7 @@ Note that `str_detect()` is case sensitive so it would not return values of "Hig
 `filter()` is incredibly powerful and can allow you to select very specific subsets of data. But it is also quite dangerous, because when you start combining multiple criteria and operators, it's very easy to accidentally specify something slightly different from what you intended. **Always check your output**. If you have a small dataset, then you can eyeball it to see if it looks right. With a larger dataset, you may wish to compute summary statistics or count the number of groups/observations in each variable to verify that your filter is correct. No level of coding expertise can substitute for knowing and checking your data.
 :::

-### Arrange
+### Arrange {#sec-arrange}

 You can sort your dataset using `arrange()`. You will find yourself needing to sort data in R much less than you do in Excel, since you don't need to have rows next to each other in order to, for example, calculate group means. But `arrange()` can be useful when preparing data for display in tables. `arrange()` works on character data where it will sort alphabetically, as well as numeric data where the default is ascending order (smallest to largest). Reverse the order using `desc()`.

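A minimal sketch of the two pieces of advice in the 09-wrangle.qmd hunk above: check a filter by counting what survives, then sort with `arrange()` and `desc()`. The `scores` tibble, its columns, and its values are hypothetical, invented only for this illustration; the functions are standard dplyr.

```r
library(dplyr)

# hypothetical example data, invented for illustration
scores <- tibble(
  name  = c("Ann", "Bo", "Cy", "Di"),
  group = c("A", "B", "A", "B"),
  score = c(72, 85, 64, 85)
)

# keep rows with a score above 70
high <- filter(scores, score > 70)

# check the filter: how many rows remain in each group?
count(high, group)

# sort by score, largest first; ties are broken alphabetically by name
arrange(high, desc(score), name)
```

Here `count()` should report one row in group A and two in group B. If a count like this doesn't match what you expected, the filter condition is the first thing to re-read.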
app-dates.qmd

+38-2
@@ -10,9 +10,46 @@ library(tidyverse)
 library(lubridate)
 ```

+## Formats
+
+While there is only one correct way to write a date (the ISO 8601 format "YYYY-MM-DD"), dates can be found in many formats. When you read a data file, you might need to specify the date format so that it is read properly. A date format specification uses abbreviations to represent the different ways people can write the year, month, and day (as well as hours, minutes, and seconds). For example, the date `2023-01-03` is represented by the formatting string `"%Y-%m-%d"`. The fastest way to find the list of formatting abbreviations is to look in the help for the function `col_date()`.
+
+```{r, filename = "Run in the console"}
+?col_date
+```
+
+
+```{r}
+# create a table with some different date formats
+date_formats <- tibble(
+  best = "2022-01-03",
+  ok = "2022 January 3",
+  bad = "January 3, 2022",
+  terrible = "Mon is 3 22 1"
+)
+
+# save it as a CSV file
+write_csv(date_formats, "data/date_formats.csv")
+
+# read it in
+df <- read_csv("data/date_formats.csv")
+```
+
+You can see that only the first column was read as a date; the rest were read as character columns. You can set the date format using the `col_types` argument and two helper functions, `cols()` and `col_date()`.
+
+```{r}
+ct <- cols(ok = col_date("%Y %B %d"),
+           bad = col_date("%B %d, %Y"),
+           terrible = col_date("%a is %d %y %m"))
+
+read_csv("data/date_formats.csv",
+         col_types = ct)
+```
+
+
 ## Parsing

-Dates can be in many formats. The `ymd` functions can deal with almost all of them, regardless of the punctuation used in the format. All of the examples below produce a date in the standard format "2022-01-03".
+The `ymd` functions can deal with almost all date formats, regardless of the punctuation used in the format. All of the examples below produce a date in the standard format "2022-01-03".

 ```{r ymd, results='hide'}
 # year-month-day orders
@@ -45,7 +82,6 @@ The date/time functions can also take a timezone argument. If you don't specify
 ymd_hm("2022-01-03 18:05", tz = "GMT")
 ```

-
 ## Get Parts

 You frequently need to extract parts of a date/time for plotting. The following functions extract specific parts of a date or datetime object. This is a godsend for those of us who never have a clue what week of the year it is today.

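A minimal sketch of the parsing and part-extraction ideas in the app-dates.qmd hunks above, assuming only that lubridate is installed. The date strings are arbitrary examples; `ymd()`, `mdy()`, `dmy()`, `year()`, `month()`, `mday()`, `wday()`, and `week()` are all lubridate functions.

```r
library(lubridate)

# ymd() ignores the punctuation, so all three strings give the same date
d <- ymd(c("2022-01-03", "2022/01/03", "20220103"))
d

# other orderings of day, month, and year have matching functions
mdy("January 3, 2022")
dmy("3/1/2022")

# extract parts of a single date
x <- ymd("2022-01-03")
year(x)                 # 2022
month(x)                # 1
mday(x)                 # 3
wday(x, label = TRUE)   # abbreviated weekday name (depends on your locale)
week(x)                 # 1 (week of the year)
```

The labelled weekday is left without an output comment because its text depends on the system locale.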
data/12.1_delivery.csv

+97,078
Large diffs are not rendered by default.

data/date_formats.csv

+2
@@ -0,0 +1,2 @@
+best,ok,bad,terrible
+2022-01-03,2022 January 3,"January 3, 2022",Mon is 3 22 1

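A self-contained sketch of reading the two-line `date_formats.csv` shown above with explicit `col_date()` formats, assuming readr is installed. To avoid writing into `data/`, it recreates the same content in a temporary file; the `path` and `dates` names are only for illustration. In `Mon is 3 22 1`, the 3 is the day, 22 the two-digit year, and 1 the month, so the day comes before the month in that format string.

```r
library(readr)

# recreate the same CSV content in a temporary file
path <- tempfile(fileext = ".csv")
writeLines(c(
  "best,ok,bad,terrible",
  '2022-01-03,2022 January 3,"January 3, 2022",Mon is 3 22 1'
), path)

# one format string per column:
# %Y four-digit year, %y two-digit year, %m month number,
# %B full month name, %d day, %a abbreviated weekday name
ct <- cols(
  best     = col_date("%Y-%m-%d"),
  ok       = col_date("%Y %B %d"),
  bad      = col_date("%B %d, %Y"),
  terrible = col_date("%a is %d %y %m")
)

dates <- read_csv(path, col_types = ct)
dates
```

If a format string does not match, readr returns `NA` for that cell and reports a parsing problem, so an unexpected `NA` is a quick signal that a format abbreviation is wrong.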
data/rep_gho_mortality-metadata.pdf

-253 KB
Binary file not shown.

data/rep_gho_mortality.xlsx

-15.7 MB
Binary file not shown.

data/weekly_ae_activity_20240303.csv

+14,190
Large diffs are not rendered by default.
