09-wrangle.qmd
+2-2
@@ -8,7 +8,7 @@
8
8
9
9
## Walkthrough video {#sec-walkthrough-wrangle .unnumbered}
10
10
11
-
There is a walkthrough video of this chapter available via [Echo360.](https://echo360.org.uk/media/dc1e2869-a6c2-45d8-ab40-cb85cdb67f43/public) Please note that there may have been minor edits to the book since the video was recorded. Where there are differences, the book should always take precedence.
11
+
There is a walkthrough video of this chapter available via [Echo360](https://echo360.org.uk/media/dc1e2869-a6c2-45d8-ab40-cb85cdb67f43/public). Please note that there may have been minor edits to the book since the video was recorded. Where there are differences, the book should always take precedence.
12
12
13
13
## Set-up {#sec-setup-wrangle}
14
14
@@ -284,7 +284,7 @@ Note that `str_detect()` is case sensitive so it would not return values of "Hig
284
284
`filter()` is incredibly powerful and can allow you to select very specific subsets of data. But it is also quite dangerous because when you start combining multiple criteria and operators, it's very easy to accidentally specify something slightly different than what you intended. **Always check your output**. If you have a small dataset, then you can eyeball it to see if it looks right. With a larger dataset, you may wish to compute summary statistics or count the number of groups/observations in each variable to verify your filter is correct. There is no level of expertise in coding that can substitute for knowing and checking your data.
285
285
:::
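
As a minimal sketch of the kind of check described in the callout above, the chunk below filters a small made-up table and then counts what survived. The `survey` table and its columns are hypothetical and are not part of the chapter's data.

```{r}
library(dplyr)

# hypothetical toy data for illustration only
survey <- tibble(
  id    = 1:6,
  group = c("A", "A", "B", "B", "C", "C"),
  score = c(10, 25, 30, 5, 40, 15)
)

# keep groups A and B, but only rows with a score above 8
kept <- filter(survey, group %in% c("A", "B"), score > 8)

# check 1: which groups remain, and how many rows in each?
count(kept, group)

# check 2: does the smallest remaining score respect the cutoff?
summarise(kept, n = n(), min_score = min(score))
```

If group C appeared in the count, or the minimum score were 8 or below, the filter would not be doing what was intended.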
286
286
287
-
### Arrange
287
+
### Arrange {#sec-arrange}
288
288
289
289
You can sort your dataset using `arrange()`. You will find yourself needing to sort data in R much less than you do in Excel, since you don't need to have rows next to each other in order to, for example, calculate group means. But `arrange()` can be useful when preparing data for display in tables. `arrange()` works on character data where it will sort alphabetically, as well as numeric data where the default is ascending order (smallest to largest). Reverse the order using `desc()`.
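
A short sketch of both behaviours, using a made-up table (the `scores` data is hypothetical and not from the chapter):

```{r}
library(dplyr)

# hypothetical toy data for illustration only
scores <- tibble(
  name  = c("Cam", "Ali", "Bea"),
  score = c(72, 95, 88)
)

# character column: sorted alphabetically
by_name <- arrange(scores, name)

# numeric column: ascending by default, so wrap it in desc() to reverse
by_score <- arrange(scores, desc(score))

by_name
by_score
```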
app-dates.qmd
+38-2
@@ -10,9 +10,46 @@ library(tidyverse)
10
10
library(lubridate)
11
11
```
12
12
13
+
## Formats
14
+
15
+
While there is only one correct way to write a date (the ISO 8601 format of "YYYY-MM-DD"), dates can be found in many formats. When you are reading a data file, you might need to specify the date format so it can be read properly. Date format specifications use abbreviations to represent the different ways people can write the year, month, and day (as well as hours, minutes, and seconds). For example, the date `2023-01-03` is represented by the formatting string `"%Y-%m-%d"`. The fastest way to find the list of formatting abbreviations is to look in the help for the function `col_date()`.
16
+
17
+
```{r, filename = "Run in the console"}
18
+
?col_date
19
+
```
20
+
21
+
22
+
```{r}
23
+
# create a table with some different date formats
24
+
date_formats <- tibble(
25
+
best = "2022-01-03",
26
+
ok = "2022 January 3",
27
+
bad = "January 3, 2022",
28
+
terrible = "Mon is 3 22 1"
29
+
)
30
+
31
+
# save it as a CSV file
32
+
write_csv(date_formats, "data/date_formats.csv")
33
+
34
+
# read it in
35
+
df <- read_csv("data/date_formats.csv")
36
+
```
37
+
38
+
You can see that only the first column was read as a date, and the rest were read as characters. You can set the date format using the `col_types` argument and two helper functions, `cols()` and `col_date()`.
39
+
40
+
```{r}
41
+
ct <- cols(ok = col_date("%Y %B %d"),
42
+
bad = col_date("%B %d, %Y"),
43
+
terrible = col_date("%a is %d %y %m"))
44
+
45
+
read_csv("data/date_formats.csv",
46
+
col_types = ct)
47
+
```
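
One way to try out a format string before committing it to `col_types` is to run it on a single value with `parse_date()` from readr, which uses the same abbreviations. This is only a sketch using two of the example strings above. Both values should parse to the same date as the `best` column.

```{r}
library(readr)

# test each format string on one value before reading the whole file
ok_date  <- parse_date("2022 January 3", format = "%Y %B %d")
bad_date <- parse_date("January 3, 2022", format = "%B %d, %Y")

ok_date
bad_date
```

If a format string does not match the text, `parse_date()` returns `NA` with a parsing warning, which is usually easier to debug on one value than on a whole column.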
48
+
49
+
13
50
## Parsing
14
51
15
-
Dates can be in many formats. The `ymd` functions can deal with almost all of them, regardless of the punctuation used in the format. All of the examples below produce a date in the standard format "2022-01-03".
52
+
The `ymd` functions can deal with almost all date formats, regardless of the punctuation used in the format. All of the examples below produce a date in the standard format "2022-01-03".
16
53
17
54
```{r ymd, results='hide'}
18
55
# year-month-day orders
@@ -45,7 +82,6 @@ The date/time functions can also take a timezone argument. If you don't specify
45
82
ymd_hm("2022-01-03 18:05", tz = "GMT")
46
83
```
47
84
48
-
49
85
## Get Parts
50
86
51
87
You frequently need to extract parts of a date/time for plotting. The following functions extract specific parts of a date or datetime object. This is a godsend for those of us who never have a clue what week of the year it is today.
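
As a brief sketch, here are a few of lubridate's accessor functions applied to the datetime from the timezone example above. The variable name `dt` is only for illustration.

```{r}
library(lubridate)

# the datetime from the timezone example
dt <- ymd_hm("2022-01-03 18:05", tz = "GMT")

year(dt)                 # the year as a number
month(dt, label = TRUE)  # the month as a labelled factor
mday(dt)                 # day of the month
wday(dt, label = TRUE)   # day of the week as a labelled factor
isoweek(dt)              # ISO 8601 week of the year
hour(dt)
minute(dt)
```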