data/diffs/R-ecology-lesson--03-dplyr.diff

diff --git a/home/zhian/Documents/Carpentries/Git/zkamvar--postmaul/data/rmd-repos/datacarpentry--R-ecology-lessonR3/_site/03-dplyr.md b/home/zhian/Documents/Carpentries/Git/zkamvar--postmaul/data/rmd-repos/datacarpentry--R-ecology-lessonR4/_site/03-dplyr.md
index ca77bab..8ac831f 100644
--- a/home/zhian/Documents/Carpentries/Git/zkamvar--postmaul/data/rmd-repos/datacarpentry--R-ecology-lessonR3/_site/03-dplyr.md
+++ b/home/zhian/Documents/Carpentries/Git/zkamvar--postmaul/data/rmd-repos/datacarpentry--R-ecology-lessonR4/_site/03-dplyr.md
@@ -485,13 +485,15 @@ surveys %>%
 ```
 :::
 
+    #> `summarise()` ungrouping output (override with `.groups` argument)
+
 You may also have noticed that the output from these calls doesn't run
 off the screen anymore. It's one of the advantages of `tbl_df` over data
 frame.
 
 You can also group by multiple columns:
 
-::: {#cb20 .sourceCode}
+::: {#cb21 .sourceCode}
 ``` {.sourceCode .r}
 surveys %>%
   group_by(sex, species_id) %>%
@@ -500,6 +502,8 @@ surveys %>%
 ```
 :::
 
+    #> `summarise()` regrouping output by 'sex' (override with `.groups` argument)
+
 Here, we used `tail()` to look at the last six rows of our summary.
 Before, we had used `head()` to look at the first six rows. We can see
 that the `sex` column contains `NA` values because some animals had
@@ -511,7 +515,7 @@ this, we can remove the missing values for weight before we attempt to
 calculate the summary statistics on weight. Because the missing values
 are removed first, we can omit `na.rm = TRUE` when computing the mean:
 
-::: {#cb21 .sourceCode}
+::: {#cb23 .sourceCode}
 ``` {.sourceCode .r}
 surveys %>%
   filter(!is.na(weight)) %>%
@@ -520,12 +524,14 @@ surveys %>%
 ```
 :::
 
+    #> `summarise()` regrouping output by 'sex' (override with `.groups` argument)
+
 Here, again, the output from these calls doesn't run off the screen
 anymore. If you want to display more data, you can use the `print()`
 function at the end of your chain with the argument `n` specifying the
 number of rows to display:
 
-::: {#cb22 .sourceCode}
+::: {#cb25 .sourceCode}
 ``` {.sourceCode .r}
 surveys %>%
   filter(!is.na(weight)) %>%
@@ -535,12 +541,14 @@ surveys %>%
 ```
 :::
 
+    #> `summarise()` regrouping output by 'sex' (override with `.groups` argument)
+
 Once the data are grouped, you can also summarize multiple variables at
 the same time (and not necessarily on the same variable). For instance,
 we could add a column indicating the minimum weight for each species for
 each sex:
 
-::: {#cb23 .sourceCode}
+::: {#cb27 .sourceCode}
 ``` {.sourceCode .r}
 surveys %>%
   filter(!is.na(weight)) %>%
@@ -550,11 +558,13 @@ surveys %>%
 ```
 :::
 
+    #> `summarise()` regrouping output by 'sex' (override with `.groups` argument)
+
 It is sometimes useful to rearrange the result of a query to inspect the
 values. For instance, we can sort on `min_weight` to put the lighter
 species first:
 
-::: {#cb24 .sourceCode}
+::: {#cb29 .sourceCode}
 ``` {.sourceCode .r}
 surveys %>%
   filter(!is.na(weight)) %>%
@@ -565,10 +575,12 @@ surveys %>%
 ```
 :::
 
+    #> `summarise()` regrouping output by 'sex' (override with `.groups` argument)
+
 To sort in descending order, we need to add the `desc()` function. If we
 want to sort the results by decreasing order of mean weight:
 
-::: {#cb25 .sourceCode}
+::: {#cb31 .sourceCode}
 ``` {.sourceCode .r}
 surveys %>%
   filter(!is.na(weight)) %>%
@@ -578,6 +590,8 @@ surveys %>%
   arrange(desc(mean_weight))
 ```
 :::
+
+    #> `summarise()` regrouping output by 'sex' (override with `.groups` argument)
 :::
 
 ::: {#counting .section .level4}
@@ -588,7 +602,7 @@ found for each factor or combination of factors. For this task,
 **`dplyr`** provides `count()`. For example, if we wanted to count the
 number of rows of data for each sex, we would do:
 
-::: {#cb26 .sourceCode}
+::: {#cb33 .sourceCode}
 ``` {.sourceCode .r}
 surveys %>%
     count(sex) 
@@ -600,7 +614,7 @@ grouping by a variable, and summarizing it by counting the number of
 observations in that group. In other words, `surveys %>% count()` is
 equivalent to:
 
-::: {#cb27 .sourceCode}
+::: {#cb34 .sourceCode}
 ``` {.sourceCode .r}
 surveys %>%
     group_by(sex) %>%
@@ -608,9 +622,11 @@ surveys %>%
 ```
 :::
 
+    #> `summarise()` ungrouping output (override with `.groups` argument)
+
 For convenience, `count()` provides the `sort` argument:
 
-::: {#cb28 .sourceCode}
+::: {#cb36 .sourceCode}
 ``` {.sourceCode .r}
 surveys %>%
     count(sex, sort = TRUE) 
@@ -622,7 +638,7 @@ rows/observations for *one* factor (i.e., `sex`). If we wanted to count
 *combination of factors*, such as `sex` and `species`, we would specify
 the first and the second factor as the arguments of `count()`:
 
-::: {#cb29 .sourceCode}
+::: {#cb37 .sourceCode}
 ``` {.sourceCode .r}
 surveys %>%
   count(sex, species) 
@@ -635,7 +651,7 @@ For instance, we might want to arrange the table above in (i) an
 alphabetical order of the levels of the species and (ii) in descending
 order of the count:
 
-::: {#cb30 .sourceCode}
+::: {#cb38 .sourceCode}
 ``` {.sourceCode .r}
 surveys %>%
   count(sex, species) %>%
@@ -645,7 +661,7 @@ surveys %>%
 
 From the table above, we may learn that, for instance, there are 75
 observations of the *albigula* species that are not specified for its
-sex (i.e. `NA`).
+sex (i.e. `NA`).
 
 > ### Challenge {#challenge-2 .challenge}
 >
@@ -655,7 +671,7 @@ sex (i.e. `NA`).
 > ### Answer {#answer-2 .toc-ignore}
 >
 > ::: {style="background: #fff;"}
-> ::: {#cb31 .sourceCode}
+> ::: {#cb39 .sourceCode}
 > ``` {.sourceCode .r}
 > surveys %>%
 >     count(plot_type) 
@@ -672,7 +688,7 @@ sex (i.e. `NA`).
 > ### Answer {#answer-3 .toc-ignore}
 >
 > ::: {style="background: #fff;"}
-> ::: {#cb32 .sourceCode}
+> ::: {#cb40 .sourceCode}
 > ``` {.sourceCode .r}
 > surveys %>%
 >     filter(!is.na(hindfoot_length)) %>%
@@ -685,6 +701,8 @@ sex (i.e. `NA`).
 >     )
 > ```
 > :::
+>
+>     #> `summarise()` ungrouping output (override with `.groups` argument)
 > :::
 > :::
 >
@@ -695,7 +713,7 @@ sex (i.e. `NA`).
 > ### Answer {#answer-4 .toc-ignore}
 >
 > ::: {style="background: #fff;"}
-> ::: {#cb33 .sourceCode}
+> ::: {#cb42 .sourceCode}
 > ``` {.sourceCode .r}
 > surveys %>%
 >     filter(!is.na(weight)) %>%
@@ -771,13 +789,19 @@ each genus in each plot over the entire survey period. We use
 and variables of interest, and create a new variable for the
 `mean_weight`.
 
-::: {#cb34 .sourceCode}
+::: {#cb43 .sourceCode}
 ``` {.sourceCode .r}
 surveys_gw <- surveys %>%
   filter(!is.na(weight)) %>%
   group_by(plot_id, genus) %>%
   summarize(mean_weight = mean(weight))
+```
+:::
+
+    #> `summarise()` regrouping output by 'plot_id' (override with `.groups` argument)
 
+::: {#cb45 .sourceCode}
+``` {.sourceCode .r}
 str(surveys_gw)
 ```
 :::
@@ -787,7 +811,7 @@ across multiple rows, 196 observations of 3 variables. Using `spread()`
 to key on `genus` with values from `mean_weight` this becomes 24
 observations of 11 variables, one row for each plot.
 
-::: {#cb35 .sourceCode}
+::: {#cb46 .sourceCode}
 ``` {.sourceCode .r}
 surveys_spread <- surveys_gw %>%
   spread(key = genus, value = mean_weight)
@@ -801,7 +825,7 @@ str(surveys_spread)
 We could now plot comparisons between the weight of genera in different
 plots, although we may wish to fill in the missing values first.
 
-::: {#cb36 .sourceCode}
+::: {#cb47 .sourceCode}
 ``` {.sourceCode .r}
 surveys_gw %>%
   spread(genus, mean_weight, fill = 0) %>%
@@ -836,7 +860,7 @@ called `genus` and value called `mean_weight` and use all columns except
 `plot_id` for the key variable. Here we exclude `plot_id` from being
 `gather()`ed.
 
-::: {#cb37 .sourceCode}
+::: {#cb48 .sourceCode}
 ``` {.sourceCode .r}
 surveys_gather <- surveys_spread %>%
   gather(key = "genus", value = "mean_weight", -plot_id)
@@ -857,7 +881,7 @@ and it's easier to specify what to gather than what to leave alone. And
 if the columns are directly adjacent, we don't even need to list them
 all out - just use the `:` operator!
 
-::: {#cb38 .sourceCode}
+::: {#cb49 .sourceCode}
 ``` {.sourceCode .r}
 surveys_spread %>%
   gather(key = "genus", value = "mean_weight", Baiomys:Spermophilus) %>%
@@ -878,13 +902,19 @@ surveys_spread %>%
 > ### Answer {#answer-5 .toc-ignore}
 >
 > ::: {style="background: #fff;"}
-> ::: {#cb39 .sourceCode}
+> ::: {#cb50 .sourceCode}
 > ``` {.sourceCode .r}
 > surveys_spread_genera <- surveys %>%
 >   group_by(plot_id, year) %>%
 >   summarize(n_genera = n_distinct(genus)) %>%
 >   spread(year, n_genera)
+> ```
+> :::
+>
+>     #> `summarise()` regrouping output by 'plot_id' (override with `.groups` argument)
 >
+> ::: {#cb52 .sourceCode}
+> ``` {.sourceCode .r}
 > head(surveys_spread_genera)
 > ```
 > :::
@@ -898,7 +928,7 @@ surveys_spread %>%
 > ### Answer {#answer-6 .toc-ignore}
 >
 > ::: {style="background: #fff;"}
-> ::: {#cb40 .sourceCode}
+> ::: {#cb53 .sourceCode}
 > ``` {.sourceCode .r}
 > surveys_spread_genera %>%
 >   gather("year", "n_genera", -plot_id)
@@ -921,7 +951,7 @@ surveys_spread %>%
 > ### Answer {#answer-7 .toc-ignore}
 >
 > ::: {style="background: #fff;"}
-> ::: {#cb41 .sourceCode}
+> ::: {#cb54 .sourceCode}
 > ``` {.sourceCode .r}
 > surveys_long <- surveys %>%
 >   gather("measurement", "value", hindfoot_length, weight)
@@ -940,7 +970,7 @@ surveys_spread %>%
 > ### Answer {#answer-8 .toc-ignore}
 >
 > ::: {style="background: #fff;"}
-> ::: {#cb42 .sourceCode}
+> ::: {#cb55 .sourceCode}
 > ``` {.sourceCode .r}
 > surveys_long %>%
 >   group_by(year, measurement, plot_type) %>%
@@ -948,6 +978,8 @@ surveys_spread %>%
 >   spread(measurement, mean_value)
 > ```
 > :::
+>
+>     #> `summarise()` regrouping output by 'year', 'measurement' (override with `.groups` argument)
 > :::
 > :::
 :::
@@ -983,7 +1015,7 @@ data.
 Let's start by removing observations of animals for which `weight` and
 `hindfoot_length` are missing, or the `sex` has not been determined:
 
-::: {#cb43 .sourceCode}
+::: {#cb57 .sourceCode}
 ``` {.sourceCode .r}
 surveys_complete <- surveys %>%
   filter(!is.na(weight),           # remove missing weight
@@ -1000,7 +1032,7 @@ how often each species has been observed, and filter out the rare
 species; then, we will extract only the observations for these more
 common species:
 
-::: {#cb44 .sourceCode}
+::: {#cb58 .sourceCode}
 ``` {.sourceCode .r}
 ## Extract the most common species_id
 species_counts <- surveys_complete %>%
@@ -1020,13 +1052,13 @@ To make sure that everyone has the same data set, check that
 Now that our data set is ready, we can save it as a CSV file in our
 `data` folder.
 
-::: {#cb45 .sourceCode}
+::: {#cb59 .sourceCode}
 ``` {.sourceCode .r}
 write_csv(surveys_complete, path = "data/surveys_complete.csv")
 ```
 :::
 
-Page built on: 📆 2020-07-24 ‒ 🕢 19:45:38
+Page built on: 📆 2020-07-24 ‒ 🕢 19:46:31
 :::
 
 ------------------------------------------------------------------------