Merge pull request #535 from jhudsl/clif-functions-w24

Clif functions w24
jhudsl · Jan 19, 2024 · d2da3e2
2 parents 36fdeda + 5aab9b5
commit d2da3e2

Showing 3 changed files with 90 additions and 25 deletions.
diff --git a/modules/Functions/Functions.Rmd b/modules/Functions/Functions.Rmd
@@ -13,9 +13,11 @@ library(knitr)
 library(stringr)
 library(tidyr)
 library(emo)
+library(readr)
 opts_chunk$set(comment = "")
 ```
 
+
 ## Writing your own functions
 
 So far we've seen many functions, like `c()`, `class()`, `filter()`, `dim()` ...
@@ -27,11 +29,23 @@ So far we've seen many functions, like `c()`, `class()`, `filter()`, `dim()` ...
 - Avoid running code unintentionally
 - Use names that make sense to you
 
+
 ## Writing your own functions
 
-Here we will write a function that multiplies some number (x) by 2:
+The general syntax for a function is: 
 
-```{r comment=""}
+```
+function_name <- function(arg1, arg2, ...) {
+ <function body>
+}
+```
+
+
+## Writing your own functions
+
+Here we will write a function that multiplies some number `x` by 2:
+
+```{r}
 times_2 <- function(x) x * 2
 ```
 
@@ -41,6 +55,7 @@ When you run the line of code above, you make it ready to use (no output yet!).
 times_2(x = 10)
 ```
 
+
 ## Writing your own functions: `{ }`
 
 Adding the curly brackets - `{}` - allows you to use functions spanning multiple lines:
@@ -59,18 +74,6 @@ is_even(x = times_2(x = 10))
 ```
 
 
-## Writing your own functions
-
-The general syntax for a function is: 
-
-```
-functionName <- function(inputs) {
- <function body>
-return(value)
-}
-```
-
-
 ## Writing your own functions: `return`
 
 If we want something specific for the function's output, we use `return()`:
@@ -89,6 +92,8 @@ times_2_plus_4(x = 10)
  - printed results do not stay around but can show what a function is doing
  - returned results stay around
  - can only return one result but can print many
+ - if `return()` is not called, the value of the last evaluated expression is returned
+ - `return()` should be the last step (any steps after it are skipped)
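
The two `return` notes above can be checked with a minimal sketch. The function names `implicit_times_2` and `early_return` are hypothetical, chosen only for this illustration:

```r
# Hypothetical helpers, not part of the lecture code.
implicit_times_2 <- function(x) {
  x * 2                    # no return(): the last evaluated expression is the result
}

early_return <- function(x) {
  return(x * 2)            # the function exits here
  print("never printed")   # skipped, because return() has already been called
}

implicit_times_2(x = 10)
early_return(x = 10)
```

Both calls give the same value, and the `print()` placed after `return()` never runs.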
 
 ## Adding print
 
@@ -130,6 +135,7 @@ result <- x_and_y_plus_2(x = 10, y = 3)
 result
 ```
 
+
 ## Writing your own functions: defaults
 
 Functions can have "default" arguments. This lets us use the function without using an argument later:
@@ -140,6 +146,7 @@ times_2_plus_y()
 times_2_plus_y(x = 11, y = 4)
 ```
 
+
 ## Writing another simple function
 
 Let's write a function, `sqdif`, that:
@@ -149,6 +156,7 @@ Let's write a function, `sqdif`, that:
 3. squares this difference
 4. then returns the final value 
 
+
 ## Writing another simple function
 
 ```{r comment=""}
@@ -160,6 +168,7 @@ sqdif(10, 5)
 sqdif(11, 4)
 ```
 
+
 ## Writing your own functions: characters
 
 Functions can have any kind of input. Here is a function with characters:
@@ -172,6 +181,7 @@ loud <- function(word) {
 loud(word = "hooray!")
 ```
 
+
 ## Functions for tibbles
 
 We can use `filter(row_number() == n)` to extract a row of a tibble:
@@ -183,12 +193,12 @@ cars <- read_kaggle()
 cars_1_8 <- cars %>% select(1:8)
 ```
 
-
 ```{r}
 get_row(dat = cars, row = 10)
 get_row(dat = iris, row = 4)
 ```
 
+
 ## Functions for tibbles
 
 `select(n)` will choose column `n`:
@@ -203,6 +213,7 @@ get_index <- function(dat, row, col) {
 get_index(dat = cars, row = 10, col = 8)
 ```
 
+
 ## Functions for tibbles
 
 Including default values for arguments:
@@ -217,6 +228,20 @@ get_top <- function(dat, row = 1, col = 1) {
 get_top(dat = cars)
 ```
 
+## Functions for tibbles
+
+We can write a function with an argument that takes a column name and passes it on to `select()` or another `dplyr` operation:
+
+```{r}
+clean_dataset <- function(dataset, col_name) {
+  my_data_out <- dataset %>% select({{col_name}}) # Note the curly braces
+  write_csv(my_data_out, "clean_data.csv")
+  return(my_data_out)
+}
+
+clean_dataset(dataset = mtcars, col_name = "cyl")
+```
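
As a hedged sketch of how the `{{ }}` (embrace) syntax behaves, the hypothetical helper `pick_column` below repeats the `select()` step without writing a file. It assumes `dplyr` is installed:

```r
library(dplyr)

# Hypothetical helper: the same select() step as clean_dataset(), minus write_csv().
pick_column <- function(dataset, col_name) {
  dataset %>% select({{ col_name }})  # {{ }} forwards the caller's column to select()
}

by_bare_name <- pick_column(dataset = mtcars, col_name = cyl)    # unquoted column name
by_string    <- pick_column(dataset = mtcars, col_name = "cyl")  # quoted name
names(by_bare_name)
names(by_string)
```

The quoted form works here because `select()` also accepts column names as strings. Verbs that compute on the column, such as `mutate()`, would treat `"cyl"` as plain text, so the unquoted form is the safer habit.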
+
 ## Summary
 
 - Simple functions take the form:
@@ -225,6 +250,7 @@ get_top(dat = cars)
   - `return` will provide a value as output
   - `print` will simply print the value on the screen but not save it
 
+
 ## Lab Part 1
 
 🏠 [Class Website](https://jhudatascience.org/intro_to_r/)  
@@ -244,6 +270,7 @@ These functions take the form:
 sapply(<a vector, list, data frame>, some_function)
 ```
 
+
 ## Using your custom functions: `sapply()`
 
 `r emo::ji("rotating_light")` There are no parentheses on the functions! `r emo::ji("rotating_light")`
@@ -256,6 +283,7 @@ sapply(iris, class)
 iris %>% sapply(class)
 ```
 
+
 ## Using your custom functions: `sapply()`
 
 ```{r}
@@ -265,6 +293,7 @@ select(cars, VehYear:VehicleAge) %>%
   head()
 ```
 
+
 ## Using your custom functions "on the fly" to iterate
 
 ```{r comment=""}
@@ -273,22 +302,38 @@ select(cars, VehYear:VehicleAge) %>%
   head()
 ```
 
+
 # across
 
+## Using functions in `mutate()` and `summarize()`
+
+We already know how to use functions to modify columns using `mutate()` or calculate summary statistics using `summarize()`.
+
+```{r}
+mtcars %>%
+  mutate(wt_kg = wt*1000/2.205,
+         power_watts = hp*745.7) %>%
+  summarize(mean_kg = mean(wt_kg),
+            max_watts = max(power_watts))
+```
+
+
 ## Applying functions with `across` from `dplyr`
 
 `across()` makes it easy to apply the same transformation to multiple columns. Usually used with `summarize()` or `mutate()`.
 
 ```
-summarize(across( .cols = <columns>, .fns = function, ... )) 
+summarize(across( .cols = <columns>, .fns = function)) 
 ```
 or
 ```
-mutate(across(.cols = <columns>, .fns = function, ...))
+mutate(across(.cols = <columns>, .fns = function))
 ```
+
 - List columns first : `.cols = `
 - List function next: `.fns = `
-- Then list any arguments for the function (e.g., `na.rm = TRUE`)
+- If the function needs extra arguments (e.g., `na.rm = TRUE`), you may need to wrap it in an anonymous function, e.g., `\(x) mean(x, na.rm = TRUE)`
+
 
 ## Applying functions with `across` from `dplyr`
 
@@ -301,6 +346,7 @@ cars_dbl %>%
   summarize(across(.cols = everything(), .fns = mean))
 ```
 
+
 ## Applying functions with `across` from `dplyr`
 
 Can use with other tidyverse functions like `group_by`!
@@ -311,18 +357,18 @@ cars_dbl %>%
   summarize(across(.cols = everything(), .fns = mean))
 ```
 
+
 ## Applying functions with `across` from `dplyr`
 
-Combining with `summarize()`:
+To pass extra arguments to a function, you may need to wrap it in an anonymous function. The shorthand `\(x)` (available in R 4.1 and later) is equivalent to `function(x)`.
 
 ```{r warning=FALSE}
-# Adding arguments to the end!
-#
 cars_dbl %>%
   group_by(Make) %>%
-  summarize(across(.cols = everything(), .fns = mean, na.rm = TRUE))
+  summarize(across(.cols = everything(), .fns = \(x) mean(x, na.rm = TRUE)))
 ```
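
To see that `\(x)` and `function(x)` are interchangeable, here is a small package-free sketch. It swaps in base R `sapply()` and the built-in `airquality` data instead of `across()` and `cars_dbl`, and it needs R 4.1 or later:

```r
# Both calls apply mean() with na.rm = TRUE to every column of airquality.
means_short <- sapply(airquality, \(x) mean(x, na.rm = TRUE))
means_long  <- sapply(airquality, function(x) mean(x, na.rm = TRUE))

# Compare the two results.
identical(means_short, means_long)
```

The same anonymous function can be passed to `.fns` inside `across()`.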
 
+
 ## Applying functions with `across` from `dplyr`
 
 Using different `tidyselect` options (e.g., `starts_with()`, `ends_with()`, `contains()`)
@@ -333,6 +379,7 @@ cars_dbl %>%
   summarize(across(.cols = starts_with("Veh"), .fns = mean))
 ```
 
+
 ## Applying functions with `across` from `dplyr`
 
 Combining with `mutate()`: rounding to the nearest power of 10 (with negative digits value)
@@ -347,7 +394,6 @@ cars_dbl %>%
 ```
 
 
-
 ## Applying functions with `across` from `dplyr` {.smaller}
 
 Combining with `mutate()` - the `replace_na` function
@@ -369,8 +415,11 @@ mort %>%
   ))
 ```
 
+
 ## Use custom functions within `mutate` and `across`
 
+If your function needs to span more than one line, it is better to define it first and then use it inside `mutate()` and `across()`.
+
 ```{r}
 times1000 <- function(x) x * 1000
 
@@ -394,13 +443,15 @@ airquality %>%
 
 Similar to `across()`, `purrr` is a package that allows you to apply a function to multiple columns in a data frame or multiple data objects in a list.
 
+
 ## map_df
 
 ```{r}
 library(purrr)
 airquality %>% map_df(replace_na, replace = 0)
 ```
 
+
 # Multiple Data Frames
 
 ## Multiple data frames {.smaller}
@@ -430,6 +481,7 @@ AQ_list %>% sapply(colMeans, na.rm = TRUE)
 - `purrr` is a package that you can use to do more iterative work easily
 - Can use `sapply` or `purrr` to work with multiple data frames within lists simultaneously
 
+
 ## Lab Part 2
 
 🏠 [Class Website](https://jhudatascience.org/intro_to_r/)  

diff --git a/modules/Functions/lab/Functions_Lab.Rmd b/modules/Functions/lab/Functions_Lab.Rmd
@@ -39,7 +39,7 @@ return(result)
 
 ```
 
-2. Create a function that takes two arguments, (1) a vector and (2) a numeric value. This function tests whether the number (2) is contained within the vector (1). Call it `has_n`. Test your function on the vector `c(2,7,21,30,90)` and number `21` - you should get the answer TRUE.
+2. Create a function that takes two arguments, (1) a vector and (2) a numeric value. This function tests whether the number (2) is contained within the vector (1). **Hint**: use `%in%`. Call it `has_n`. Test your function on the vector `c(2,7,21,30,90)` and number `21` - you should get the answer TRUE.
 
 ```{r}
 
@@ -51,6 +51,12 @@ return(result)
 
 ```
 
+4. Create a new number `b_num` that is not contained within `nums`. Use your updated `has_n` function with the default value and add `b_num` as the `n` argument when calling the function. What is the outcome?
+
+```{r}
+
+```
+
 # Part 2
 
 4. Read in the SARS-CoV-2 Vaccination data from http://jhudatascience.org/intro_to_r/data/USA_covid19_vaccinations.csv. Assign the data the name "vacc".

diff --git a/modules/Functions/lab/Functions_Lab_Key.Rmd b/modules/Functions/lab/Functions_Lab_Key.Rmd
@@ -48,7 +48,7 @@ sum_squared <- function(x) {
 sum_squared(x = nums)
 ```
 
-2. Create a function that takes two arguments, (1) a vector and (2) a numeric value. This function tests whether the number (2) is contained within the vector (1). Call it `has_n`. Test your function on the vector `c(2,7,21,30,90)` and number `21` - you should get the answer TRUE.
+2. Create a function that takes two arguments, (1) a vector and (2) a numeric value. This function tests whether the number (2) is contained within the vector (1). **Hint**: use `%in%`. Call it `has_n`. Test your function on the vector `c(2,7,21,30,90)` and number `21` - you should get the answer TRUE.
 
 ```{r}
 nums <- c(2, 7, 21, 30, 90)
@@ -68,6 +68,13 @@ has_n <- function(x, n = 21) n %in% x
 has_n(x = nums)
 ```
 
+4. Create a new number `b_num` that is not contained within `nums`. Use your updated `has_n` function with the default value and add `b_num` as the `n` argument when calling the function. What is the outcome?
+
+```{r}
+b_num <- 11
+has_n(x = nums, n = b_num)
+```
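
The outcome can be checked with a short self-contained sketch that reuses the `nums` vector and the `has_n` definition from the key. The names `with_default` and `with_b_num` are introduced here only for illustration:

```r
nums <- c(2, 7, 21, 30, 90)
has_n <- function(x, n = 21) n %in% x

b_num <- 11
with_default <- has_n(x = nums)             # n falls back to the default, 21
with_b_num   <- has_n(x = nums, n = b_num)  # the supplied n overrides the default
with_default
with_b_num
```

Supplying `n` explicitly replaces the default, so the call with `b_num` returns `FALSE` even though the default call returns `TRUE`.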
+
 # Part 2
 
 4. Read in the SARS-CoV-2 Vaccination data from http://jhudatascience.org/intro_to_r/data/USA_covid19_vaccinations.csv. Assign the data the name "vacc".