Merge pull request #535 from jhudsl/clif-functions-w24
Clif functions w24
clifmckee authored Jan 19, 2024
2 parents 36fdeda + 5aab9b5 commit d2da3e2
Showing 3 changed files with 90 additions and 25 deletions.
98 changes: 75 additions & 23 deletions modules/Functions/Functions.Rmd
@@ -13,9 +13,11 @@ library(knitr)
library(stringr)
library(tidyr)
library(emo)
library(readr)
opts_chunk$set(comment = "")
```


## Writing your own functions

So far we've seen many functions, like `c()`, `class()`, `filter()`, `dim()` ...
@@ -27,11 +29,23 @@ So far we've seen many functions, like `c()`, `class()`, `filter()`, `dim()` ...
- Avoid running code unintentionally
- Use names that make sense to you


## Writing your own functions

Here we will write a function that multiplies some number (x) by 2:
The general syntax for a function is:

```{r comment=""}
```
function_name <- function(arg1, arg2, ...) {
<function body>
}
```


## Writing your own functions

Here we will write a function that multiplies some number `x` by 2:

```{r}
times_2 <- function(x) x * 2
```

@@ -41,6 +55,7 @@ When you run the line of code above, you make it ready to use (no output yet!).
times_2(x = 10)
```


## Writing your own functions: `{ }`

Adding the curly brackets - `{}` - allows you to use functions spanning multiple lines:
@@ -59,18 +74,6 @@ is_even(x = times_2(x = 10))
```


## Writing your own functions

The general syntax for a function is:

```
functionName <- function(inputs) {
<function body>
return(value)
}
```


## Writing your own functions: `return`

If we want something specific for the function's output, we use `return()`:
@@ -89,6 +92,8 @@ times_2_plus_4(x = 10)
- printed results do not stay around but can show what a function is doing
- returned results stay around
- can only return one result but can print many
- if `return` not called, last evaluated expression is returned
- `return` should be the last step (any code after it is skipped)
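
A minimal sketch of the last two points. The functions `implicit_return()` and `early_return()` are hypothetical names used only for illustration; they are not part of the original slides.

```r
# Without return(), the last evaluated expression is the function's result.
implicit_return <- function(x) {
  x * 2
}

# With return(), the function exits immediately; the line after it never runs.
early_return <- function(x) {
  return(x * 2)
  x * 100 # never reached
}

implicit_return(5)
early_return(5)
```

Both calls give the same value, which suggests why `return()` is best kept as the final step of a function.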

## Adding print

@@ -130,6 +135,7 @@ result <- x_and_y_plus_2(x = 10, y = 3)
result
```


## Writing your own functions: defaults

Functions can have "default" argument values. This lets us call the function later without supplying those arguments:
@@ -140,6 +146,7 @@ times_2_plus_y()
times_2_plus_y(x = 11, y = 4)
```


## Writing another simple function

Let's write a function, `sqdif`, that:
@@ -149,6 +156,7 @@ Let's write a function, `sqdif`, that:
3. squares this difference
4. then returns the final value


## Writing another simple function

```{r comment=""}
@@ -160,6 +168,7 @@ sqdif(10, 5)
sqdif(11, 4)
```


## Writing your own functions: characters

Functions can have any kind of input. Here is a function with characters:
@@ -172,6 +181,7 @@ loud <- function(word) {
loud(word = "hooray!")
```


## Functions for tibbles

We can use `filter(row_number() == n)` to extract a row of a tibble:
@@ -183,12 +193,12 @@ cars <- read_kaggle()
cars_1_8 <- cars %>% select(1:8)
```


```{r}
get_row(dat = cars, row = 10)
get_row(dat = iris, row = 4)
```


## Functions for tibbles

`select(n)` will choose column `n`:
@@ -203,6 +213,7 @@ get_index <- function(dat, row, col) {
get_index(dat = cars, row = 10, col = 8)
```


## Functions for tibbles

Including default values for arguments:
@@ -217,6 +228,20 @@ get_top <- function(dat, row = 1, col = 1) {
get_top(dat = cars)
```

## Functions for tibbles

We can create a function with an argument that accepts a column name for `select()` or another `dplyr` operation:

```{r}
clean_dataset <- function(dataset, col_name) {
my_data_out <- dataset %>% select({{col_name}}) # Note the curly braces
write_csv(my_data_out, "clean_data.csv")
return(my_data_out)
}
clean_dataset(dataset = mtcars, col_name = "cyl")
```
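
As a hedged sketch of why the curly-curly braces matter: embracing the argument lets the caller pass a bare (unquoted) column name as well as a string. The helper `pick_column()` is a hypothetical name used only for illustration, and the example assumes `dplyr` is installed.

```r
library(dplyr)

# Hypothetical helper: {{ }} forwards the caller's column expression to select().
pick_column <- function(dataset, col_name) {
  dataset %>% select({{ col_name }})
}

# A bare column name and a quoted string are both accepted.
bare <- pick_column(mtcars, cyl)
quoted <- pick_column(mtcars, "cyl")

names(bare)
names(quoted)
```

Without the braces, `select(col_name)` would look for a column literally called `col_name` when a bare name is passed.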

## Summary

- Simple functions take the form:
@@ -225,6 +250,7 @@ get_top(dat = cars)
- `return` will provide a value as output
- `print` will simply print the value on the screen but not save it


## Lab Part 1

🏠 [Class Website](https://jhudatascience.org/intro_to_r/)
@@ -244,6 +270,7 @@ These functions take the form:
sapply(<a vector, list, data frame>, some_function)
```
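
A small sketch of this form, reusing `times_2` from earlier in the slides and the built-in `airquality` data. The object names `doubled` and `col_means` are assumptions made for illustration.

```r
times_2 <- function(x) x * 2

# Pass the function by name, without parentheses; it is applied to each element.
doubled <- sapply(c(1, 5, 10), times_2)
doubled

# Extra arguments listed after the function are passed on to every call.
col_means <- sapply(airquality, mean, na.rm = TRUE)
col_means
```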


## Using your custom functions: `sapply()`

`r emo::ji("rotating_light")` There are no parentheses on the functions! `r emo::ji("rotating_light")`
@@ -256,6 +283,7 @@ sapply(iris, class)
iris %>% sapply(class)
```


## Using your custom functions: `sapply()`

```{r}
@@ -265,6 +293,7 @@ select(cars, VehYear:VehicleAge) %>%
head()
```


## Using your custom functions "on the fly" to iterate

```{r comment=""}
@@ -273,22 +302,38 @@ select(cars, VehYear:VehicleAge) %>%
head()
```


# across

## Using functions in `mutate()` and `summarize()`

We already know how to use functions to modify columns using `mutate()` or calculate summary statistics using `summarize()`.

```{r}
mtcars %>%
mutate(wt_kg = wt*1000/2.205,
power_watts = hp*745.7) %>%
summarize(mean_kg = mean(wt_kg),
max_watts = max(power_watts))
```


## Applying functions with `across` from `dplyr`

`across()` makes it easy to apply the same transformation to multiple columns. Usually used with `summarize()` or `mutate()`.

```
summarize(across( .cols = <columns>, .fns = function, ... ))
summarize(across( .cols = <columns>, .fns = function))
```
or
```
mutate(across(.cols = <columns>, .fns = function, ...))
mutate(across(.cols = <columns>, .fns = function))
```

- List columns first : `.cols = `
- List function next: `.fns = `
- Then list any arguments for the function (e.g., `na.rm = TRUE`)
- If the function needs extra arguments (e.g., `na.rm = TRUE`), it may need to be wrapped in an anonymous function, e.g., `\(x) mean(x, na.rm = TRUE)`
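
A minimal sketch of these points using the built-in `airquality` data, whose `Ozone` and `Solar.R` columns contain missing values. It assumes `dplyr` is installed and R 4.1 or later (for the `\(x)` shorthand); the name `aq_means` is hypothetical.

```r
library(dplyr)

# Columns first, then the function; na.rm = TRUE goes inside the anonymous function.
aq_means <- airquality %>%
  summarize(across(.cols = c(Ozone, Solar.R),
                   .fns = \(x) mean(x, na.rm = TRUE)))
aq_means
```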


## Applying functions with `across` from `dplyr`

@@ -301,6 +346,7 @@ cars_dbl %>%
summarize(across(.cols = everything(), .fns = mean))
```


## Applying functions with `across` from `dplyr`

Can use with other tidyverse functions like `group_by`!
@@ -311,18 +357,18 @@ cars_dbl %>%
summarize(across(.cols = everything(), .fns = mean))
```


## Applying functions with `across` from `dplyr`

Combining with `summarize()`:
To add arguments to functions, we may need to use an anonymous function. In this syntax, the shorthand `\(x)` is equivalent to `function(x)`.

```{r warning=FALSE}
# Adding arguments to the end!
#
cars_dbl %>%
group_by(Make) %>%
summarize(across(.cols = everything(), .fns = mean, na.rm = TRUE))
summarize(across(.cols = everything(), .fns = \(x) mean(x, na.rm = TRUE)))
```


## Applying functions with `across` from `dplyr`

Using different `tidyselect` helpers (e.g., `starts_with()`, `ends_with()`, `contains()`)
@@ -333,6 +379,7 @@ cars_dbl %>%
summarize(across(.cols = starts_with("Veh"), .fns = mean))
```


## Applying functions with `across` from `dplyr`

Combining with `mutate()`: rounding to the nearest power of 10 (with negative digits value)
@@ -347,7 +394,6 @@ cars_dbl %>%
```



## Applying functions with `across` from `dplyr` {.smaller}

Combining with `mutate()` - the `replace_na` function
@@ -369,8 +415,11 @@ mort %>%
))
```


## Use custom functions within `mutate` and `across`

If your function needs to span more than one line, it is better to define it first and then use it inside `mutate()` and `across()`.

```{r}
times1000 <- function(x) x * 1000
@@ -394,13 +443,15 @@ airquality %>%

Similar to `across()`, `purrr` is a package that allows you to apply a function to multiple columns in a data frame or multiple data objects in a list.


## map_df

```{r}
library(purrr)
airquality %>% map_df(replace_na, replace = 0)
```
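
A related sketch, assuming `purrr` is installed: `map_dbl()` is a sibling of `map_df()` that applies a function to each column and returns a named numeric vector. The name `na_counts` is hypothetical and the anonymous function is only an illustration.

```r
library(purrr)

# Count the missing values in each column of the built-in airquality data.
na_counts <- map_dbl(airquality, \(x) sum(is.na(x)))
na_counts
```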


# Multiple Data Frames

## Multiple data frames {.smaller}
@@ -430,6 +481,7 @@ AQ_list %>% sapply(colMeans, na.rm = TRUE)
- `purrr` is a package that you can use to do more iterative work easily
- Can use `sapply` or `purrr` to work with multiple data frames within lists simultaneously


## Lab Part 2

🏠 [Class Website](https://jhudatascience.org/intro_to_r/)
8 changes: 7 additions & 1 deletion modules/Functions/lab/Functions_Lab.Rmd
@@ -39,7 +39,7 @@ return(result)
```

2. Create a function that takes two arguments, (1) a vector and (2) a numeric value. This function tests whether the number (2) is contained within the vector (1). Call it `has_n`. Test your function on the vector `c(2,7,21,30,90)` and number `21` - you should get the answer TRUE.
2. Create a function that takes two arguments, (1) a vector and (2) a numeric value. This function tests whether the number (2) is contained within the vector (1). **Hint**: use `%in%`. Call it `has_n`. Test your function on the vector `c(2,7,21,30,90)` and number `21` - you should get the answer TRUE.

```{r}
@@ -51,6 +51,12 @@ return(result)
```

4. Create a new number `b_num` that is not contained within `nums`. Use your updated `has_n` function with the default value and add `b_num` as the `n` argument when calling the function. What is the outcome?

```{r}
```

# Part 2

4. Read in the SARS-CoV-2 Vaccination data from http://jhudatascience.org/intro_to_r/data/USA_covid19_vaccinations.csv. Assign the data the name "vacc".
9 changes: 8 additions & 1 deletion modules/Functions/lab/Functions_Lab_Key.Rmd
@@ -48,7 +48,7 @@ sum_squared <- function(x) {
sum_squared(x = nums)
```

2. Create a function that takes two arguments, (1) a vector and (2) a numeric value. This function tests whether the number (2) is contained within the vector (1). Call it `has_n`. Test your function on the vector `c(2,7,21,30,90)` and number `21` - you should get the answer TRUE.
2. Create a function that takes two arguments, (1) a vector and (2) a numeric value. This function tests whether the number (2) is contained within the vector (1). **Hint**: use `%in%`. Call it `has_n`. Test your function on the vector `c(2,7,21,30,90)` and number `21` - you should get the answer TRUE.

```{r}
nums <- c(2, 7, 21, 30, 90)
@@ -68,6 +68,13 @@ has_n <- function(x, n = 21) n %in% x
has_n(x = nums)
```

4. Create a new number `b_num` that is not contained within `nums`. Use your updated `has_n` function with the default value and add `b_num` as the `n` argument when calling the function. What is the outcome?

```{r}
b_num <- 11
has_n(x = nums, n = b_num)
```

# Part 2

4. Read in the SARS-CoV-2 Vaccination data from http://jhudatascience.org/intro_to_r/data/USA_covid19_vaccinations.csv. Assign the data the name "vacc".
