Clif functions w24 #535

Merged 2 commits on Jan 19, 2024
98 changes: 75 additions & 23 deletions modules/Functions/Functions.Rmd
@@ -13,9 +13,11 @@ library(knitr)
library(stringr)
library(tidyr)
library(emo)
library(readr)
opts_chunk$set(comment = "")
```


## Writing your own functions

So far we've seen many functions, like `c()`, `class()`, `filter()`, `dim()` ...
@@ -27,11 +29,23 @@ So far we've seen many functions, like `c()`, `class()`, `filter()`, `dim()` ...
- Avoid running code unintentionally
- Use names that make sense to you


## Writing your own functions

The general syntax for a function is:

```
function_name <- function(arg1, arg2, ...) {
  <function body>
}
```


## Writing your own functions

Here we will write a function that multiplies some number `x` by 2:

```{r}
times_2 <- function(x) x * 2
```

@@ -41,6 +55,7 @@ When you run the line of code above, you make it ready to use (no output yet!).
times_2(x = 10)
```


## Writing your own functions: `{ }`

Adding curly brackets (`{}`) lets you write functions that span multiple lines:
@@ -59,18 +74,6 @@ is_even(x = times_2(x = 10))
```


## Writing your own functions

The general syntax for a function is:

```
functionName <- function(inputs) {
<function body>
return(value)
}
```


## Writing your own functions: `return`

To control exactly what a function outputs, we use `return()`:
@@ -89,6 +92,8 @@ times_2_plus_4(x = 10)
- printed results do not stay around but can show what a function is doing
- returned results stay around
- can only return one result but can print many
- if `return()` is not called, the last evaluated expression is returned
- `return()` should be the last step: it exits the function immediately, so any code after it does not run
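A minimal sketch of the points above, using a hypothetical function `double_it()`: the first `print()` shows up on screen, the returned value can be saved, and the `print()` placed after `return()` never runs. A second hypothetical function, `add_one()`, has no `return()`, so its last evaluated expression becomes the output.

```{r}
# Hypothetical example: print() vs return()
double_it <- function(x) {
  print("computing...")      # shown on screen, but not saved
  result <- x * 2
  return(result)             # exits the function and hands back result
  print("never printed")     # skipped: nothing after return() runs
}

out <- double_it(x = 5)      # prints "computing..."; out holds the returned value
out

# Hypothetical example: no return(), so the last expression is the output
add_one <- function(x) {
  x + 1
}

add_one(x = 5)
```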

## Adding print

@@ -130,6 +135,7 @@ result <- x_and_y_plus_2(x = 10, y = 3)
result
```


## Writing your own functions: defaults

Functions can have "default" arguments. This lets us call the function later without supplying those arguments:
@@ -140,6 +146,7 @@ times_2_plus_y()
times_2_plus_y(x = 11, y = 4)
```
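A small sketch of how defaults interact with the arguments you pass, using a hypothetical function `scale_by()`: leaving `times` out uses its default, naming it overrides the default, and unnamed arguments are matched by position.

```{r}
# Hypothetical function: `times` has a default value of 2
scale_by <- function(x, times = 2) {
  x * times
}

scale_by(x = 5)              # uses the default: 5 * 2
scale_by(x = 5, times = 10)  # overrides the default: 5 * 10
scale_by(5, 10)              # unnamed arguments are matched by position
```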


## Writing another simple function

Let's write a function, `sqdif`, that:
@@ -149,6 +156,7 @@ Let's write a function, `sqdif`, that:
3. squares this difference
4. then returns the final value


## Writing another simple function

```{r comment=""}
@@ -160,6 +168,7 @@ sqdif(10, 5)
sqdif(11, 4)
```


## Writing your own functions: characters

Functions can have any kind of input. Here is a function with characters:
@@ -172,6 +181,7 @@ loud <- function(word) {
loud(word = "hooray!")
```


## Functions for tibbles

We can use `filter(row_number() == n)` to extract a row of a tibble:
@@ -183,12 +193,12 @@ cars <- read_kaggle()
cars_1_8 <- cars %>% select(1:8)
```


```{r}
get_row(dat = cars, row = 10)
get_row(dat = iris, row = 4)
```


## Functions for tibbles

`select(n)` will choose column `n`:
@@ -203,6 +213,7 @@ get_index <- function(dat, row, col) {
get_index(dat = cars, row = 10, col = 8)
```
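The bodies of `get_row()` and `get_index()` are collapsed in this diff. As a hedged sketch only, assuming they wrap `filter(row_number() == row)` and `select()`, stand-ins could look like this. They are named `get_row_sketch()` and `get_index_sketch()` to mark them as hypothetical, and they use the built-in `iris` data.

```{r}
library(dplyr)

# Hypothetical stand-ins: the real get_row()/get_index() are not shown here
get_row_sketch <- function(dat, row) {
  dat %>% filter(row_number() == row)   # keep only the requested row
}

get_index_sketch <- function(dat, row, col) {
  dat %>%
    filter(row_number() == row) %>%     # pick the row
    select(all_of(col))                 # pick the column by position
}

get_row_sketch(dat = iris, row = 4)
get_index_sketch(dat = iris, row = 4, col = 1)
```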


## Functions for tibbles

Including default values for arguments:
@@ -217,6 +228,20 @@ get_top <- function(dat, row = 1, col = 1) {
get_top(dat = cars)
```

## Functions for tibbles

You can create a function with an argument that takes a column name for `select()` or another `dplyr` operation:

```{r}
clean_dataset <- function(dataset, col_name) {
  # {{ }} ("curly-curly") passes col_name through to select()
  my_data_out <- dataset %>% select({{col_name}})
  write_csv(my_data_out, "clean_data.csv")  # also saves the selection to a CSV file
  return(my_data_out)
}

clean_dataset(dataset = mtcars, col_name = "cyl")
```
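Because `{{ }}` forwards whatever the caller typed, the column can be given either as a bare name or as a quoted string. A minimal sketch with a hypothetical helper `pick_col()` (it only selects, and writes no file):

```{r}
library(dplyr)

# Hypothetical helper: {{ }} hands the caller's col_name straight to select()
pick_col <- function(dataset, col_name) {
  dataset %>% select({{ col_name }})
}

pick_col(dataset = mtcars, col_name = cyl)     # bare column name
pick_col(dataset = mtcars, col_name = "cyl")   # quoted string
```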

## Summary

- Simple functions take the form:
@@ -225,6 +250,7 @@ get_top(dat = cars)
- `return` will provide a value as output
- `print` will simply print the value on the screen but not save it


## Lab Part 1

🏠 [Class Website](https://jhudatascience.org/intro_to_r/)
@@ -244,6 +270,7 @@ These functions take the form:
sapply(<a vector, list, data frame>, some_function)
```


## Using your custom functions: `sapply()`

`r emo::ji("rotating_light")` There are no parentheses on the functions! `r emo::ji("rotating_light")`
@@ -256,6 +283,7 @@ sapply(iris, class)
iris %>% sapply(class)
```
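The same rule applies to custom functions: pass the function by name, without parentheses, and `sapply()` calls it on each element. This sketch repeats the `times_2()` definition from earlier so the chunk runs on its own.

```{r}
# times_2() as defined earlier in these slides
times_2 <- function(x) x * 2

# Pass the function itself: no parentheses after times_2
doubled <- sapply(c(1, 5, 10), times_2)
doubled

# The same thing with an anonymous function written "on the fly"
sapply(c(1, 5, 10), \(x) x * 2)
```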


## Using your custom functions: `sapply()`

```{r}
@@ -265,6 +293,7 @@ select(cars, VehYear:VehicleAge) %>%
head()
```


## Using your custom functions "on the fly" to iterate

```{r comment=""}
@@ -273,22 +302,38 @@ select(cars, VehYear:VehicleAge) %>%
head()
```


# across

## Using functions in `mutate()` and `summarize()`

We already know how to use functions to modify columns with `mutate()` or to calculate summary statistics with `summarize()`.

```{r}
mtcars %>%
  mutate(wt_kg = wt * 1000 / 2.205,    # wt is in 1000 lbs; convert to kg
         power_watts = hp * 745.7) %>% # 1 hp is about 745.7 watts
  summarize(mean_kg = mean(wt_kg),
            max_watts = max(power_watts))
```


## Applying functions with `across` from `dplyr`

`across()` makes it easy to apply the same transformation to multiple columns. Usually used with `summarize()` or `mutate()`.

```
summarize(across(.cols = <columns>, .fns = function))
```
or
```
mutate(across(.cols = <columns>, .fns = function))
```

- List the columns first: `.cols = `
- List the function next: `.fns = `
- If the function needs extra arguments (e.g., `na.rm = TRUE`), wrap it in an anonymous function, e.g., `\(x) mean(x, na.rm = TRUE)`
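A small sketch of that pattern on the built-in `airquality` data, which has missing values in `Ozone` and `Solar.R`: one anonymous function is applied to both columns, with `na.rm = TRUE` written inside it.

```{r}
library(dplyr)

# Same anonymous function applied to two columns; na.rm = TRUE goes inside it
aq_means <- airquality %>%
  summarize(across(.cols = c(Ozone, Solar.R),
                   .fns = \(x) mean(x, na.rm = TRUE)))
aq_means
```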


## Applying functions with `across` from `dplyr`

@@ -301,6 +346,7 @@ cars_dbl %>%
summarize(across(.cols = everything(), .fns = mean))
```


## Applying functions with `across` from `dplyr`

Can be used with other tidyverse functions like `group_by()`!
@@ -311,18 +357,18 @@ cars_dbl %>%
summarize(across(.cols = everything(), .fns = mean))
```


## Applying functions with `across` from `dplyr`

Combining with `summarize()`: to pass extra arguments (e.g., `na.rm = TRUE`) to a function, wrap it in an anonymous function. In this syntax, the shorthand `\(x)` is equivalent to `function(x)`.

```{r warning=FALSE}
# Extra arguments go inside the anonymous function
cars_dbl %>%
  group_by(Make) %>%
  summarize(across(.cols = everything(), .fns = \(x) mean(x, na.rm = TRUE)))
```
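To check that `\(x)` (available since R 4.1.0) really is just shorthand for `function(x)`, this sketch runs the same grouped summary on the built-in `mtcars` data both ways and compares the results.

```{r}
library(dplyr)

# \(x) and function(x) define the same anonymous function
with_backslash <- mtcars %>%
  group_by(cyl) %>%
  summarize(across(.cols = c(mpg, hp), .fns = \(x) mean(x, na.rm = TRUE)))

with_function <- mtcars %>%
  group_by(cyl) %>%
  summarize(across(.cols = c(mpg, hp), .fns = function(x) mean(x, na.rm = TRUE)))

identical(with_backslash, with_function)
```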


## Applying functions with `across` from `dplyr`

Using different tidyselect helpers (e.g., `starts_with()`, `ends_with()`, `contains()`)
@@ -333,6 +379,7 @@ cars_dbl %>%
summarize(across(.cols = starts_with("Veh"), .fns = mean))
```
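The same helpers work on any data set. A short sketch on the built-in `iris` data, where the helpers pick columns by a name pattern:

```{r}
library(dplyr)

# starts_with() / ends_with() choose columns by name pattern
sepal_means <- iris %>%
  summarize(across(.cols = starts_with("Sepal"), .fns = mean))
width_means <- iris %>%
  summarize(across(.cols = ends_with("Width"), .fns = mean))

sepal_means
width_means
```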


## Applying functions with `across` from `dplyr`

Combining with `mutate()`: rounding to the nearest power of 10 (using a negative `digits` value)
@@ -347,7 +394,6 @@ cars_dbl %>%
```
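How a negative `digits` value behaves: `-1` rounds to the nearest 10 and `-2` to the nearest 100. The second part applies this with `mutate()` and `across()` to a small made-up tibble (the values are illustrative only).

```{r}
library(dplyr)

# Negative digits round to the left of the decimal point
round(1234.567, digits = -1)   # nearest 10
round(1234.567, digits = -2)   # nearest 100

# Made-up example data: round every column to the nearest 100
rounded <- tibble(a = c(1234.567, 98.7), b = c(61, 4449)) %>%
  mutate(across(.cols = everything(), .fns = \(x) round(x, digits = -2)))
rounded
```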



## Applying functions with `across` from `dplyr` {.smaller}

Combining with `mutate()` - the `replace_na` function
@@ -369,8 +415,11 @@ mort %>%
))
```
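A self-contained sketch of the same idea on the built-in `airquality` data (not the `mort` data used above): `replace_na()` from `tidyr` swaps each `NA` in the chosen columns for `0`.

```{r}
library(dplyr)
library(tidyr)

# Replace NA with 0 in two columns at once
aq_filled <- airquality %>%
  mutate(across(.cols = c(Ozone, Solar.R), .fns = \(x) replace_na(x, replace = 0)))

head(aq_filled)
```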


## Use custom functions within `mutate` and `across`

If your function spans more than one line, it is better to define it first and then use it inside `mutate()` and `across()`.

```{r}
times1000 <- function(x) x * 1000

@@ -394,13 +443,15 @@ airquality %>%

Similar to `across()`, the `purrr` package lets you apply a function to multiple columns in a data frame or to multiple data objects in a list.


## map_df

```{r}
library(purrr)
airquality %>% map_df(replace_na, replace = 0)
```
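The same idea works on a list of data frames. A hedged sketch with a hypothetical list `aq_list`, built from `airquality` by month: `map_dbl()` applies one function to each data frame and returns a named numeric vector.

```{r}
library(purrr)

# Hypothetical list of two data frames (one per month)
aq_list <- list(may  = subset(airquality, Month == 5),
                june = subset(airquality, Month == 6))

# map_dbl() applies the function to each element and returns a named double vector
mean_temps <- map_dbl(aq_list, \(df) mean(df$Temp))
mean_temps
```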


# Multiple Data Frames

## Multiple data frames {.smaller}
@@ -430,6 +481,7 @@ AQ_list %>% sapply(colMeans, na.rm = TRUE)
- `purrr` is a package that you can use to do more iterative work easily
- Can use `sapply` or `purrr` to work with multiple data frames within lists simultaneously


## Lab Part 2

🏠 [Class Website](https://jhudatascience.org/intro_to_r/)
8 changes: 7 additions & 1 deletion modules/Functions/lab/Functions_Lab.Rmd
@@ -39,7 +39,7 @@ return(result)

```

2. Create a function that takes two arguments, (1) a vector and (2) a numeric value. This function tests whether the number (2) is contained within the vector (1). **Hint**: use `%in%`. Call it `has_n`. Test your function on the vector `c(2,7,21,30,90)` and number `21` - you should get the answer TRUE.

```{r}

@@ -51,6 +51,12 @@ return(result)

```

4. Create a new number `b_num` that is not contained in `nums`. Call your updated `has_n` function (the version with a default value), passing `b_num` as the `n` argument. What is the outcome?

```{r}

```

# Part 2

4. Read in the SARS-CoV-2 Vaccination data from http://jhudatascience.org/intro_to_r/data/USA_covid19_vaccinations.csv. Assign the data the name "vacc".
9 changes: 8 additions & 1 deletion modules/Functions/lab/Functions_Lab_Key.Rmd
@@ -48,7 +48,7 @@ sum_squared <- function(x) {
sum_squared(x = nums)
```

2. Create a function that takes two arguments, (1) a vector and (2) a numeric value. This function tests whether the number (2) is contained within the vector (1). **Hint**: use `%in%`. Call it `has_n`. Test your function on the vector `c(2,7,21,30,90)` and number `21` - you should get the answer TRUE.

```{r}
nums <- c(2, 7, 21, 30, 90)
@@ -68,6 +68,13 @@ has_n <- function(x, n = 21) n %in% x
has_n(x = nums)
```

4. Create a new number `b_num` that is not contained in `nums`. Call your updated `has_n` function (the version with a default value), passing `b_num` as the `n` argument. What is the outcome?

```{r}
b_num <- 11
has_n(x = nums, n = b_num)
```

# Part 2

4. Read in the SARS-CoV-2 Vaccination data from http://jhudatascience.org/intro_to_r/data/USA_covid19_vaccinations.csv. Assign the data the name "vacc".