Updated the vignettes where applicable to use the tester worksheets instead of the pbc
yulric committed Dec 11, 2024
1 parent 744df02 commit 4877138
Showing 4 changed files with 49 additions and 49 deletions.
60 changes: 30 additions & 30 deletions vignettes/derived_variables.Rmd
@@ -1,19 +1,17 @@
---
title: "Derived variables"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Derived variables}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
---
title: "Derived variables"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Derived variables}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
# Instead of using relative path, use package data
data(pbc, pbc_variable_details, package = "recodeflow")
```

## Introduction
## Introduction

_recodeflow_ supports the use of derived variables. Derived variables can be any custom function as long as the variable can be calculated on a per row basis. Functions requiring operations across rows or on the full data set are not supported.
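The per-row restriction can be illustrated with a minimal base R sketch. The data frame and both functions below are hypothetical and do not call recodeflow: the first function depends only on values in the same row, while the second depends on a column-wide mean and so would not qualify as a derived variable.

```r
# Hypothetical example data; the column names mirror chol and bili.
df <- data.frame(chol = c(200, 250, NA), bili = c(1.5, 0.8, 2.0))

# Row-wise: each output value depends only on values in the same row.
row_wise <- function(chol, bili) {
  as.numeric(chol) * as.numeric(bili)
}

# Cross-row: each output value depends on the mean of the whole column,
# so it cannot be computed from a single row in isolation.
cross_row <- function(chol) {
  chol - mean(chol, na.rm = TRUE)
}

df$product <- row_wise(df$chol, df$bili)

# Recomputing the first row on its own matches the full-column result
# for the row-wise function ...
same_row_wise <- isTRUE(all.equal(row_wise(df$chol[1], df$bili[1]), df$product[1]))

# ... but not for the cross-row function.
same_cross_row <- isTRUE(all.equal(cross_row(df$chol[1]), cross_row(df$chol)[1]))

print(c(row_wise = same_row_wise, cross_row = same_cross_row))
```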

@@ -27,29 +25,29 @@ data(pbc, pbc_variable_details, package = "recodeflow")
1. Create and load a customized function.
2. Add the derived variable to the variable_details and variables worksheets.

## Example of a derived function
## Example of a derived function

We'll walk through an example of creating a derived variable with our example data.

Our customized derived function multiplies the blood concentration of cholesterol (`chol`) by the blood concentration of bilirubin (`bili`).

### 1. Create and load a customized function for your derived variables.
### 1. Create and load a customized function for your derived variables.

**Create the custom function:** Here is the customized function for our derived variable (`chol`*`bili`):
```{r, warning=FALSE, message=FALSE}
#example_der_fun calculates chol*bili
#@param chol the row value for chol
#@param bili the row value for bili
#@export
example_der_fun <- function(chol, bili){
# as numeric is used to coerce in case categorical numeric variables are used.
# Warning either chol or bili being NA will result in NA return
example_der <- as.numeric(chol)*as.numeric(bili)
return(example_der)
}
```

```{r, warning=FALSE, message=FALSE}
#example_der_fun calculates chol*bili
#@param chol the row value for chol
#@param bili the row value for bili
#@export
example_der_fun <- function(chol, bili){
# as numeric is used to coerce in case categorical numeric variables are used.
# Warning either chol or bili being NA will result in NA return
example_der <- as.numeric(chol)*as.numeric(bili)
return(example_der)
}
```

**Note:** You **must** use roxygen2 documentation for custom functions otherwise the function cannot be attached to a package. See [roxygen2](https://cran.r-project.org/web/packages/roxygen2/vignettes/roxygen2.html) on how to format and document your function.
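As a sketch of what that documentation could look like, roxygen2 reads comment lines that begin with `#'` and tags that begin with `@`. The title line and the `@return` text below are illustrative additions, not taken from the package.

```r
#' Multiply cholesterol by bilirubin
#'
#' @param chol the row value for chol
#' @param bili the row value for bili
#' @return the product of chol and bili, or NA if either input is NA
#' @export
example_der_fun <- function(chol, bili) {
  # as.numeric() coerces inputs in case categorical numeric variables are used.
  as.numeric(chol) * as.numeric(bili)
}

example_der_fun(200, 1.5)
example_der_fun("2", NA)
```

The second call shows the NA behaviour noted in the function comments: if either input is NA, the result is NA.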

@@ -61,14 +59,14 @@ data(pbc, pbc_variable_details, package = "recodeflow")
If you don't load the customized function you cannot create the derived variable.
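One way to load a function and confirm it is available is sketched below using base R only. The temporary file is a hypothetical stand-in for wherever the custom function is saved.

```r
# Hypothetical file standing in for the script that holds the custom function.
fun_file <- tempfile(fileext = ".R")
writeLines(
  "example_der_fun <- function(chol, bili) as.numeric(chol) * as.numeric(bili)",
  fun_file
)

# source() evaluates the file in the global environment by default.
source(fun_file)

# TRUE once the function has been loaded.
is_loaded <- exists("example_der_fun", mode = "function")
print(is_loaded)
```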


### 2. Add the derived variable to the `variable_details` and `variables` worksheets.
### 2. Add the derived variable to the `variable_details` and `variables` worksheets.

Add the derived variable to the `variables` worksheet. You'll use the same nomenclature as any other variable. See the article [`variables_sheet`](../articles/variables_worksheet.html) for nomenclature rules.

Add the derived variable to the `variable_details` worksheet. See the article [`variable_details`](../articles/variable_details_worksheet.html) for nomenclature rules.


### 3. Recode the derived variable
### 3. Recode the derived variable

Use the function `rec_with_table` to recode your derived variable.

@@ -83,7 +81,9 @@ data(pbc, pbc_variable_details, package = "recodeflow")
derived1 <- rec_with_table(
data = pbc,
variables = c("chol", "bili","example_der"),
-variable_details = pbc_variable_details,
+variable_details = recodeflow::tester_variable_details,
+database_name = 'tester1',
log = TRUE)
print(head(derived1))
```
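To sanity-check the output of a call like the one above, the expected values can be computed by hand. This sketch uses only base R with a hypothetical three-row input. It does not call recodeflow and makes no claim about how `rec_with_table` applies the function internally.

```r
# Hypothetical stand-in for a few rows of input data.
input <- data.frame(chol = c(261, 302, NA), bili = c(14.5, 1.1, 1.4))

example_der_fun <- function(chol, bili) {
  as.numeric(chol) * as.numeric(bili)
}

# Apply the function one row at a time, which is how a derived variable is defined.
input$example_der <- mapply(example_der_fun, input$chol, input$bili)

print(input)
```

The third row has a missing `chol`, so its `example_der` value is NA.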

32 changes: 16 additions & 16 deletions vignettes/how_to_use_recodeflow_with_your_data.Rmd
@@ -57,111 +57,111 @@ Write `stage` in the column **variable** in the six rows.
```{r, echo=FALSE, warning=FALSE, message=FALSE}
library(knitr)
library(kableExtra)
-kable(recodeflow::pbc_variable_details[, 'variable'], col.names = c('variable'))
+kable(recodeflow::tester_variable_details[, 'variable'], col.names = c('variable'))
```

2. **typeEnd:** indicates the type of variable (continuous or categorical) for the recoded (final) variable. `stage`, which captures the stage of the disease, is a categorical variable in the original dataset and will remain a categorical variable after recoding.

Write 'cat' in the six rows.

```{r, echo=FALSE, warning=FALSE}
-kable(recodeflow::pbc_variable_details)
+kable(recodeflow::tester_variable_details)
```

3. **typeStart:** indicates the type of variable (continuous or categorical) for the original variable. `stage` is a categorical variable in the original dataset.

Write 'cat' in the six rows.

```{r, echo=FALSE, warning=FALSE}
-kable(recodeflow::pbc_variable_details)
+kable(recodeflow::tester_variable_details)
```

4. **databaseStart:** indicates the name of the database(s) from which the original variable(s) is(are) obtained.

Write the dataset names, separated by commas, in the six rows.

```{r, echo=FALSE, warning=FALSE}
-kable(recodeflow::pbc_variable_details)
+kable(recodeflow::tester_variable_details)
```

5. **variableStart:** indicates the original variable name(s) in the database(s). In our example, both datasets have the start variable `stage`. Therefore we can indicate a single variable name here. If the variable names were different, we would need to indicate 'dataset_name::variable_name' separated by commas for each of the datasets.

Write the variable name in square brackets once per row, for all six rows.

```{r, echo=FALSE, warning=FALSE}
-kable(recodeflow::pbc_variable_details)
+kable(recodeflow::tester_variable_details)
```
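The two forms of **variableStart** described in this item can be read mechanically. The helper below is a hypothetical base R sketch, not a recodeflow function, and the names `tester2`, `stage_a`, and `stage_b` are invented for illustration. It returns the start variable name for one database from a **variableStart** cell.

```r
# Hypothetical helper, not part of recodeflow.
parse_variable_start <- function(variable_start, database) {
  entries <- trimws(strsplit(variable_start, ",", fixed = TRUE)[[1]])

  # Entries of the form database::variable apply to one database only.
  specific <- entries[startsWith(entries, paste0(database, "::"))]
  if (length(specific) > 0) {
    return(sub("^.*::", "", specific[1]))
  }

  # An entry in square brackets is the name shared by every database.
  shared <- entries[grepl("^\\[.*\\]$", entries)]
  if (length(shared) > 0) {
    return(gsub("^\\[|\\]$", "", shared[1]))
  }

  NA_character_
}

parse_variable_start("[stage]", "tester1")
parse_variable_start("tester1::stage_a, tester2::stage_b", "tester2")
```

The first call returns `"stage"` and the second returns `"stage_b"`. A database with no matching entry yields `NA`.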

6. **variableStartLabel:** indicates the original variable label.

Write "stage" in the 6 rows.

```{r, echo=FALSE, warning=FALSE}
-kable(recodeflow::pbc_variable_details)
+kable(recodeflow::tester_variable_details)
```

7. **numValidCat:** indicates the number of valid categories for the final derived variable. In our example, there are four categories for `stage`: 1, 2, 3, and 4. Note that the categories 'not applicable', 'missing', and 'else' are not included in the category count.

Write 4 in each of the six rows.

```{r, echo=FALSE, warning=FALSE}
-kable(recodeflow::pbc_variable_details)
+kable(recodeflow::tester_variable_details)
```
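The counting rule for **numValidCat** can be sketched in base R. The vector below is a hypothetical set of category codes for the six `stage` rows, assuming the not applicable and missing rows carry codes that start with `NA::`.

```r
# Hypothetical category codes for the six stage rows.
rec_end <- c("1", "2", "3", "4", "NA::a", "NA::b")

# Valid categories are those not tagged as not applicable, missing, or else.
num_valid_cat <- sum(!startsWith(rec_end, "NA::"))
num_valid_cat
# 4
```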

8. **recEnd:** indicates the category to which you are recoding each row. For the not applicable rows `NA::a` is written. For the missing and else rows `NA::b` is written. The `haven` package is used for tagging NA in numeric variables.

We are not changing the categories of `stage`; therefore, the recEnd values for these rows will be the same as in the original data. For the not applicable rows write `NA::a`. For the missing and else rows write `NA::b`.

```{r, echo=FALSE, warning=FALSE}
-kable(recodeflow::pbc_variable_details)
+kable(recodeflow::tester_variable_details)
```
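Tagged missing values from `haven` behave like ordinary `NA` while still carrying a tag. The sketch below assumes the `haven` package is installed and uses a hypothetical `stage` vector. It shows how values tagged `a` and `b` can be created and inspected, and is not a description of recodeflow's internals.

```r
library(haven)

# Hypothetical stage values with one "not applicable" (a) and one "missing" (b).
stage <- c(1, 2, tagged_na("a"), 4, tagged_na("b"))

# Tagged values still count as NA in ordinary checks.
is.na(stage)

# na_tag() recovers the tag; untagged values return NA.
na_tag(stage)

# is_tagged_na() tests for one specific tag.
is_tagged_na(stage, "a")
```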

9. **catLabel:** indicates the label for the recoded categorical level.

Write Stage 1, Stage 2, Stage 3, Stage 4, NA, and missing.

```{r, echo=FALSE, warning=FALSE}
-kable(recodeflow::pbc_variable_details)
+kable(recodeflow::tester_variable_details)
```

10. **catLabelLong:** provides a more elaborate label for the recoded categorical level. If not required, repeat the shorter _catLabel_.

Copy values from **catLabel**

```{r, echo=FALSE, warning=FALSE}
-kable(recodeflow::pbc_variable_details)
+kable(recodeflow::tester_variable_details)
```

11. **units:** indicates the unit of measure for the variable. The histologic stage of disease does not have a unit of measurement.

Write "N/A" in all six rows.

```{r, echo=FALSE, warning=FALSE}
-kable(recodeflow::pbc_variable_details)
+kable(recodeflow::tester_variable_details)
```

12. **recStart:** indicates the category(ies) from which you are recoding each row. Since we are not combining levels of categories and we are keeping the category levels the same, the recStart column will be identical to recEnd. If multiple categories were being combined into a single category, the original categories would be indicated in square brackets, separated by commas.

Write the category level from which you are recoding each row. For the not applicable rows `NA::a` is written. For the missing and else rows `NA::b` is written.

```{r, echo=FALSE, warning=FALSE}
-kable(recodeflow::pbc_variable_details)
+kable(recodeflow::tester_variable_details)
```
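The `stage` example keeps every level as-is, but the combining case mentioned under **recStart** can be sketched in base R. The mapping below is hypothetical: stages 1 and 2 are collapsed into `early`, and stages 3 and 4 into `late`. It illustrates the idea of several start categories mapping to one end category, not recodeflow's own implementation.

```r
# Hypothetical mapping from combined start categories to one end category.
rec_map <- list(early = c(1, 2), late = c(3, 4))
stage <- c(1, 4, 2, 3, NA)

# Start with all-missing output, then fill in each end category in turn.
recoded <- rep(NA_character_, length(stage))
for (label in names(rec_map)) {
  recoded[stage %in% rec_map[[label]]] <- label
}

recoded
```

Values that match none of the listed start categories, including `NA`, stay missing.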

13. **catStartLabel:** indicates the original variable category label. The `stage` label should be identical to what is shown in the original data documentation. For the missing rows, each missing category is described along with their coded values.

Write Stage 1, Stage 2, Stage 3, Stage 4, NA, and missing.

```{r, echo=FALSE, warning=FALSE}
-kable(recodeflow::pbc_variable_details)
+kable(recodeflow::tester_variable_details)
```

14. **notes:** Capture any important differences in a variable across datasets. For our example, there are no differences across datasets.

Write "This is sample survival pbc data" in all six rows.

```{r, echo=FALSE, warning=FALSE}
-kable(recodeflow::pbc_variable_details)
+kable(recodeflow::tester_variable_details)
```

## `variable_details` for derived variables
@@ -177,7 +177,7 @@ The same naming convention applies to derived variables with the exception of tw
A derived variable looks like this in `variable_details.csv`

```{r, echo=FALSE, warning=FALSE}
-kable(recodeflow::pbc_variable_details)
+kable(recodeflow::tester_variable_details)
```

## How to create the variables worksheet `variables`
@@ -205,7 +205,7 @@ Once mapped and specified on `variable_details`, the `stage` variable can be spe
```{r, echo=FALSE, warning=FALSE}
library(knitr)
library(kableExtra)
-kable(recodeflow::pbc_variables)
+kable(recodeflow::tester_variables)
```


4 changes: 2 additions & 2 deletions vignettes/variable_details.Rmd
@@ -23,7 +23,7 @@ library(DT)
library(knitr)
library(kableExtra)
-variable_details <- recodeflow::pbc_variable_details
+variable_details <- recodeflow::tester_variable_details
if(exists("variable_details")) {
datatable(variable_details, options = list(pageLength = 2))
} else {
@@ -196,7 +196,7 @@ The same naming convention applies to derived variables with the exception of tw
A derived variable looks like this in `variable_details.csv`

```{r, echo=FALSE, warning=FALSE}
-kable(variable_details[64,1:14])
+kable(variable_details[variable_details$variable == "example_der",])
```
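Selecting the derived variable's row by name rather than by a fixed row number keeps the chunk correct when the worksheet changes. A minimal base R sketch with a hypothetical three-row worksheet shows the difference when rows are reordered.

```r
# Hypothetical miniature of a variable_details worksheet.
variable_details <- data.frame(
  variable = c("stage", "stage", "example_der"),
  typeEnd = c("cat", "cat", "cont"),
  stringsAsFactors = FALSE
)

# Reordering the rows simulates a worksheet whose layout has changed.
shuffled <- variable_details[c(3, 1, 2), ]

# A fixed position now points at a different variable ...
by_index <- shuffled[3, "variable"]

# ... while filtering on the name still finds the derived variable.
by_name <- shuffled[shuffled$variable == "example_der", "variable"]

print(c(by_index = by_index, by_name = by_name))
```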

## Tables
2 changes: 1 addition & 1 deletion vignettes/variables_sheet.Rmd
@@ -23,7 +23,7 @@ This vignette describes how the `variables` worksheet is organized and how to fi
Read the `variables` worksheet

```{r Read variables.csv, echo=FALSE, message=FALSE, warning=FALSE}
-variables <- recodeflow::pbc_variables
+variables <- recodeflow::tester_variables
cat("There are", nrow(variables), "variables, grouped in", sum(!duplicated(variables$section)), "sections and", sum(!duplicated(variables$subject)), "subjects that are available for transformation.")
```
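The summary line in the chunk above counts distinct values with `sum(!duplicated(...))`, which gives the same result as `length(unique(...))`. The sketch below checks this on a hypothetical three-row worksheet. The section and subject values are invented for illustration.

```r
# Hypothetical miniature of a variables worksheet.
variables <- data.frame(
  variable = c("age", "chol", "bili"),
  section = c("Demographics", "Labs", "Labs"),
  subject = c("Age", "Blood", "Blood"),
  stringsAsFactors = FALSE
)

# duplicated() flags repeats, so negating and summing counts first occurrences.
n_sections <- sum(!duplicated(variables$section))
n_subjects <- sum(!duplicated(variables$subject))

cat("There are", nrow(variables), "variables, grouped in", n_sections,
    "sections and", n_subjects, "subjects.\n")
```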