Updated the vignettes where applicable to use the tester worksheets instead of the pbc
Big-Life-Lab · Dec 11, 2024 · commit 4877138 · 1 parent 744df02

Showing 4 changed files with 49 additions and 49 deletions.
diff --git a/vignettes/derived_variables.Rmd b/vignettes/derived_variables.Rmd
@@ -1,19 +1,17 @@
-  ---
-  title: "Derived variables"
-  output: rmarkdown::html_vignette
-  vignette: >
-    %\VignetteIndexEntry{Derived variables}     
-    %\VignetteEngine{knitr::rmarkdown} 
-    %\VignetteEncoding{UTF-8}
-  ---
+---
+title: "Derived variables"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Derived variables}     
+  %\VignetteEngine{knitr::rmarkdown} 
+  %\VignetteEncoding{UTF-8}
+---
 
 ```{r setup, include=FALSE}
 knitr::opts_chunk$set(echo = TRUE)
-# Instead of using relative path, use package data
-data(pbc, pbc_variable_details, package = "recodeflow")
 ```
 
-  ## Introduction
+## Introduction
 
   _recodeflow_ supports the use of derived variables. Derived variables can be any custom function as long as the variable can be calculated on a per row basis. Functions requiring operations across rows or on the full data set are not supported. 
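
To illustrate the per-row restriction described above, here is a minimal sketch in base R. Both function names are hypothetical and not part of _recodeflow_. The first depends only on values in the same row; the second needs the whole column, so it falls outside what the text says is supported.

```r
# Hypothetical row-wise function: each output depends only on its own row.
row_wise_ratio <- function(a, b) {
  as.numeric(a) / as.numeric(b)
}

# Hypothetical cross-row function: each output depends on the column mean,
# i.e. on every other row, so it is not a per-row calculation.
centred <- function(a) {
  a - mean(a, na.rm = TRUE)
}

a <- c(10, 20, 30)
b <- c(2, 4, 5)

# Same result whether rows are processed together or one at a time.
ratio_all <- row_wise_ratio(a, b)
ratio_first <- row_wise_ratio(a[1], b[1])

# Result for a given row changes if any other row changes.
centred_all <- centred(a)
```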
 
@@ -27,29 +25,29 @@ data(pbc, pbc_variable_details, package = "recodeflow")
   1. Create and load a customized function.
   2. Add the derived variable to the variable_details and variables worksheets.
 
-  ## Example of a derived function
+## Example of a derived function
 
   We'll walk through an example of creating a derived variable with our example data.
 
   Our custom derived function multiplies the blood concentration of cholesterol (`chol`) by the blood concentration of bilirubin (`bili`).
 
-  ### 1. Create and load a customized function for your derived variables.
+### 1. Create and load a customized function for your derived variables.
 
   **Create the custom function:** Here is the customized function for our derived variable (`chol`*`bili`):
-  ```{r, warning=FALSE, message=FALSE}
-
-  #example_der_fun caluclates chol*bili
-  #@param chol the row value for chol
-  #@param bili the row value for bili
-  #@export 
-  example_der_fun <- function(chol, bili){
-    # as numeric is used to coerce in case categorical numeric variables are used.
-    # Warning either chol or bili being NA will result in NA return
-    example_der <- as.numeric(chol)*as.numeric(bili)
-    
-    return(example_der)
-  }
-  ```
+
+```{r, warning=FALSE, message=FALSE}
+#example_der_fun caluclates chol*bili
+#@param chol the row value for chol
+#@param bili the row value for bili
+#@export 
+example_der_fun <- function(chol, bili){
+  # as numeric is used to coerce in case categorical numeric variables are used.
+  # Warning either chol or bili being NA will result in NA return
+  example_der <- as.numeric(chol)*as.numeric(bili)
+  
+  return(example_der)
+}
+```
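
A quick way to sanity-check the function before wiring it into the worksheets is to call it on a few hand-made values. The sketch below redefines `example_der_fun` so it runs on its own; the input vectors are made up for illustration. It also shows a caveat about `as.numeric()`: applied to a factor, it returns the internal level codes rather than the labels, so factor inputs may need `as.character()` first.

```r
example_der_fun <- function(chol, bili) {
  as.numeric(chol) * as.numeric(bili)
}

# Made-up inputs; an NA in either argument propagates to the result.
chol <- c(200, 250, NA)
bili <- c(1.5, 2, 3)
result <- example_der_fun(chol, bili)
print(result)

# Character input is coerced to numeric before multiplying.
from_character <- example_der_fun("200", "1.5")

# Caveat: as.numeric() on a factor yields level codes, not the labels.
f <- factor(c("200", "250"))
codes <- as.numeric(f)                  # level codes
labels <- as.numeric(as.character(f))   # the numeric labels
```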
 
   **Note:** You **must** use roxygen2 documentation for custom functions; otherwise the function cannot be attached to a package. See [roxygen2](https://cran.r-project.org/web/packages/roxygen2/vignettes/roxygen2.html) for how to format and document your function.
 
@@ -61,14 +59,14 @@ data(pbc, pbc_variable_details, package = "recodeflow")
   If you don't load the customized function you cannot create the derived variable. 
 
 
-  ### 2. Add the derived variable to the `variable_details` and `variables` worksheets.
+### 2. Add the derived variable to the `variable_details` and `variables` worksheets.
 
   Add the derived variable to the `variables` worksheet. You'll use the same nomenclature as any other variable. See the article [`variables_sheet`](../articles/variables_worksheet.html) for nomenclature rules.
 
   Add the derived variable to the variable_details. See the article [`variable_details`](../articles/variable_details_worksheet.html) for nomenclature rules. 
 
 
-  ### 3. Recode the derived variable
+### 3. Recode the derived variable
 
   Use the function `rec_with_table` to recode your derived function. 
 
@@ -83,7 +81,9 @@ data(pbc, pbc_variable_details, package = "recodeflow")
   derived1 <- rec_with_table(
       data = pbc,
       variables = c("chol", "bili","example_der"),
-      variable_details = pbc_variable_details,
+      variable_details = recodeflow::tester_variable_details,
+      database_name = 'tester1',
       log = TRUE)
+  print(head(derived1))
   ```
 
diff --git a/vignettes/how_to_use_recodeflow_with_your_data.Rmd b/vignettes/how_to_use_recodeflow_with_your_data.Rmd
@@ -57,111 +57,111 @@ Write `stage` in the column **variable** in the six rows.
 ```{r, echo=FALSE, warning=FALSE, message=FALSE}
 library(knitr)
 library(kableExtra)
-kable(recodeflow::pbc_variable_details[, 'variable'], col.names = c('variable'))
+kable(recodeflow::tester_variable_details[, 'variable'], col.names = c('variable'))
 ```
 
 2. **typeEnd:** indicates the type of variable (continuous or categorical) for the recoded (final) variable. `stage`, which captures the stage of the disease, is a categorical variable in the original dataset and will remain a categorical variable after recoding. 
 
 Write 'cat' in the six rows.
 
 ```{r, echo=FALSE, warning=FALSE}
-kable(recodeflow::pbc_variable_details)
+kable(recodeflow::tester_variable_details)
 ```
 
 3. **typeStart:** indicates the type of variable (continuous or categorical) for the original variable. `stage` is a categorical variable in the original dataset. 
 
 Write 'cat' in the six rows.
 
 ```{r, echo=FALSE, warning=FALSE}
-kable(recodeflow::pbc_variable_details)
+kable(recodeflow::tester_variable_details)
 ```
 
 4. **databaseStart:** indicates the name of the database(s) from which the original variable(s) is(are) obtained. 
 
 Write the dataset names, separated by a comma, in the six rows
 
 ```{r, echo=FALSE, warning=FALSE}
-kable(recodeflow::pbc_variable_details)
+kable(recodeflow::tester_variable_details)
 ```
 
 5. **variableStart:** indicates the original variable name(s) in the database(s). In our example, both datasets have the start variable `stage`. Therefore we can indicate a single variable name here. If the variable names were different, we would need to indicate 'dataset_name::variable_name' separated by commas for each of the datasets.
 
 Write the variable name in square brackets once per row, for all six rows.
 
 ```{r, echo=FALSE, warning=FALSE}
-kable(recodeflow::pbc_variable_details)
+kable(recodeflow::tester_variable_details)
 ```
 
 6. **variableStartLabel:** indicates the original variable label.
 
 Write "stage" in the 6 rows.
 
 ```{r, echo=FALSE, warning=FALSE}
-kable(recodeflow::pbc_variable_details)
+kable(recodeflow::tester_variable_details)
 ```
 
 7. **numValidCat:** indicates the number of valid categories for the final derived variable. In our example, there are four categories for `stage`: 1, 2, 3, and 4. Note that the categories 'not applicable', 'missing', and 'else' are not included in the category count.
 
 Write 4 in each of the six rows.
 
 ```{r, echo=FALSE, warning=FALSE}
-kable(recodeflow::pbc_variable_details)
+kable(recodeflow::tester_variable_details)
 ```
 
 8. **recEnd:** indicates the category to which you are recoding each row. For the not applicable rows `NA::a` is written. For the missing and else rows `NA::b` is written. The `haven` package is used for tagging NA in numeric variables.
 
 We are not changing the categories of `stage`, so the recEnd values for these rows will be the same as in the original data. For the not applicable rows write `NA::a`. For the missing and else rows write `NA::b`.
 
 ```{r, echo=FALSE, warning=FALSE}
-kable(recodeflow::pbc_variable_details)
+kable(recodeflow::tester_variable_details)
 ```
 
 9. **catLabel:** indicates the label for the recoded categorical level.
 
 Write Stage 1, Stage 2, Stage 3, Stage 4, NA, and missing.
 
 ```{r, echo=FALSE, warning=FALSE}
-kable(recodeflow::pbc_variable_details)
+kable(recodeflow::tester_variable_details)
 ```
 
 10. **catLabelLong:** provides a more elaborate label for the recoded categorical level. If not required, repeat the shorter _catLabel_.
 
 Copy values from **catLabel**
 
 ```{r, echo=FALSE, warning=FALSE}
-kable(recodeflow::pbc_variable_details)
+kable(recodeflow::tester_variable_details)
 ```
 
 11. **units:** indicates the unit of measure for the variable. The histologic stage of disease does not have a unit of measurement. 
 
 Write "N/A" in all six rows.
 
 ```{r, echo=FALSE, warning=FALSE}
-kable(recodeflow::pbc_variable_details)
+kable(recodeflow::tester_variable_details)
 ```
 
 12. **recStart:** indicates the category(ies) from which you are recoding each row. Since we are not combining levels of categories and we are keeping the category levels the same, the recStart column will be identical to recEnd. If multiple categories were being combined into a single category, the original categories would be indicated in square brackets, separated by commas.
 
 Write the category level you are recoding each row to. For the not applicable rows `NA::a` is written. For the missing and else rows `NA::b` is written.
 
 ```{r, echo=FALSE, warning=FALSE}
-kable(recodeflow::pbc_variable_details)
+kable(recodeflow::tester_variable_details)
 ```
 
 13. **catStartLabel:** indicates the original variable category label. The `stage` label should be identical to what is shown in the original data documentation. For the missing rows, each missing category is described along with their coded values.
 
 Write Stage 1, Stage 2, Stage 3, Stage 4, NA, and missing.
 
 ```{r, echo=FALSE, warning=FALSE}
-kable(recodeflow::pbc_variable_details)
+kable(recodeflow::tester_variable_details)
 ```
 
 14. **notes:** Capture any important differences in a variable across datasets. For our example, there are no differences across datasets.
 
 Write "This is sample survival pbc data" in all six rows.
 
 ```{r, echo=FALSE, warning=FALSE}
-kable(recodeflow::pbc_variable_details)
+kable(recodeflow::tester_variable_details)
 ```
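
Item 8 above mentions that the `haven` package is used for tagging NA values. As a rough sketch of what tagged missing values look like in R, the snippet below uses `haven::tagged_na()` directly. It assumes `haven` is installed and that `NA::a` and `NA::b` correspond to the tags `"a"` and `"b"`; how _recodeflow_ performs that mapping internally is not shown in this diff.

```r
library(haven)

# A numeric vector with two ordinary values and two tagged missing values.
# Assumption: tag "a" stands for "not applicable", tag "b" for "missing/else".
x <- c(1, 2, tagged_na("a"), tagged_na("b"))

# Tagged NAs still behave as NA in ordinary checks.
missing_flags <- is.na(x)

# The tag can be recovered with na_tag(); untagged values give NA.
tags <- na_tag(x)

# is_tagged_na() tests for a specific tag.
not_applicable <- is_tagged_na(x, "a")
```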
 
 ## `variable_details` for derived variables
@@ -177,7 +177,7 @@ The same naming convention applies to derived variables with the exception of tw
 A derived variable looks like this in `variable_details.csv`
 
 ```{r, echo=FALSE, warning=FALSE}
-kable(recodeflow::pbc_variable_details)
+kable(recodeflow::tester_variable_details)
 ```
 
 ## How to create the variables worksheet `variables`
@@ -205,7 +205,7 @@ Once mapped and specified on `variable_details`, the `stage` variable can be spe
 ```{r, echo=FALSE, warning=FALSE}
 library(knitr)
 library(kableExtra)
-kable(recodeflow::pbc_variables)
+kable(recodeflow::tester_variables)
 ```
 
 
diff --git a/vignettes/variable_details.Rmd b/vignettes/variable_details.Rmd
@@ -23,7 +23,7 @@ library(DT)
 library(knitr)
 library(kableExtra)
 
-variable_details <- recodeflow::pbc_variable_details
+variable_details <- recodeflow::tester_variable_details
 if(exists("variable_details")) {
   datatable(variable_details, options = list(pageLength = 2))
 } else {
@@ -196,7 +196,7 @@ The same naming convention applies to derived variables with the exception of tw
 A derived variable looks like this in `variable_details.csv`
 
 ```{r, echo=FALSE, warning=FALSE}
-kable(variable_details[64,1:14])
+kable(variable_details[variable_details$variable == "example_der",])
 ```
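
The hunk above swaps a positional lookup (`[64, 1:14]`) for a lookup by variable name. A small sketch with a made-up data frame shows why the name-based form tends to be more robust when the worksheet changes; the `details` object and its rows are illustrative only and are not the real `tester_variable_details`.

```r
# Toy stand-in for a variable_details worksheet (made-up rows).
details <- data.frame(
  variable = c("chol", "bili", "example_der"),
  typeEnd  = c("cont", "cont", "cont"),
  stringsAsFactors = FALSE
)

# Simulate the worksheet gaining a new row at the top.
details <- rbind(
  data.frame(variable = "age", typeEnd = "cont", stringsAsFactors = FALSE),
  details
)

# Positional lookup now points at a different variable than before.
by_position <- details[3, ]

# Name-based lookup still finds the derived variable.
by_name <- details[details$variable == "example_der", ]
```

If the `variable` column could contain `NA`, wrapping the condition in `which()` avoids returning all-`NA` rows.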
 
 ## Tables

diff --git a/vignettes/variables_sheet.Rmd b/vignettes/variables_sheet.Rmd
@@ -23,7 +23,7 @@ This vignette describes how the `variables` worksheet is organized and how to fi
 Read the `variables` worksheet
 
 ```{r Read variables.csv, echo=FALSE, message=FALSE, warning=FALSE}
-variables <- recodeflow::pbc_variables
+variables <- recodeflow::tester_variables
 cat("There are", nrow(variables), "variables, grouped in", sum(!duplicated(variables$section)), "sections and", sum(!duplicated(variables$subject)), "subjects that are available for transformation.")
 
 ```