Further updates to data name and use of PBC data.
.Rd files committed as a reference point.
DougManuel committed Nov 26, 2024
1 parent 8b18d25 commit dcf600e
Showing 23 changed files with 399 additions and 152 deletions.
146 changes: 97 additions & 49 deletions R/data.R
@@ -1,61 +1,109 @@
#' Worksheet for pbc dataset
#' PBC Dataset Collection
#' @docType data
#' @name pbc_datasets
#' @description
#' The variable and variable_details worksheets that holds metadata for the pbc dataset.
#' A collection of related datasets containing PBC data and metadata.
#'
#' @format A list containing DCMI metadata:
#' @details
#' The collection contains four datasets:
#' \describe{
#' \item{title}{title}
#' \item{creator}{creator}
#' \item{subject}{subject}
#' \item{description}{description}
#' \item{publisher}{publisher}
#' \item{date}{date}
#' \item{type}{type}
#' \item{format}{format}
#' \item{identifier}{identifier}
#' \item{source}{source}
#' \item{language}{language}
#' \item{rights}{rights}
#' \item{references}{references}
#' \item{pbc}{Primary Biliary Cirrhosis (PBC) data collected at Mayo Clinic between 1974 and 1984. A data frame with 418 observations and 20 variables including patient characteristics, biochemical measurements, and clinical outcomes.}
#' \item{pbc_metadata}{DCMI metadata for the PBC database. A list containing the database name and other identification information.}
#' \item{pbc_variables}{'variables' metadata for the PBC dataset. Metadata for each variable in the PBC data. A data frame with 24 rows and 11 columns.}
#' \item{pbc_variable_details}{`variable_details` metadata for the PBC dataset. Metadata for variable category or value label in the PBC data. A data frame with 69 rows and 16 columns.}
#' }
#' @source \url{https://cran.r-project.org/web/packages/survival/survival.pdf}
"pbc_data"
#'
#' @source The PBC dataset is from the survival package:
#' \url{https://cran.r-project.org/web/packages/survival/survival.pdf}
#'
#' The metadata (pbc_metadata, pbc_variables, and pbc_variable_details) were created
#' specifically for the recodeflow package to demonstrate and test the package functionality.
NULL

#' @rdname pbc_datasets
#' @format ## pbc
#' A data frame with 418 observations and 20 variables:
#' \describe{
#' \item{id}{case number}
#' \item{time}{number of days between registration and the earlier of death, transplantation, or study analysis time}
#' \item{status}{status at endpoint, 0/1/2 for censored, transplant, dead}
#' \item{trt}{1/2/NA for D-penicillamine, placebo, or not randomized}
#' \item{age}{age in years}
#' \item{sex}{m/f}
#' \item{ascites}{presence of ascites}
#' \item{hepato}{presence of hepatomegaly or enlarged liver}
#' \item{spiders}{blood vessel malformations in the skin}
#' \item{edema}{0 no edema, 0.5 untreated or successfully treated, 1 edema despite diuretic therapy}
#' \item{bili}{serum bilirubin (mg/dl)}
#' \item{chol}{serum cholesterol (mg/dl)}
#' \item{albumin}{serum albumin (g/dl)}
#' \item{copper}{urine copper (ug/day)}
#' \item{alk.phos}{alkaline phosphatase (U/liter)}
#' \item{ast}{aspartate aminotransferase (U/ml)}
#' \item{trig}{triglycerides (mg/dl)}
#' \item{platelet}{platelet count}
#' \item{protime}{standardised blood clotting time}
#' \item{stage}{histologic stage of disease (1, 2, 3, or 4)}
#' }
"pbc"

#' @rdname pbc_datasets
#' @format ## pbc_metadata
#' A list containing DCMI metadata:
#' \describe{
#' \item{title}{title}
#' \item{creator}{creator}
#' \item{subject}{subject}
#' \item{description}{description}
#' \item{publisher}{publisher}
#' \item{date}{date}
#' \item{type}{type}
#' \item{format}{format}
#' \item{identifier}{identifier}
#' \item{source}{source}
#' \item{language}{language}
#' \item{rights}{rights}
#' \item{references}{references}
#' }
"pbc_metadata"

#' @rdname pbc_data
#' @format A data frame with 24 rows and 11 columns:
#' @rdname pbc_datasets
#' @format ## pbc_variables
#' A data frame with 24 rows and 11 columns:
#' \describe{
#' \item{variable}{variable name}
#' \item{label}{variable label}
#' \item{labelLong}{variable label long}
#' \item{subject}{subject}
#' \item{section}{section}
#' \item{variableType}{variable type}
#' \item{databaseStart}{database start}
#' \item{units}{units}
#' \item{variableStart}{variable start}
#' \item{notes}{logical indicating presence of notes}
#' \item{description}{logical indicating presence of description}
#' \item{variable}{variable name}
#' \item{label}{variable label}
#' \item{labelLong}{variable label long}
#' \item{subject}{subject}
#' \item{section}{section}
#' \item{variableType}{variable type}
#' \item{databaseStart}{database start}
#' \item{units}{units}
#' \item{variableStart}{variable start}
#' \item{notes}{logical indicating presence of notes}
#' \item{description}{logical indicating presence of description}
#' }
"pbc_variables"

#' @rdname pbc_data
#' @format A data frame with 69 rows and 16 columns:
#' @rdname pbc_datasets
#' @format ## pbc_variable_details
#' A data frame with 69 rows and 16 columns:
#' \describe{
#' \item{variable}{variable name}
#' \item{dummyVariable}{dummy variable name}
#' \item{typeEnd}{end type}
#' \item{databaseStart}{database start}
#' \item{variableStart}{variable start}
#' \item{typeStart}{start type}
#' \item{recEnd}{record end}
#' \item{recStart}{record start}
#' \item{catLabel}{category label}
#' \item{catLabelLong}{category long label}
#' \item{nubValidCat}{number of valid categories (numeric)}
#' \item{units}{logical indicating presence of units}
#' \item{notes}{logical indicating presence of notes}
#' \item{catStartLabel}{category start label}
#' \item{variableStartShortLabel}{variable start short label}
#' \item{variableStartLabel}{variable start label}
#' \item{variable}{variable name}
#' \item{dummyVariable}{dummy variable name}
#' \item{typeEnd}{end type}
#' \item{databaseStart}{database start}
#' \item{variableStart}{variable start}
#' \item{typeStart}{start type}
#' \item{recEnd}{record end}
#' \item{recStart}{record start}
#' \item{catLabel}{category label}
#' \item{catLabelLong}{category long label}
#' \item{nubValidCat}{number of valid categories (numeric)}
#' \item{units}{logical indicating presence of units}
#' \item{notes}{logical indicating presence of notes}
#' \item{catStartLabel}{category start label}
#' \item{variableStartShortLabel}{variable start short label}
#' \item{variableStartLabel}{variable start label}
#' }
"pbc_variable_details"
6 changes: 3 additions & 3 deletions R/pbc_worksheets.R
@@ -2,9 +2,9 @@ library(yaml)
library(readr)
library(usethis)

# import the pbc_data.yaml file.
pbc_data <- read_yaml("inst/extdata/pbc_data.yaml")
usethis::use_data(pbc_data, overwrite = TRUE)
# import the pbc_metadata.yaml file.
pbc_metadata <- read_yaml("inst/extdata/pbc_metadata.yaml")
usethis::use_data(pbc_metadata, overwrite = TRUE)

# import the pbc_variables.csv file.
pbc_variables <- read_csv("inst/extdata/pbc_variables.csv")
6 changes: 3 additions & 3 deletions _pkgdown.yml
@@ -56,7 +56,7 @@ reference:
contents:
- pbc_variables
- pbc_variable_details
- title: "Metadata"
desc: Metadata files
- title: "Example data"
desc: Data for examples
contents:
- pbc_database
- pbc_data
24 changes: 24 additions & 0 deletions inst/extdata/variable.csv
@@ -0,0 +1,24 @@
"variable","label","labelLong","subject","section","variableType","databaseStart","units","variableStart"
"time","time","number of days between registration and the earlier of death, treatment or end of study","study","time","cont","tester1, tester2","days","[time]"
"status","status","status at end of study","study","status","cat","tester1, tester2","N/A","[status]"
"trt","treatment","treatment","study","trt","cat","tester1, tester2","N/A","[trt]"
"age","age","age","demographic","age","cont","tester1, tester2","years","[age]"
"sex","sex","sex","demographic","sex","cat","tester1, tester2","N/A","[sex]"
"ascites","ascites","presence of ascites","physical symptom","ascites","cat","tester1, tester2","N/A","[ascites]"
"hepato","hepato","presence of hepatomegaly or enlarged liver","physical symptom","hepato","cat","tester1, tester2","N/A","[hepato]"
"spiders","spiders","presence of spiders","physical symptom","spiders","cat","tester1, tester2","N/A","[spiders]"
"edema","edema","edema","physical symptom","edema","cat","tester1, tester2","N/A","[edema]"
"bili","bili","bilirubin concentration (blood)","lab test","bili","cont","tester1, tester2","mg/dl","[bili]"
"chol","chol","cholesterol concentration (blood)","lab test","chol","cont","tester1, tester2","mg/dl","[chol]"
"albumin","albumin","albumin concentration (blood)","lab test","albumin","cont","tester1, tester2","g/dl","[albumin]"
"copper","copper","copper concentration (urine)","lab test","copper","cont","tester1, tester2","ug/dl","[copper]"
"alk.phos","alk.phos","alkaline phosphatase concentration (blood)","lab test","alk.phos","cont","tester1, tester2","U/L","[alk.phos]"
"ast","ast","aspartate aminotransferase concentration (blood)","lab test","ast","cont","tester1, tester2","U/L","[ast]"
"trig","trig","triglycerides concentration (blood)","lab test","trig","cont","tester1, tester2","mg/dl","[trig]"
"platelet","platelet","platelet count","lab test","platelet","cont","tester1, tester2","N/A","[platelet]"
"protime","protime","standardized blood clotting time","lab test","protime","cont","tester1, tester2","N/A","[protime]"
"stage","stage","histologic stage of disease","lab test","stage","cat","tester1, tester2","N/A","[stage]"
"example_der","example_der","example of derived function: concentration of cholesterol * concentration of bilirubin","derived","example","cont","tester1, tester2","mg/dl","[example_der]"
"agegrp5","agegrp5","five year age groups","demographics","age","cat","tester1","years","tester1::agegrp"
"agegrp10","agegrp10","ten year age groups","demographics","age","cat","tester1, tester2","years","[agegrp]"
"age_cont","age_cont","continuous age created from age groups","demographics","age","cont","tester1, tester2","years","[agegrp]"
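As a rough illustration of how the `inst/extdata/variable.csv` worksheet in this commit might be queried, the sketch below parses a three-row excerpt of it with base R and lists the variables available for a given database. It assumes `databaseStart` holds a comma-separated list of database names, which matches the rows in the file; `variables_for()` is a hypothetical helper and is not part of recodeflow.

```r
# Three rows copied verbatim from inst/extdata/variable.csv, plus the header.
csv_text <- '"variable","label","labelLong","subject","section","variableType","databaseStart","units","variableStart"
"age","age","age","demographic","age","cont","tester1, tester2","years","[age]"
"sex","sex","sex","demographic","sex","cat","tester1, tester2","N/A","[sex]"
"agegrp5","agegrp5","five year age groups","demographics","age","cat","tester1","years","tester1::agegrp"'

# Parse the excerpt; "N/A" stays a literal string because it is not "NA".
variables <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Assumption: databaseStart is a comma-separated list of database names.
databases <- strsplit(variables$databaseStart, ",\\s*")

# Hypothetical helper: names of the variables defined for one database.
variables_for <- function(db) {
  keep <- vapply(databases, function(x) db %in% x, logical(1))
  variables$variable[keep]
}

variables_for("tester2")
```

In this excerpt, `agegrp5` is the only row limited to `tester1`, so it is excluded when asking for `tester2`.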