- Analytics use case(s): Characterization
- Study type: Phenotyping
- Tags: -
- Study lead: -
- Study lead forums tag: -
- Study start date: September 1, 2023
- Study end date: -
- Protocol: -
- Publications: -
- Results explorer: -
Generating cohort diagnostics DARWIN EU studies.
Below are the instructions for installing and then running the package. For your convience, you can also find this code in extras/CodeTorun.R.
There are several ways in which one could install the StudyDiagnostics
package. However, we recommend using the renv
package:
-
See the instructions here for configuring your R environment, including Java and RStudio.
-
In RStudio, create a new project: File -> New Project… -> New Directory -> New Project. If asked if you want to use
renv
with the project, answer ‘no’. -
Execute the following R code:
# Install the latest version of renv:
install.packages("renv")
# Download the lock file:
download.file("https://raw.githubusercontent.com/darwin-eu-dev/StudyDiagnostics/main/renv.lock", "renv.lock")
# Build the local library. This may take a while:
renv::init()
- Edit the script below to ensure that the variables contain the correct values for your environment, then execute:
library(StudyDiagnostics)
# Specify where the temporary files will be created:
options(andromedaTempFolder = "s:/andromedaTemp")
# Maximum number of cores to be used:
maxCores <- parallel::detectCores()
# Details for connecting to the server. See
# http://ohdsi.github.io/DatabaseConnector/reference/createConnectionDetails.html for more details:
connectionDetails <- DatabaseConnector::createConnectionDetails(dbms = "...",
server = "...",
user = "...",
password = "...",
port = ...)
# For Oracle and BigQuery: define a schema that can be used to emulate temp tables.
# You should have write access to this schema:
oracleTempSchema <- NULL
# A folder on the local file system to store results:
outputFolder <- "..."
# The database schema where the observational data in CDM is located. For SQL Server
# this should include both the database and schema, for example 'cdm.dbo'.
# You should have read access to this schema:
cdmDatabaseSchema <- "cdm"
# The database schema where the cohorts can be instantiated. For SQL Server
# this should include both the database and schema, for example 'cdm.dbo'.
# You should have write access to this schema:
cohortDatabaseSchema <- "..."
# The name of the table that will be created in the cohortDatabaseSchema:
cohortTable <- "..."
# Some meta-data about your database. The databaseId is a short (<= 20 characters)
# name for your database. The databaseName is the full name, and databaseDescription
# provides a short (1 paragraph) description. These values will be displayed in the
# Shiny results app for all to see.
databaseId <- "..."
databaseName <- "..."
databaseDescription <- "..."
# This statement instatiates the cohorts, performs the diagnostics, and writes the results to
# a zip file containing CSV files. This will probaby take a long time to run:
runStudyDiagnostics(connectionDetails = connectionDetails,
cdmDatabaseSchema = cdmDatabaseSchema,
cohortDatabaseSchema = cohortDatabaseSchema,
cohortTable = cohortTable,
oracleTempSchema = oracleTempSchema,
outputFolder = outputFolder,
databaseId = databaseId,
databaseName = databaseName,
databaseDescription = databaseDescription,
createCohorts = TRUE,
runInclusionStatistics = TRUE,
runTimeDistributions = TRUE,
runBreakdownIndexEvents = TRUE,
runIncidenceRates = TRUE,
runCohortOverlap = TRUE,
runCohortCharacterization = TRUE,
runTemporalCohortCharacterization = TRUE,
minCellCount = 5)
# (Optionally) to view the results locally:
CohortDiagnostics::preMergeDiagnosticsFiles(file.path(outputFolder, "diagnosticsExport"))
CohortDiagnostics::launchDiagnosticsExplorer(file.path(outputFolder, "diagnosticsExport"))
- Upload results to the Data Transfer Zone.
Ready to run.