data
nmercadeb committed Sep 9, 2024
1 parent 3592b57 commit 12c5ccc
Showing 3 changed files with 13 additions and 19 deletions.
Binary file modified R/sysdata.rda
Binary file not shown.
6 changes: 3 additions & 3 deletions extras/getBenchmarkResults.R
@@ -50,6 +50,6 @@ mergeData <- function(data, patterns) {
return(x)
}

-result_patterns <- c("time", "comparison", "details", "omop", "index_counts", "sql_indexes")
-data <- readData(here("extras", "data")) %>% mergeData(result_patterns)
-save(data, file = here("extras", "benchmark.RData"))
+resultPatterns <- c("time", "comparison", "details", "omop", "index_counts", "sql_indexes")
+benchmarkData <- readData(here("extras", "data")) %>% mergeData(resultPatterns)
+usethis::use_data(benchmarkData, internal = TRUE, overwrite = TRUE)
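
Note on the change above: `usethis::use_data()` with `internal = TRUE` writes its arguments to `R/sysdata.rda`, which is why that binary file is modified in this commit. The base-R sketch below shows the same save-and-reload round trip. The toy `benchmarkData` contents and the temporary path are assumptions made for illustration, not the real benchmark results.

```r
# Toy stand-in for the merged benchmark results (hypothetical values)
benchmarkData <- list(
  omop = data.frame(table_name = c("person", "death"), number_records = c(100, 5)),
  time = data.frame(msg = "cc_covid", tic = 0, toc = 90)
)

# Save under a temporary path; in a package the target would be R/sysdata.rda
sysdataPath <- file.path(tempdir(), "sysdata.rda")
save(benchmarkData, file = sysdataPath)

# Reload into a fresh environment to confirm the object round-trips by name
restored <- new.env()
load(sysdataPath, envir = restored)
names(restored$benchmarkData)
```

Objects stored in `R/sysdata.rda` are available to the package's own code without being exported to users.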
26 changes: 10 additions & 16 deletions vignettes/a11_benchmark.Rmd
@@ -29,9 +29,7 @@ library(dplyr)
library(tidyr)
library(gt)
library(scales)
-```
-```{r}
# Functions
niceNum <- function(x, dec = 0) {
trimws(format(round(as.numeric(x), dec), big.mark = ",", nsmall = dec, scientific = FALSE))
@@ -63,16 +61,12 @@ niceOverlapLabels <- function(labels) {
)
)
}
```
```{r}
# Results
createRData <- FALSE
if (createRData) {
source(here::here("extras", "getBenchmarkResults.R"))
} else {
load(here::here("extras", "benchmark.RData"))
}
}
```

# Introduction
@@ -95,7 +89,7 @@ Current results were obtained from a 100,000-person sample of the CPRD GOLD data

The table below shows how many records are in the OMOP tables used in the benchmark script for each participating database.
```{r}
-data$omop |>
+benchmarkData$omop |>
filter(table_name != "death") |>
select("cdm_name", "OMOP table" = "table_name", "number_records") |>
mutate(
@@ -128,7 +122,7 @@ The COVID-19 cohort was used to evaluate the performance of common cohort strati
The following table displays the number of records and subjects for each cohort across the participating databases:

```{r}
-data$details |>
+benchmarkData$details |>
filterSettings(result_type == "cohort_count") |>
tidy(addSettings = FALSE) |>
select(-variable_level, - result_id) |>
@@ -172,7 +166,7 @@ data$details |>
We also computed the overlap between patients in CIRCE and CohortConstructor cohorts, with results shown in the plot below:

```{r, fig.width=10, fig.height=7}
-overlap <- data$comparison |>
+overlap <- benchmarkData$comparison |>
filterSettings(result_type == "cohort_overlap")
overlap |>
@@ -213,7 +207,7 @@ The following plot shows the times taken to create each cohort using CIRCE and C
## TABLE with same results as the plot below.
# header_prefix <- "[header]Time by database (minutes)\n[header_level]"
-# data$time |>
+# benchmarkData$time |>
# distinct() |>
# filter(!grepl("male|set", msg)) |>
# mutate(
@@ -246,7 +240,7 @@ The following plot shows the times taken to create each cohort using CIRCE and C

```{r, fig.width=8, fig.height=7}
-data$time |>
+benchmarkData$time |>
distinct() |>
filter(!grepl("male|set", msg)) |>
mutate(
@@ -290,15 +284,15 @@ The table below depicts the total time it took to create the nine cohorts when u

```{r}
header_prefix <- "[header]Time by tool (minutes)\n[header_level]"
-data$time |>
+benchmarkData$time |>
distinct() |>
filter(grepl("atlas", msg)) |>
filter(!grepl("male", msg)) |>
group_by(cdm_name) |>
summarise(time = niceNum(sum(as.numeric(toc) - as.numeric(tic))/60, 2)) |>
mutate(Tool = "CIRCE") |>
union_all(
-data$time |>
+benchmarkData$time |>
filter(msg == "cc_set_no_strata") |>
group_by(cdm_name) |>
summarise(time = niceNum(sum(as.numeric(toc) - as.numeric(tic))/60, 2)) |>
@@ -318,7 +312,7 @@ data$time |>
Cohorts are often stratified in studies. With Atlas cohort definitions, each stratum requires a new CIRCE JSON to be instantiated, while CohortConstructor allows stratifications to be generated from an overall cohort. The following table shows the time taken to create age and sex stratifications for the COVID-19 cohort with both CIRCE and CohortConstructor.

```{r}
-data$time |>
+benchmarkData$time |>
distinct() |>
filter(grepl("atlas_covid|set_strata", msg) | msg == "cc_covid") |>
filter(msg != "atlas_covid") |>
@@ -351,7 +345,7 @@ Four calls were made to `conceptCohort`, each involving a different number of OM
The plot below shows the computation time with and without SQL indexes for each scenario:

```{r, fig.width=8, fig.height=7}
-data$sql_indexes |>
+benchmarkData$sql_indexes |>
distinct() |>
group_by(cdm_name, msg) |>
summarise(time = sum(as.numeric(toc) - as.numeric(tic))/60, .groups = "drop") |>
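
Note on the timing expression used in the hunks above: `sum(as.numeric(toc) - as.numeric(tic))/60` converts each tic/toc pair to elapsed seconds, sums them per group, and divides by 60 to report minutes. The base-R sketch below reproduces that arithmetic on a made-up log. The data frame, its values, and the assumption that `tic` and `toc` arrive as character columns are hypothetical.

```r
# Hypothetical timing log: one row per tic/toc pair, values in seconds
timings <- data.frame(
  cdm_name = c("db_a", "db_a", "db_b", "db_b"),
  msg = c("index", "no_index", "index", "no_index"),
  tic = c("0", "10", "0", "5"),
  toc = c("120", "310", "60", "185"),
  stringsAsFactors = FALSE
)

# Elapsed seconds per row, coercing from character as the vignette code does
timings$seconds <- as.numeric(timings$toc) - as.numeric(timings$tic)

# Sum per database and scenario, then convert seconds to minutes
minutes <- aggregate(seconds ~ cdm_name + msg, data = timings, FUN = sum)
minutes$time <- minutes$seconds / 60
minutes
```

With real data, the grouped total would be computed on `benchmarkData$sql_indexes` or `benchmarkData$time` rather than on this toy frame.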
