Skip to content

Latest commit

 

History

History
92 lines (76 loc) · 4.24 KB

README.org

File metadata and controls

92 lines (76 loc) · 4.24 KB

Multigroup Empirical Equipoise

This is the simulation code repository for our upcoming paper (Pharmacoepidemiol Drug Saf 2019) on multigroup empirical equipoise assessment. We extended the original definition (Walker et al. Comp Eff Res 2013;3:11) to the multigroup setting.

Related resources

  • Paper: PDS 2019 in press

Files and folders in this repo

  • *.R: Main R script files for generating simulation data, analyzing data, and reporting results. Execution each file will generate a plain text report file named *.R.txt under log/.
  • *_o2.sh: Example shell scripts for the Linux SLURM batch job system. These are designed for Harvard Medical School’s O2 cluster specifically, and are not expected to work without modification elsewhere.
  • data/: Folder for simulation data. Due to file size issues, only the summary file for running 06_assess_results.R is kept.
  • log/: Folder for log files.
  • out/: Folder for figure PDFs.

Simulation

Installation

Two custom R packages must be installed before running the R scripts provided in this repository.

## Install devtools (if you do not have it already)
install.packages("devtools")
## Install the data generation package
devtools::install_github(repo = "kaz-yos/datagen3")
## Install the simulation package
devtools::install_github(repo = "kaz-yos/empeq3")
## Install the simulation package
devtools::install_github(repo = "kaz-yos/trim3")

Additionally, packages doParallel, doRNG, grid, gtable, and tidyverse, are required in the scripts.

Data generation

Running the following will generate raw data files under the data/ folder using 8 cores.

Rscript ./01_generate_data.R 8

If you have access to a SLURM-based computing cluster, the following can be used with appropriate modifications to the shell script.

sh ./01_generate_data_o2.sh

Data preparation

Running the following will process the specified data file (PS estimation and trimming) and generate a new file with the same name except that raw changes to prepared.

Rscript ./02_prepare_data.R ./data/scenario_raw001_part001_r50.RData 8

If you have access to a SLURM-based computing cluster, the following can be used with appropriate modifications to the shell script. This will dispatch a SLURM job for each file.

sh ./02_prepare_data_o2.sh ./data/scenario_raw*

Data analysis

Running the following will analyze the specified data file (outcome model estimation) and generate a new file with the same name except that prepared changes to analyzed.

Rscript ./03_analyze_data.R ./data/scenario_prepared001_part001_r50.RData 8

If you have access to a SLURM-based computing cluster, the following can be used with appropriate modifications to the shell script. This will dispatch a SLURM job for each file.

sh ./03_analyze_data_o2.sh ./data/scenario_prepared*

Result aggregation

Running the following will aggregate all the analyzed data files under data/ and generate a single new file named all_analysis_results.RData.

Rscript ./04_aggregate_results.R 1

If you have access to a SLURM-based computing cluster, the following can be used with appropriate modifications to the shell script.

sh ./04_aggregate_results_o2.sh

Result summarization

Running the following will summarize the results (scenario-level summaries) and generate a new file named all_analysis_summary.RData.

Rscript ./05_summarize_results.R ./data/all_analysis_results.RData 1

If you have access to a SLURM-based computing cluster, the following can be used with appropriate modifications to the shell script.

sh ./05_summarize_results_o2.sh ./data/all_analysis_results.RData

Assessment

Running the following will create figures under out/ describing the summary statistics in all_analysis_summary.RData.

./Rscriptee ./06_assess_results.R ./data/all_analysis_summary.RData 1

Author

Kazuki Yoshida <kazukiyoshida@mail.harvard.edu>