This repo contains a Python script that takes data from a validated CCDI submission manifest and creates dbGaP submission files for a CCDI project.
Running any Python package or script inside a controlled virtual environment is recommended, because it keeps dependencies isolated. Many tools can create a virtual environment, such as pyenv, virtualenv, or conda. The instructions below show how to create a conda env with all the dependencies installed.
- Conda install
Conda is an open-source package and environment management system that runs on Windows, macOS, and Linux. See the conda installation instructions and pick the package that matches your operating system.
- Create a conda env
An environment YAML file, conda_environment.yml, can be found under the envs/ folder. To create the environment, run
    conda env create -f <path_to_env_yml>
After it finishes, an environment called CCDI_to_dbGaP_env should appear when you run
    conda env list
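The check described above can also be scripted. The following is a minimal sketch, assuming conda is on your PATH; the helper names `env_names` and `conda_env_exists` are hypothetical and not part of this repo. It reads the environment prefixes that `conda env list --json` reports and compares their final path component to the expected name.

```python
import json
import subprocess
from pathlib import PurePath


def env_names(env_paths):
    """Return the final path component of each environment prefix."""
    return [PurePath(p).name for p in env_paths]


def conda_env_exists(name, env_paths=None):
    """Check whether an environment called `name` is known to conda.

    If env_paths is None, the prefixes are read from
    `conda env list --json`, which requires conda on PATH.
    """
    if env_paths is None:
        result = subprocess.run(
            ["conda", "env", "list", "--json"],
            capture_output=True,
            text=True,
            check=True,
        )
        env_paths = json.loads(result.stdout)["envs"]
    return name in env_names(env_paths)
```

Calling `conda_env_exists("CCDI_to_dbGaP_env")` after creating the environment should return `True` if the creation succeeded.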
- Activate conda environment
All the dependencies that the script requires should be installed within this environment. To activate the environment, run
    conda activate CCDI_to_dbGaP_env
After activation, (CCDI_to_dbGaP_env) should appear at the beginning of your terminal prompt.
- Deactivate conda environment
To leave the environment, run
    conda deactivate
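Besides looking at the prompt, a script can confirm which environment is active. This is a small sketch under the assumption that the environment was activated with `conda activate`, which sets the `CONDA_DEFAULT_ENV` variable; the function name `in_expected_env` is hypothetical.

```python
import os

EXPECTED_ENV = "CCDI_to_dbGaP_env"


def in_expected_env(environ=None, expected=EXPECTED_ENV):
    """Return True if the active conda environment matches `expected`."""
    if environ is None:
        environ = os.environ
    # conda sets CONDA_DEFAULT_ENV on activation; it is absent otherwise.
    return environ.get("CONDA_DEFAULT_ENV") == expected
```

After `conda deactivate`, the variable either names another environment (such as base) or is unset, so the check returns `False`.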
❗Note: This script assumes every CONSENT is 👉 GRU (consent number 1). If a CONSENT other than GRU is found, the data submitter must fix the CONSENT encoded value in SC_DD.xlsx before submission.
>> python CCDI_to_dbGaPy.py --help
usage: CCDI_to_dbGaPy.py [-h] -f FILE [-s PREVIOUS_SUBMISSION]

This script is a python version to generate dbGaP submission files using a validated CCDI
submission manifest

required arguments:
  -f FILE, --file FILE  A validated dataset file based on the template
                        CCDI_submission_metadata_template (.xlsx)

optional arguments:
  -s PREVIOUS_SUBMISSION, --previous_submission PREVIOUS_SUBMISSION
                        A previous dbGaP submission folder for the same phs_id study.
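For readers who want to see how a command-line interface like this can be declared, here is a minimal `argparse` sketch that accepts the same flags as the help text above. It is an illustration only and not the script's actual source; the function name `build_parser` is hypothetical.

```python
import argparse


def build_parser():
    """Build a parser that accepts the flags shown in the help text."""
    parser = argparse.ArgumentParser(
        prog="CCDI_to_dbGaPy.py",
        description=(
            "This script is a python version to generate dbGaP submission "
            "files using a validated CCDI submission manifest"
        ),
    )
    # A separate group so required flags are listed under their own heading.
    required = parser.add_argument_group("required arguments")
    required.add_argument(
        "-f",
        "--file",
        required=True,
        help="A validated dataset file based on the template "
        "CCDI_submission_metadata_template (.xlsx)",
    )
    parser.add_argument(
        "-s",
        "--previous_submission",
        help="A previous dbGaP submission folder for the same phs_id study.",
    )
    return parser
```

With this parser, `build_parser().parse_args(["-f", "manifest.xlsx"])` yields a namespace whose `file` attribute is the manifest path and whose `previous_submission` is `None`; the file name `manifest.xlsx` is a placeholder.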
- Inputs
The script requires a validated CCDI manifest. The previous dbGaP submission folder is optional.
- Outputs
  - A log file named CCDI_to_dbGaP_<today_date>.log
  - (If the script finishes successfully) A folder named <phs_id>_dbGaP_submission_<today_date>, for example:

aviator_falsetto_6_dbGaP_submission_2023-11-24/
├── SA_DD.xlsx
├── SA_DS_aviator_falsetto_6_dbGaP_submission.txt
├── SC_DD.xlsx
├── SC_DS_aviator_falsetto_6_dbGaP_submission.txt
├── SSM_DD.xlsx
├── SSM_DS_aviator_falsetto_6_dbGaP_submission.txt
└── metadata.json

1 directory, 7 files
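A quick way to confirm that a run produced everything is to compare the output folder against the expected file list. The sketch below infers the naming pattern from the example listing above; the YYYY-MM-DD date format and the helper names (`submission_folder_name`, `expected_files`, `missing_files`) are assumptions made for illustration, not part of the script.

```python
from datetime import date
from pathlib import Path


def submission_folder_name(phs_id, day=None):
    """Folder name pattern inferred from the example listing (assumption)."""
    day = day or date.today()
    return f"{phs_id}_dbGaP_submission_{day.isoformat()}"


def expected_files(phs_id):
    """The seven file names shown in the example listing, for one phs_id."""
    names = ["metadata.json"]
    for prefix in ("SA", "SC", "SSM"):
        names.append(f"{prefix}_DD.xlsx")
        names.append(f"{prefix}_DS_{phs_id}_dbGaP_submission.txt")
    return sorted(names)


def missing_files(folder, phs_id):
    """Return the expected file names that are absent from `folder`."""
    folder = Path(folder)
    return [n for n in expected_files(phs_id) if not (folder / n).is_file()]
```

An empty list from `missing_files` means all seven files are present; any names returned point to outputs worth checking in the log file.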