
ChildhoodCancerDataInitiative-Prefect_Pipeline

This repo contains the source code for a Prefect workflow deployed in the ccdi-workspace of Prefect Cloud. The workflow curates and validates a Childhood Cancer Data Initiative (CCDI) manifest and outputs submission files for different platforms. It is custom-built for CCDI study ingestion to simplify, standardize, and speed up the data ingestion process. Please log in to Prefect Cloud to execute the workflow.

Workflow overview

📌 This workflow expects the latest version of the CCDI manifest as input. The current workflow has been tested with CCDI data model v1.7.2.

The current workflow runs six subflows (steps) during execution. Each step was adapted from previously developed Python/R scripts.

Prefect login instructions

  • Make sure you have received the invitation to join Prefect.
  • Use your email address to request a login link or code.
  • Navigate to ccdi-workspace.

Execute a workflow

  • Find the deployment

    A deployment is a server-side representation of a workflow. It determines when, where, and how the workflow runs.

  • Click Custom Run.

  • Submit a workflow

    Only two fields are required: file_path and runner.

    • file_path is the path to the CCDI manifest in the ccdi-validation S3 bucket.
    • runner is a unique ID of your choice. Avoid spaces in the runner name. All workflow outputs will be stored in the S3 bucket under the folder /<your_runner_id>.
  • Check the flow run.

  • Check the subflow runs.
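
The same two inputs can also be submitted from Python instead of the Custom Run form. This is a minimal sketch, not part of this repository: it assumes Prefect's `run_deployment` helper (Prefect 2.x or later) and an active Prefect Cloud login, and the deployment name (`<flow-name>/<deployment-name>`), the helper names, and the example `file_path` value are hypothetical placeholders. The runner check enforces only the documented "no spaces" rule.

```python
import re
from typing import Any, Dict

# Hypothetical helper: the README only asks for a runner id without spaces,
# so this check rejects empty ids and ids containing any whitespace.
_WHITESPACE = re.compile(r"\s")


def validate_runner(runner: str) -> str:
    """Return the runner id unchanged, or raise ValueError if it breaks the no-spaces rule."""
    if not runner or _WHITESPACE.search(runner):
        raise ValueError(f"invalid runner id {runner!r}: use a non-empty id without spaces")
    return runner


def build_parameters(file_path: str, runner: str) -> Dict[str, Any]:
    """Assemble the two required deployment inputs (file_path and runner)."""
    return {"file_path": file_path, "runner": validate_runner(runner)}


def trigger_run(deployment_name: str, file_path: str, runner: str):
    """Create a flow run of the deployment; needs prefect installed and a Prefect Cloud login."""
    # Imported inside the function so the helpers above work without prefect installed.
    from prefect.deployments import run_deployment

    # timeout=0 returns as soon as the flow run is created instead of waiting for it to finish.
    return run_deployment(
        name=deployment_name,  # hypothetical: "<flow-name>/<deployment-name>"
        parameters=build_parameters(file_path, runner),
        timeout=0,
    )


if __name__ == "__main__":
    # Hypothetical example values; confirm the real file_path format in the Custom Run form.
    print(build_parameters("my_manifest.xlsx", "jdoe"))
```

Validating the runner id before submission catches the space problem locally rather than after the run has started writing to S3.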

Workflow outputs

If the workflow finishes successfully, its outputs are uploaded to the ccdi-validation S3 bucket under the folder <your_runner_id>/<phs_accession>_outputs_<date>_T<time>. The outputs of all workflows from the same runner can be found under the /<your_runner_id> folder.
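
When you need to pick out a particular run's outputs programmatically (for example, after listing keys in the bucket), the folder layout described above can be split back into its parts. This is a hypothetical helper, not part of the workflow; because the `<date>` and `<time>` formats are not specified here, the pattern only assumes that neither contains `_` or `/`.

```python
import re
from typing import NamedTuple, Optional

# Folder layout from the README: <your_runner_id>/<phs_accession>_outputs_<date>_T<time>
# Assumption: <date> and <time> contain no "_" or "/"; anything after the folder is ignored.
_OUTPUT_KEY = re.compile(
    r"^(?P<runner>[^/]+)/"
    r"(?P<phs>[^/_]+)_outputs_(?P<date>[^/_]+)_T(?P<time>[^/_]+)"
    r"(?:/.*)?$"
)


class OutputFolder(NamedTuple):
    runner: str
    phs_accession: str
    date: str
    time: str


def parse_output_key(key: str) -> Optional[OutputFolder]:
    """Split an S3 key from the ccdi-validation bucket into its output-folder parts, or return None."""
    match = _OUTPUT_KEY.match(key)
    if match is None:
        return None
    return OutputFolder(match["runner"], match["phs"], match["date"], match["time"])
```

Keys that do not follow the `_outputs_..._T...` pattern return `None`, so the helper can be used to filter a mixed key listing.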

Download workflow outputs

Use the AWS CLI to download the entire workflow output folder to your local computer. Make sure your AWS credentials are set up properly; you can check them in the credentials file under the ~/.aws folder.
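
The credentials file is an INI-style file with one section per profile. As a minimal sketch, assuming static access keys stored as `aws_access_key_id` and `aws_secret_access_key` under a `[default]` profile, the check below confirms those entries are present. It does not contact AWS and does not see credentials supplied through environment variables or SSO; the function names are hypothetical.

```python
import configparser
from pathlib import Path

# Entries the AWS CLI reads for static access keys.
REQUIRED_KEYS = ("aws_access_key_id", "aws_secret_access_key")


def profile_has_keys(credentials_text: str, profile: str = "default") -> bool:
    """True if `profile` exists in the credentials text and both key entries are non-empty."""
    # interpolation=None so any "%" characters in values are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(credentials_text)
    if not parser.has_section(profile):
        return False
    return all(parser.get(profile, key, fallback="").strip() for key in REQUIRED_KEYS)


def check_local_credentials(profile: str = "default") -> bool:
    """Look for ~/.aws/credentials and check the given profile."""
    path = Path.home() / ".aws" / "credentials"
    return path.is_file() and profile_has_keys(path.read_text(), profile)
```

For an end-to-end check that the CLI can actually authenticate, `aws sts get-caller-identity` prints the account and identity the credentials resolve to.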

Run the command below in your terminal to download the workflow outputs.

aws s3 cp s3://ccdi-validation/<your_runner_id>/ ./<your_runner_id>/ --recursive
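
To script the download (for example, from another Python job), the same command can be built as an argument list and run with `subprocess`. This is a sketch assuming the AWS CLI is installed and on `PATH`; the helper names are hypothetical, and the runner-id check mirrors only the documented "no spaces" rule.

```python
import shutil
import subprocess
from typing import List, Optional

BUCKET = "ccdi-validation"  # bucket named in this README


def build_download_cmd(runner_id: str, dest: Optional[str] = None) -> List[str]:
    """Return the `aws s3 cp ... --recursive` argument list for one runner's outputs."""
    # Hypothetical check mirroring the README's "no spaces in the runner name" rule.
    if not runner_id or any(ch.isspace() for ch in runner_id):
        raise ValueError(f"invalid runner id {runner_id!r}")
    target = dest if dest is not None else f"./{runner_id}/"
    return ["aws", "s3", "cp", f"s3://{BUCKET}/{runner_id}/", target, "--recursive"]


def download_outputs(runner_id: str, dest: Optional[str] = None) -> None:
    """Run the download; raises CalledProcessError if the aws command fails."""
    if shutil.which("aws") is None:
        raise RuntimeError("AWS CLI not found on PATH")
    subprocess.run(build_download_cmd(runner_id, dest), check=True)
```

Passing a list rather than a single shell string avoids shell quoting issues. If you download the same folder again later, `aws s3 sync` with the same source and destination copies only new or changed files.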