This repo contains the source code for a Prefect workflow deployed in the ccdi-workspace of Prefect Cloud. The workflow curates and validates data based on a Childhood Cancer Data Initiative (CCDI) manifest and outputs submission files for different platforms. It is custom-made for CCDI study ingestion, to simplify and expedite data ingestion in a standard manner. Please log in to Prefect Cloud to execute the workflow.
- Workflow overview
- Prefect login instruction
- Execute a workflow
- Workflow outputs
- Download workflow outputs
📌 This workflow expects a CCDI manifest in the latest version as input. The current workflow has been tested with CCDI data model v1.7.2.
The current workflow runs 6 subflows (steps) during execution. Each step was adapted from previously developed Python/R scripts.
- Make sure you have received the invitation to join Prefect
- Use your email address to get the login link or code
- Navigate to ccdi-workspace
- Find the deployment
  A Deployment is a server-side representation of a workflow. The deployment decides when, where, and how a workflow should run.
- Submit a workflow
  The only two required fields for the deployment are file_path and runner.
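If you prefer to trigger a run from a script instead of the Prefect Cloud UI, the sketch below shows one way this might be done with Prefect's Python client. Only the parameter names file_path and runner come from the instructions above. The deployment name, the manifest path, and the runner ID in the usage comment are placeholders, and the helper names `build_parameters` and `submit` are hypothetical. It assumes the prefect package is installed and your environment is already authenticated against ccdi-workspace.

```python
def build_parameters(file_path: str, runner: str) -> dict:
    """Collect the two required deployment fields, rejecting empty values."""
    params = {"file_path": file_path, "runner": runner}
    missing = [name for name, value in params.items() if not value.strip()]
    if missing:
        raise ValueError(f"Missing required field(s): {', '.join(missing)}")
    return params


def submit(deployment_name: str, file_path: str, runner: str):
    """Trigger a run of an existing deployment and return the flow run object."""
    # Imported here so build_parameters works without Prefect installed.
    from prefect.deployments import run_deployment

    return run_deployment(
        name=deployment_name,  # expected form: "<flow name>/<deployment name>"
        parameters=build_parameters(file_path, runner),
        timeout=0,  # return right after submission instead of waiting for the run
    )


# Usage (placeholder values, replace with your own):
# flow_run = submit("my-flow/my-deployment", "path/to/ccdi_manifest", "my_runner_id")
# print(flow_run.id)
```

Validating the two fields before calling Prefect catches an empty value locally rather than after the run has been scheduled.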
If the workflow finishes successfully, its outputs are uploaded to the S3 bucket ccdi-validation under the folder <your_runner_id>/<phs_accession>_outputs_<date>_T<time>. The outputs of all workflows from the same runner can be found under the /<your_runner_id> folder.
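To see which run folders exist for a runner before downloading anything, a short boto3 sketch along these lines may help. It assumes boto3 is installed and your AWS credentials grant read access to the bucket. The function names `runner_prefix` and `list_output_folders` are hypothetical and not part of this repo.

```python
def runner_prefix(runner_id: str) -> str:
    """Return the S3 key prefix that holds every run folder for one runner."""
    return runner_id.strip("/") + "/"


def list_output_folders(runner_id: str, bucket: str = "ccdi-validation") -> list:
    """List the run folders directly under <runner_id>/ in the bucket."""
    import boto3  # imported lazily so runner_prefix works without boto3

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    folders = []
    # Delimiter="/" groups keys by their next path segment, so each run's
    # output folder is reported once under CommonPrefixes.
    for page in paginator.paginate(
        Bucket=bucket, Prefix=runner_prefix(runner_id), Delimiter="/"
    ):
        folders.extend(item["Prefix"] for item in page.get("CommonPrefixes", []))
    return folders


# Usage (placeholder runner ID):
# for folder in list_output_folders("my_runner_id"):
#     print(folder)
```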
Use the AWS CLI to download the entire workflow output folder to your local computer. Make sure your AWS credentials are set up properly. You can check them in the credentials file under the ~/.aws folder.
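As a quick sanity check on that file, the sketch below reads it with Python's standard configparser and reports, per profile, whether both an access key ID and a secret access key are present. It does not verify that the keys are valid, and it assumes static keys stored in the credentials file; setups that rely on SSO or environment variables will not show up here. The function name `check_aws_profiles` is hypothetical.

```python
import configparser
from pathlib import Path

REQUIRED_KEYS = {"aws_access_key_id", "aws_secret_access_key"}


def check_aws_profiles(credentials_path=None) -> dict:
    """Map each profile name to True if it defines both required keys."""
    path = (
        Path(credentials_path)
        if credentials_path
        else Path.home() / ".aws" / "credentials"
    )
    if not path.is_file():
        raise FileNotFoundError(f"No credentials file at {path}")
    parser = configparser.ConfigParser()
    parser.read(path)
    # Each section header in the file, e.g. [default], is one profile.
    return {
        name: REQUIRED_KEYS <= set(parser[name].keys())
        for name in parser.sections()
    }


# Usage:
# print(check_aws_profiles())
```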
Run the command below in your terminal to download the workflow outputs.
aws s3 cp s3://ccdi-validation/<your_runner_id>/ ./<your_runner_id>/ --recursive