Exercise 01: Creating a WDL Workflow

In this exercise, trainees will learn how to write a single-task WDL workflow and how to use miniwdl to run this workflow locally.

Exercise Objective: Create a WDL workflow to capture the total number of reads in a fastq file using fastq-scan.

  • Part 1: Exploring fastq-scan to calculate total number of reads in a fastq file
  • Part 2: Writing a WDL task and workflow to capture this functionality

Part 1 - Exploring FASTQ-SCAN

1.1: From your training VM, launch an interactive docker container using the StaPH-B Docker Image for fastq-scan version 0.4.4: docker run --rm -it -v ~/wm_training/data/:/data staphb/fastq-scan:0.4.4.

1.2: Use the fastq-scan documentation and the read data within the container to write a one-liner that:

  • Calculates the total number of reads within a gzipped fastq file and
  • Writes this value (INT) to a file called TOTAL_READS

Part 2 - Writing a WDL Task and Workflow

2.1: Use the miniwdl run command to execute the hworld WDL workflow hosted in this repository:
$ miniwdl run ~/wm_training/wdl/workflows/wf_hworld.wdl -i ~/wm_training/data/exercise_01/hworld_inputs.json

2.2: Modify the workflow input file (~/wm_training/data/hworld/hworld_inputs.json) to print your name.

 $ cat ~/wm_training/wdl/data/hworld/hworld_inputs.json
 {
  "hworld_workflow.name": "Kevin G. Libuit"
 }
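After editing an inputs file by hand, it can help to confirm that it is still valid JSON before re-running miniwdl. The following is a minimal sketch, assuming python3 is available on the VM; hworld_inputs_demo.json and the name "Your Name" are hypothetical stand-ins, not files or values from the training data.

```shell
# Write a scratch copy of the inputs file (hypothetical file name).
cat > hworld_inputs_demo.json <<'EOF'
{
  "hworld_workflow.name": "Your Name"
}
EOF

# python3's json.tool exits non-zero on malformed JSON,
# so a clean exit here means the file parses.
python3 -m json.tool hworld_inputs_demo.json > /dev/null && echo "valid JSON"
```

A stray trailing comma or an unclosed quote would make the second command fail instead of printing the confirmation.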

2.3: Use the WDL workflow and task template files (~/wm_training/wdl/workflows/wf_template.wdl & ~/wm_training/wdl/tasks/wf_task.wdl) to write a single-task WDL workflow that takes in paired-end fastq files (read1 & read2) and uses fastq-scan to calculate the total number of reads within each fastq file.

Hints and Solutions

1.2 Hint

The total number of reads is captured as qc_stats.read_total in the fastq-scan output JSON. jq, a command-line JSON parser included in the staphb/fastq-scan:0.4.4 Docker image, can extract specific values from that output.

Check out the fastq-scan StaPH-B Docker Builds README.md before seeing the final solution!

1.2 Solution

One approach is to decompress the gzipped fastq file to standard output with zcat, pipe it into fastq-scan, and then pipe the fastq-scan JSON output into jq to query for qc_stats.read_total:

$ zcat {read_file} | fastq-scan | jq .qc_stats.read_total > TOTAL_READS
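As a rough cross-check on the value fastq-scan reports, the read count can also be derived from the line count, assuming every record spans exactly four lines (no wrapped sequences). The sketch below builds a small three-read file so that it runs without the training data; demo.fastq.gz and EXPECTED_READS are hypothetical names chosen for illustration.

```shell
# Build a tiny gzipped FASTQ with three records (hypothetical demo file).
printf '@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nIIII\n@r3\nACGT\n+\nIIII\n' \
  | gzip -c > demo.fastq.gz

# Decompress to stdout, count lines, and divide by four
# (one FASTQ record = header, sequence, separator, qualities).
gzip -dc demo.fastq.gz | awk 'END { print NR / 4 }' > EXPECTED_READS

cat EXPECTED_READS
```

For the demo file this prints 3. On real data the number should match the contents of TOTAL_READS; a mismatch would suggest a truncated file or multi-line records.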

2.2 Hint

How does the hworld_inputs.json file define the name input attribute?

2.2 Solution

Replacing the string "Kevin G. Libuit" with another value makes the workflow print that name instead, e.g.:

 $ cat ~/wm_training/wdl/data/hworld/hworld_inputs.json
 {
  "hworld_workflow.name": "John Doe"
 }

2.3 Hint

Here's a potential start to the task_fastq_scan.wdl file:

task fastq_scan_task {
  meta {
    # task metadata
    description: "Task to run fastq_scan"
  }
  input {
    # task inputs
    File read1
    File read2
    String docker = "staphb/fastq-scan:0.4.4"
    Int cpu = 2
    Int memory = 2
  }

With these input attributes, how can we construct a command block to execute the appropriate fastq-scan command? What information needs to be defined in the runtime block?

2.3 Solution

Check the following files in the solutions branch of this repository: