In this exercise, trainees will learn how to write a single-task WDL workflow and how to use miniwdl to run this workflow locally.
Exercise Objective: Create a WDL workflow to capture the total number of reads in a fastq file using fastq-scan.
- Part 1: Exploring fastq-scan to calculate total number of reads in a fastq file
- Part 2: Writing a WDL task and workflow to capture this functionality
1.1: From your training VM, launch an interactive Docker container using the StaPH-B Docker image for fastq-scan version 0.4.4:
$ docker run --rm -it -v ~/wm_training/data/:/data staphb/fastq-scan:0.4.4
1.2: Use the fastq-scan documentation and the read data within the container to write a one-liner that:
- Calculates the total number of reads within a gzipped fastq file and
- Writes this value (INT) to a file called TOTAL_READS
2.1: Use the miniwdl run command to execute the hworld WDL workflow hosted in this repository:
$ miniwdl run ~/wm_training/wdl/workflows/wf_hworld.wdl -i ~/wm_training/data/exercise_01/hworld_inputs.json
2.2: Modify the workflow input file (~/wm_training/data/hworld/hworld_inputs.json) to print your name.
$ cat ~/wm_training/wdl/data/hworld/hworld_inputs.json
{
"hworld_workflow.name": "Kevin G. Libuit"
}
2.3: Use the WDL workflow and task template files (~/wm_training/wdl/workflows/wf_template.wdl & ~/wm_training/wdl/tasks/wf_task.wdl) to write a single-task WDL workflow that takes in paired-end fastq files (read1 & read2) and uses fastq-scan to calculate the total reads within each fastq file.
1.2 Hint
The total number of reads is captured as qc_stats.read_total in the fastq-scan output JSON. jq is a powerful tool included in the staphb/fastq-scan:0.4.4 Docker image that can parse JSON files for specific values.
Check out the fastq-scan StaPH-B Docker Builds README.md before seeing the final solution!
1.2 Solution
One approach is to decompress the gzipped fastq file to standard output with zcat, pipe it into fastq-scan, and then pipe the fastq-scan JSON output into jq to query for qc_stats.read_total:
$ zcat {read_file} | fastq-scan | jq .qc_stats.read_total > TOTAL_READS
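As a sanity check on the value written to TOTAL_READS, the read count can also be estimated without fastq-scan by counting lines. The following is a minimal sketch, assuming every FASTQ record spans exactly four lines (no wrapped sequences); the file names example.fastq.gz and TOTAL_READS_CHECK are hypothetical and used only for illustration. gzip -dc is used here as a portable equivalent of zcat.

```shell
# Build a tiny two-read gzipped FASTQ file (hypothetical example data).
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n' | gzip -c > example.fastq.gz

# Count the decompressed lines and divide by 4.
# Assumption: each record is exactly four lines (header, sequence, '+', qualities).
lines=$(gzip -dc example.fastq.gz | wc -l)
reads=$((lines / 4))

# Write the count to a separate file so it can be compared against TOTAL_READS.
echo "$reads" > TOTAL_READS_CHECK
cat TOTAL_READS_CHECK
```

If the two values disagree, the input may contain wrapped records or be truncated, in which case the fastq-scan result is the one to trust.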
2.2 Hint
How does the hworld_inputs.json file define the name input attribute?
2.2 Solution
By changing the string "Kevin G. Libuit", the input file can be made to print any name, e.g.:
$ cat ~/wm_training/wdl/data/hworld/hworld_inputs.json
{
"hworld_workflow.name": "John Doe"
}
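The key in the inputs file follows the pattern <workflow name>.<input name>, which is why the name input of hworld_workflow appears as hworld_workflow.name. A minimal sketch of generating such a file from the shell is shown here; the output file my_hworld_inputs.json and the shell variables are hypothetical, and the exercise itself only requires editing the existing file.

```shell
# Hypothetical scratch file; the exercise edits hworld_inputs.json directly.
inputs=my_hworld_inputs.json
name="John Doe"

# Keys take the form <workflow name>.<input name>.
cat > "$inputs" <<EOF
{
  "hworld_workflow.name": "$name"
}
EOF

cat "$inputs"
```

A file written this way could then be passed to miniwdl run with the -i option, as in step 2.1.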
2.3 Hint
Here's a potential start to a task_fastq_scan.wdl file:
task fastq_scan_task {
  meta {
    # task metadata
    description: "Task to run fastq_scan"
  }
  input {
    # task inputs
    File read1
    File read2
    String docker = "staphb/fastq-scan:0.4.4"
    Int cpu = 2
    Int memory = 2
  }
With these input attributes, how can we construct a command block to execute the appropriate fastq-scan command? What information needs to be defined in the runtime block?