The first days of this week were dedicated to the Common Workflow Language (CWL), which makes it easier to describe, combine and modify the tools in a workflow. CWL is well suited for describing large-scale workflows in cluster, cloud and high-performance computing environments, where tasks are scheduled in parallel across many nodes.
I started working with CWL with the help of the docs, working through the examples in the user guide and some from the CWL example repo.
Then I moved on to a simple example: the `fastq-dump` tool, which is the initial step of the two-mappers example.
- `fastq.cwl`: this file describes the command-line tool. It declares the `inputs`, the `outputs` and the `baseCommand` used to run the process. Most steps use a Docker image in which the tool is already installed (declared here with `DockerRequirement`); alternatively, you can install the tool locally on your machine.
```yaml
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool
hints:
  DockerRequirement:
    dockerPull: inutano/sra-toolkit
inputs:
  sraFile:
    type: File
    inputBinding:
      position: 1
# With --split-files and --gzip, fastq-dump writes one gzipped FASTQ file
# per read into the working directory instead of writing to stdout.
baseCommand: [fastq-dump, --split-files, --skip-technical, --gzip]
outputs:
  fastq:
    type: File[]
    outputBinding:
      glob: "*.fastq.gz"
```
- A YAML job file that gives the path to the input (a job file can also hold other parameter values). Here I've given the path of the SRA file for `fastq-dump` to process:
```yaml
sraFile:
  class: File
  path: data/ERR045788.sra
```
The second task is to decompress a gzipped file with `gunzip`.
- `gunzip.cwl`: this tool uncompresses a `.gz` file. `gunzip -c` writes the decompressed data to stdout, which CWL captures into `unzipped.vcf`.
```yaml
cwlVersion: v1.0
class: CommandLineTool
baseCommand: [gunzip, -c]
inputs:
  gzipfile:
    type: File
    inputBinding:
      position: 1
outputs:
  unzipped_vcf:
    type: stdout
stdout: unzipped.vcf
```
- An `input.yml` file that provides the path of the input:
```yaml
gzipfile:
  class: File
  path: data/input.vcf.gz
```
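Since `fastq-dump` is the first step of the two-mappers example and decompression is the second, the two tools can be chained in a CWL `Workflow`. The file below is a minimal sketch of that idea, not the two-mappers example's own workflow. The tool definitions are inlined so it runs on its own; the step and output names (`dump`, `decompress`, `fastq_gz`, `fastq`) are hypothetical, and it assumes `fastq-dump` leaves its `.fastq.gz` files in the working directory. Because `--split-files` can produce more than one file, the gunzip step is scattered over them.

```yaml
#!/usr/bin/env cwl-runner
# Hypothetical sketch: fastq-dump followed by gunzip, with both tools inlined.
cwlVersion: v1.0
class: Workflow
requirements:
  ScatterFeatureRequirement: {}   # needed for the scatter in "decompress"
inputs:
  sraFile: File
outputs:
  fastq_files:
    type: File[]                  # one decompressed FASTQ per scattered job
    outputSource: decompress/fastq
steps:
  dump:
    in:
      sraFile: sraFile
    out: [fastq_gz]
    run:
      class: CommandLineTool
      hints:
        DockerRequirement:
          dockerPull: inutano/sra-toolkit
      baseCommand: [fastq-dump, --split-files, --skip-technical, --gzip]
      inputs:
        sraFile:
          type: File
          inputBinding:
            position: 1
      outputs:
        fastq_gz:
          type: File[]
          outputBinding:
            glob: "*.fastq.gz"   # assumes outputs land in the working directory
  decompress:
    scatter: gzipfile             # run gunzip once per file from "dump"
    in:
      gzipfile: dump/fastq_gz
    out: [fastq]
    run:
      class: CommandLineTool
      baseCommand: [gunzip, -c]
      inputs:
        gzipfile:
          type: File
          inputBinding:
            position: 1
      outputs:
        fastq:
          type: stdout
      # e.g. ERR045788_1.fastq.gz -> ERR045788_1.fastq
      stdout: $(inputs.gzipfile.nameroot)
```

It can be run with `cwl-runner` and a job file that sets `sraFile`. Naming each output from `$(inputs.gzipfile.nameroot)` instead of a fixed name keeps the scattered outputs from all being called the same thing.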