This recipe shows how to run a proof-of-concept RNA-Seq pipeline on a single node using Azure Batch. This recipe uses task dependencies to ensure each step of the pipeline is executed in the proper order and only if each step completes successfully.
Please see refer to this set of sample configuration files for this recipe.
Pool properties such as publisher
, offer
, sku
, vm_size
and
vm_count
should be set to your desired values.
Because this example requires data files which are not part of the Docker
images and need to be shared between tasks, we will utilize resource_files
at the pool-level to ingress required data. Normally, shared_data_volumes
in the global configuration would be used for this purpose (e.g., NFS or
objects in Azure Storage).
The global configuration should set the following properties:
batch_shipyard:
storage_account_settings: mystorageaccount
global_resources:
docker_images:
- quay.io/biocontainers/bowtie2:2.3.4.3--py36h2d50403_0
- quay.io/biocontainers/tophat:2.1.1--py27_1
- quay.io/biocontainers/cufflinks:2.2.1--py36_2
The docker_images
array contains references to each of the tools
required for the sequencing pipeline.
The pipeline will take advantage of task dependencies to ensure tasks are processed one after the next after successful completion of each. Please see the jobs configuration for the full example.