Goal: Understand the structure of an nf-file and key concepts (Channels, Processes, Operators)
Reference: https://training.nextflow.io/basic_training/intro/
- practical work through intro (hello_world.nf)
- customisation and adaptation
- add parameters: e.g. params.name ("Hello $name")
- add processes: e.g. to print the output of "uname -a" and the process ID
- process directives:
  - [ ] control processes locally and globally (cpus, memory, container)
  - [ ] use container from dockerhub (global or process-wise)
- configuration: run on slurm
- workflow on github and test
nextflow run github.com/maxplanck-ie/testflow --params ... -with-apptainer
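The customisation steps above (a parameter, an extra process, a process directive) could look roughly like the following sketch. The file name hello_params.nf, the process name sayHello and the default value 'World' are assumptions for illustration and are not taken from the training material.

```shell
# Write a small DSL2 script; the quoted 'EOF' keeps the shell from expanding $ signs.
cat > hello_params.nf <<'EOF'
// Default value; can be overridden on the command line with --name <value>
params.name = 'World'

process sayHello {
    debug true   // forward the task's stdout to the terminal
    cpus 1       // process-level directive

    input:
    val name

    script:
    """
    echo "Hello ${name}"
    uname -a
    echo "process ID: \$\$"
    """
}

workflow {
    sayHello(Channel.of(params.name))
}
EOF

# With Nextflow available, the script could then be run as:
#   nextflow run hello_params.nf --name Nextflow
```

Inside the script block, ${name} is filled in by Nextflow, while \$\$ is passed through to bash as $$ (the shell's process ID).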
- Make your selection here: https://pizzamonalisa.de
Goals:
- employ other workflows with singularity, locally and with slurm (queue test)
Challenge: Can we make this work with singularity and slurm?
- Input: basecalled BAM (downsampled, drosophila)
- Output: alignment BAM + QC
- run wf-alignment (dm6) with apptainer (identify appropriate container)
- Extensions?
- add process (e.g. samtools flagstat)
- send email upon completion
Challenge: Can we predict RNA modifications? (--> m6anet; skip all other analyses)
- Input: samplesheet.csv, pod5/fast5 (subsampled)
- Output: methylation calls: data.result.csv.gz
exchange workflows, test runs & final discussion
module load nextflow/23.10.0
# vanilla
nextflow run sources/workflows3.nf
# switch on debugging (for all processes)
nextflow run sources/workflows3.nf -process.debug
# show status for each task rather than a process summary (default: -ansi-log true)
nextflow run sources/workflows3.nf -ansi-log false
# run with a specific singularity/apptainer image
nextflow run -with-apptainer docker://ubuntu:20.04 sources/tutorial.nf
# test runs *without* modification calling
nextflow run nf-core/nanoseq -profile test,singularity
# could not get this to work
nextflow run ~/.nextflow/assets/epi2me-labs/wf-alignment --bam data/bam --references /data/repository/organisms/dm6_flybase_r6.12/genome_fasta -with-singularity ontresearch/wf-alignment -without-docker
Retrospective correction: it is better to specify -profile singularity, as defined in nextflow.config (see group1 below)
Set APPTAINER_DISABLE_CACHE=true for the run.
For epi2me workflows, spaces had to be purged from genome.fa.
export APPTAINER_DISABLE_CACHE=true
nextflow run epi2me-labs/wf-alignment --bam data/bam --references data/genome -profile singularity
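One way to purge spaces from a FASTA file is to cut each header line at the first whitespace. The sketch below assumes that the spaces sit in the header lines and that truncating the headers is acceptable; the notes do not say how the spaces were actually removed. The file names and the two toy records are made up.

```shell
# Hypothetical input: a FASTA whose header lines contain spaces.
printf '>chr2L type=golden_path_region; length=10\nACGTACGTAC\n>chr2R some description\nTTTTGGGGCC\n' > genome_with_spaces.fa

# Keep only the first whitespace-delimited token of each header line;
# sequence lines are left untouched.
sed -E '/^>/ s/[[:space:]].*$//' genome_with_spaces.fa > genome_clean.fa

cat genome_clean.fa
```

The cleaned file keeps the sequence names (chr2L, chr2R) and drops the rest of each header. Any existing .fai index would have to be regenerated for the cleaned file.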
Adding another process to the wf-alignment workflow:
process flagstat_extra {
    label "wfalignment"
    cpus 2

    input:
    tuple val(meta), path(bam), path(index)

    output:
    path "*.readstats_extra.tsv", emit: flagstats_extraout

    script:
    def sample_name = meta["alias"]
    """
    samtools flagstat $bam > ${sample_name}.readstats_extra.tsv
    """
}
and in the pipeline workflow:
    // get flagstat extra
    statsextra = flagstat_extra(bam)
    // under emit:
    flagstats_extraout = statsextra.flagstats_extraout
The modified fork is available at
https://github.com/WardDeb/wf-alignment
workflow.onComplete {
    Pinguscript.ping_complete(nextflow, workflow, params)
    // send a notification mail with the report attached
    sendMail(
        to: 'myemail@hellothere.nl',
        subject: 'GREEN LIGHT',
        body: 'BONJOUR TOUT LE MONDE!',
        attach: 'output/wf-alignment-report.html'
    )
}
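sendMail needs a way to deliver the message. Nextflow reads SMTP settings from the mail scope of the configuration and otherwise falls back to a mail tool installed on the system. The fragment below is only a sketch: the file name, the host smtp.example.org, the port and the sender address are placeholders, not settings used in the course.

```shell
# Write a hypothetical mail configuration; all values are placeholders.
cat > example.mail.config <<'EOF'
mail {
    smtp.host = 'smtp.example.org'   // placeholder SMTP server
    smtp.port = 25                   // placeholder port
    from      = 'workflow@example.org'
}
EOF

# The file could then be passed to a run with:
#   nextflow run epi2me-labs/wf-alignment ... -c example.mail.config
```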
- prepare data directory: rna_data/fast5/
- link reference files: genome.fa, genome.fa.fai, genes.gtf
- prepare config with singularity enabled and sufficient memory: group2/nextflow.config
- prepare a parameter file with nf-core launch (avoids long command lines with many --<parameter> options): group2/nf-params.json
- postprocess nf-params.json for some further adjustments (e.g. "skip_multiplexing": true)
- run
module load nextflow/23.10.0
nextflow run nf-core/nanoseq -r 3.1.0 -profile singularity -params-file nf-params.json -resume
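A configuration that enables singularity and sets a memory default could look like the sketch below. The actual contents of group2/nextflow.config are not reproduced in these notes, so the file name example.singularity.config and the value '8 GB' are assumptions.

```shell
# Write a hypothetical config; the values are illustrative only.
cat > example.singularity.config <<'EOF'
// Run every process inside a Singularity container
singularity {
    enabled    = true
    autoMounts = true
}

process {
    // Generic default for processes that do not request memory themselves
    memory = '8 GB'
}
EOF

# Possible use together with the parameter file:
#   nextflow run nf-core/nanoseq -r 3.1.0 -c example.singularity.config -params-file nf-params.json -resume
```

A generic process setting like this has the lowest priority: resources that the pipeline assigns through label or name selectors take precedence over it.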
Conclusion: very slow run even for the small data set. Retrospective: the m6anet part of the workflow failed (after >5h!)
- update to include "slurm" profile definition: group2/nextflow.slurm.config
- run (on a node that is allowed to submit slurm jobs)
module load nextflow/23.10.0
module load slurm
nextflow run nf-core/nanoseq -r 3.1.0 -profile slurm -params-file nf-params.json -resume
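A "slurm" profile definition might look like the following sketch. The contents of group2/nextflow.slurm.config are not shown in these notes, so the file name example.slurm.config, the queue name my_queue and the queue size are hypothetical.

```shell
# Write a hypothetical profile definition; queue name and limits are placeholders.
cat > example.slurm.config <<'EOF'
profiles {
    slurm {
        process.executor    = 'slurm'      // submit each task as a slurm job
        process.queue       = 'my_queue'   // placeholder partition name
        executor.queueSize  = 20           // max. number of jobs queued at once
        singularity.enabled = true
    }
}
EOF

# Possible use:
#   nextflow run nf-core/nanoseq -r 3.1.0 -profile slurm -c example.slurm.config -params-file nf-params.json -resume
```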
- if the singularity images do not already exist in work/singularity, nextflow will try to pull them with "singularity pull". This fails on every node that does not have singularity installed.
- one possible solution is to download the pipeline and its singularity containers in advance with nf-core tools (https://nf-co.re/tools#downloading-pipelines-for-offline-use) and then run the pipeline with slurm and singularity (not helpful here, as only 1 node can use singularity)
module load nextflow/23.10.0
module load slurm
nextflow run nf-core/nanoseq -r 3.1.0 -profile slurm,mamba -params-file nf-params.json -c nextflow.mamba.config -resume
- failed because certain dependency requirements could not be resolved by mamba/conda (e.g. nanoplot, samtools, ncurses, ...)
- this would require preparing a working conda environment first
- video (RNA-seq with salmon): https://www.youtube.com/watch?v=1TbVpMjQUtU
- gitpod: https://gitpod.io/#https://github.com/nextflow-io/training