allow starting the workflow with existing read alignments (sorted BAM/CRAM files) #92

gpertea · 2023-03-03T14:55:52Z

There are situations where users want to change, e.g., featureCounts options, or bring in already prepared read alignments (sorted BAMs). In such cases it would help to have the option to skip the HISAT2/STAR alignment step and proceed with the given alignments as "input" for the other steps in the pipeline.

I am aware this would involve skipping any steps that depend on the FASTQ files (which means not (re)generating the rse_tx object, not including any FastQC metrics in colData, etc.). However, there are ways to generate the rse_tx object from the BAM files (I can help with implementing that option).

It seems that the BiocMAP workflow was in part split into 2 Nextflow scripts for a similar reason, if I am not mistaken. A similar interim/simpler solution might be the way to address this request initially: create an alternate workflow besides main.nf that would work on BAM (or CRAM) files and run only the steps related to the read alignment data (featureCounts etc.), with an option to build rse_tx from the provided alignment data.


gpertea · 2023-03-03T14:56:46Z

I can help with most of the shell and R code necessary to implement this alternate workflow (as I already have some non-user-friendly scripts doing that), but I would need some help with the Nextflow code/implementation.

Nick-Eagles · 2023-03-03T15:21:19Z

BiocMAP was split at "the same point" mostly on the thought that GPUs (used for alignment) might not be available on the same machines where massive CPU/memory resources (used for post-alignment steps) were available. I'm a bit concerned that there are too many ways a user might want to partially run SPEAQeasy (e.g. run transcript quantification again but not alignment, only call variants, etc.), and this would be only one specific solution (and unfortunately Nextflow doesn't support this type of partial-running functionality without modifying/adding a lot of code). That said, if starting from aligned files is a repeated use case you're seeing, I can help out.

gpertea · 2023-03-03T15:53:04Z

Thank you, Nick. Perhaps the easiest approach at this point would be to help me put together a cut-down version of main.nf that can take the BAM files as input (a different samples.manifest? or just point to a directory with the sorted BAM files?) and then run only the branches of the workflow that depend on those alignments (we could even add another input for the colData needed to (re)build rse_gene and rse_exon, I suppose).

I can take care of the R scripts there (like create_count_objects.R) to make them ignore the transcript assays if they are not available, etc., but the Nextflow part itself was the problem for me: my limited experience with Nextflow (and time constraints) prevented me from attempting this by myself.
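
To make the "just point to a directory with the sorted BAM files" option more concrete, here is a minimal Python sketch of how such a directory could be turned into a per-sample listing. It is only an illustration under stated assumptions: the function names, the optional `.sorted` filename convention, and the two-column `path<TAB>sample_id` output are all hypothetical and are not SPEAQeasy's actual samples.manifest format or code.

```python
from pathlib import Path

# Assumption: alignments are identified purely by file extension.
ALIGNMENT_SUFFIXES = (".bam", ".cram")


def build_alignment_manifest(align_dir):
    """Return a list of (sample_id, path) tuples for BAM/CRAM files in align_dir.

    The sample ID is the file name without its extension and without an
    optional trailing ".sorted" (a hypothetical naming convention).
    """
    rows = []
    seen = set()
    for path in sorted(Path(align_dir).iterdir()):
        # Skip index files (.bai, .crai), subdirectories, and anything else.
        if not path.is_file() or path.suffix.lower() not in ALIGNMENT_SUFFIXES:
            continue
        sample_id = path.name[: -len(path.suffix)]
        if sample_id.endswith(".sorted"):
            sample_id = sample_id[: -len(".sorted")]
        # Two files mapping to one sample would be ambiguous downstream.
        if sample_id in seen:
            raise ValueError(f"duplicate sample ID: {sample_id}")
        seen.add(sample_id)
        rows.append((sample_id, str(path)))
    return rows


def write_manifest(rows, out_path):
    """Write one tab-separated 'path<TAB>sample_id' line per alignment file."""
    with open(out_path, "w") as fh:
        for sample_id, path in rows:
            fh.write(f"{path}\t{sample_id}\n")
```

A cut-down workflow could read the resulting file the same way it would read any other tab-separated sample listing; whether a separate manifest or a plain directory argument is the better interface is left open here, as in the discussion above.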

gpertea added the enhancement New feature or request label Mar 3, 2023

lcolladotor added this to SPEAQeasy plans Nov 30, 2023

lcolladotor added this to the bioc v3.21 milestone Nov 30, 2023

lcolladotor moved this to Todo in SPEAQeasy plans Nov 30, 2023
