Parallel processing of paired-end fastq files #195

PramodRaoB · 2024-10-14T06:46:21Z

From my understanding of the code, in paired-end processing both fastq input files could be processed in parallel, but they are currently processed sequentially. Is this correct? If so, I could take this feature up.

FelixKrueger · 2024-10-14T10:46:01Z

Yes, this is correct. If the trimming could occur in parallel, it might indeed speed up the run time while still allowing the same flexibility further downstream. Resource allocation might need some care later on, e.g. on nf-core, but that would be a subsequent step. If you want to have a go at it, that would be nice!

PramodRaoB · 2024-11-06T13:18:00Z

I was thinking of a different workflow that would reduce the execution time by removing some I/O overhead.

Run cutadapt on both paired-end files in parallel and stream both outputs. Then combine the trimming and validation phases and write the final output directly to disk. Since Cutadapt scales well for thread counts below 8, processing the two files in parallel (each with half the original thread count) should still give roughly the same effective runtime. It would also avoid writing the intermediate trimmed outputs (which are potentially compressed, leading to even higher runtimes), thereby giving a lower overall runtime.
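A rough sketch of this idea in Python, for illustration only (Trim Galore itself is written in Perl, and this is not its implementation). The names `read_fastq`, `validate_pairs` and `trim_paired` are hypothetical. It assumes that each cutadapt process writes trimmed reads to stdout when `-o` is omitted, that it emits exactly one record per input read in the original order, and that the installed cutadapt version allows `-j` together with stdout output; all of that would need checking. The validation rule here (drop a pair if either read is shorter than a cutoff) is a simplification.

```python
import gzip
import subprocess
from itertools import zip_longest


def read_fastq(stream):
    """Yield (header, seq, plus, qual) tuples from a text stream of 4-line FASTQ records."""
    while True:
        record = [stream.readline() for _ in range(4)]
        if not record[0]:
            return  # clean end of stream
        if not record[3]:
            raise ValueError("truncated FASTQ record")
        yield tuple(line.rstrip("\n") for line in record)


def validate_pairs(stream1, stream2, out1, out2, min_len=20):
    """Read both trimmed streams in lockstep and keep only pairs where
    both reads are at least min_len long. Returns (kept, dropped)."""
    kept = dropped = 0
    for r1, r2 in zip_longest(read_fastq(stream1), read_fastq(stream2)):
        if r1 is None or r2 is None:
            raise ValueError("read 1 and read 2 streams differ in length")
        if len(r1[1]) >= min_len and len(r2[1]) >= min_len:
            out1.write("\n".join(r1) + "\n")
            out2.write("\n".join(r2) + "\n")
            kept += 1
        else:
            dropped += 1
    return kept, dropped


def trim_paired(fq1, fq2, out1_path, out2_path, adapter, cores=8, min_len=20):
    """Hypothetical driver: two cutadapt processes, each with half the cores,
    streamed straight into pair validation with no intermediate files."""
    per_process = max(1, cores // 2)

    def start(path):
        # Assumption: without -o, cutadapt writes trimmed reads to stdout.
        cmd = ["cutadapt", "-j", str(per_process), "-a", adapter, path]
        return subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)

    p1, p2 = start(fq1), start(fq2)
    with gzip.open(out1_path, "wt") as o1, gzip.open(out2_path, "wt") as o2:
        counts = validate_pairs(p1.stdout, p2.stdout, o1, o2, min_len)
    for p in (p1, p2):
        p.stdout.close()
        if p.wait() != 0:
            raise RuntimeError("cutadapt exited with a non-zero status")
    return counts
```

Only the final validated files are written (and compressed) once; the per-file trimmed intermediates exist only in the pipes between the processes.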

Let me know if this sounds good, @FelixKrueger! Thanks.
