Speed up merge_individual_BAM_files #707

Open
MilosCRF opened this issue Oct 17, 2024 · 3 comments

Comments

@MilosCRF

MilosCRF commented Oct 17, 2024

Hi Felix,

I’m currently running Bismark on AWS using 192 CPUs. Everything runs at lightning speed except the merging of the temporary BAM files. It seems that the merging step does not use multiple cores with the current Bismark settings.

Do you know of any way to speed up this step? Perhaps adding -@ $num_threads to the samtools command that is likely handling the merging?

@FelixKrueger
Owner

samtools cat does have a -@ flag, but I am not sure whether this could at some point break the order of Read1/Read2 following each other directly (which would break downstream processing). Could you find the command that merges the BAM files and try adding increasing values of -@?
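Whatever change is made to the merge step, the concern above can be checked directly by confirming that mates still sit next to each other in the merged file. A minimal Perl sketch, assuming SAM text is piped in (e.g. from `samtools view merged.bam`) and that both mates share a read name apart from an optional `/1` or `/2` suffix; `first_broken_pair` is a hypothetical helper, not part of Bismark:

```perl
use strict;
use warnings;

# Hypothetical helper: return the 1-based record number where mate pairing
# breaks, or 0 if every two consecutive alignment records share a read name.
# Assumes SAM text input; header lines (starting with '@') are skipped.
sub first_broken_pair {
    my ($fh) = @_;
    my ($pending, $n) = (undef, 0);
    while (my $line = <$fh>) {
        next if $line =~ /^\@/;            # skip SAM header lines
        $n++;
        my ($qname) = split /\t/, $line, 2; # QNAME is the first column
        $qname =~ s{/[12]$}{};             # drop an optional /1 or /2 mate suffix
        if (!defined $pending) {
            $pending = $qname;             # first mate of a new pair
        } elsif ($pending eq $qname) {
            undef $pending;                # pair complete
        } else {
            return $n;                     # second record does not match the first
        }
    }
    return defined $pending ? $n : 0;      # a dangling last record also counts as broken
}
```

Calling `first_broken_pair(\*STDIN)` from a small script fed by `samtools view merged.bam` and checking for 0 would be one way to compare outputs written with and without `-@`.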

@MilosCRF
Author

Thank you, Felix.
Yes, I can definitely test it. However, I couldn't find samtools cat in Bismark's main script. Isn't the merge accomplished by samtools view at line 1427?

open (OUT,"| $samtools_path view -bSh 2>/dev/null - > ${output_dir}${merged_name}") or die "Failed to write to $merged_name: $!\n";
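A minimal sketch of how the line above could pass a thread count through to `samtools view`. The helper `merge_pipe_command` and the variable `$merge_threads` are hypothetical names, not existing Bismark options. The samtools documentation describes `-@` for `view` as the number of additional BAM compression threads beyond the main thread, so a value of 0 reproduces the current command:

```perl
use strict;
use warnings;

# Hypothetical helper: build the pipe target for the merge step, optionally
# adding -@ (extra BAM compression threads for samtools view).
sub merge_pipe_command {
    my ($samtools_path, $output_file, $threads) = @_;
    my $threads_opt = ($threads && $threads > 0) ? "-\@ $threads " : '';
    return "| $samtools_path view -bSh ${threads_opt}2>/dev/null - > $output_file";
}

# Possible use in place of the original line ($merge_threads is hypothetical):
# open (OUT, merge_pipe_command($samtools_path, "${output_dir}${merged_name}", $merge_threads))
#     or die "Failed to write to $merged_name: $!\n";
```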

@FelixKrueger
Owner

I think you are right; I must have confused this with something else (e.g. deduplication of multiple files).

Could you try a few different numbers of threads, e.g. 2, 4 and 8? We only have to be mindful of how this would affect resource allocation in, e.g., nf-core workflows, but that would be a downstream problem.
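The comparison suggested above could be timed with something like the following Perl sketch. It assumes a test SAM file resembling the concatenated temporary output (the file name in the example is hypothetical), runs `samtools view -bSh -@ N` on it for each thread count, discards the BAM, and reports wall-clock seconds; `wall_time` and `benchmark_threads` are illustrative names:

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Run a shell command and return its wall-clock time in seconds (dies on failure).
sub wall_time {
    my ($cmd) = @_;
    my $t0 = time();
    system($cmd) == 0 or die "Command failed ($?): $cmd\n";
    return time() - $t0;
}

# Time BAM writing for each requested -@ value; the BAM output is discarded.
sub benchmark_threads {
    my ($samtools_path, $input_sam, @thread_counts) = @_;
    my %seconds;
    for my $t (@thread_counts) {
        $seconds{$t} = wall_time("$samtools_path view -bSh -\@ $t $input_sam > /dev/null");
    }
    return \%seconds;
}

# Example (not run here; 'merge_test.sam' is a hypothetical input):
# my $res = benchmark_threads('samtools', 'merge_test.sam', 0, 2, 4, 8);
# printf "threads %d: %.1f s\n", $_, $res->{$_} for sort { $a <=> $b } keys %$res;
```

Because this sketch throws the output away, read order would still need to be checked on a real merged file.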
