mpileup comparison #127

Closed
jgbradley1 opened this issue Mar 10, 2015 · 4 comments

@jgbradley1

This is a great tool. I am trying to compare using mpileup with sambamba vs samtools. Here's what I've seen so far

samtools faidx ../human_g1k_v37.fa
time sambamba mpileup -t 20 -o ENCFF000CXI.sambamba.raw.bcf ../ENCFF000CXI.star.bam --samtools -gIf ../human_g1k_v37.fa
real 168m9.613s
user 121m13.895s
sys 2m7.975s

and on samtools, the real time was 62 minutes (I cleared the console, so I don't have the exact numbers anymore). Why would sambamba mpileup take nearly 3 hours of wall-clock time while samtools takes about 1 hour? The BAM file is 1.6 GB and I was running on a 40-core machine. I am using sambamba v0.5.1. I haven't come across any mpileup comparisons using sambamba, so I don't have a good basis for any expectations.
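
To put the figures above on a common scale, here is a minimal Python sketch. It is not part of either tool; `parse_time` is a hypothetical helper, and the 62-minute samtools figure is only the approximate value quoted above.

```python
import re

def parse_time(field: str) -> float:
    """Convert a bash `time` field such as '168m9.613s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", field)
    if match is None:
        raise ValueError(f"unrecognized time field: {field!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)

# Values copied from the sambamba run above.
real = parse_time("168m9.613s")
user = parse_time("121m13.895s")
sys_ = parse_time("2m7.975s")

# Approximate samtools wall time quoted above (exact numbers were lost).
samtools_real = 62 * 60

slowdown = real / samtools_real      # wall-clock ratio, sambamba vs samtools
cpu_per_wall = (user + sys_) / real  # average number of busy cores

print(f"sambamba wall time: {real / 60:.1f} min")
print(f"slowdown vs samtools: {slowdown:.2f}x")
print(f"CPU time / wall time: {cpu_per_wall:.2f}")
```

The wall-clock ratio comes out at roughly 2.7x. The CPU-to-wall ratio is about 0.73, which, if the reported times cover all child processes, suggests that on average less than one core was busy despite `-t 20`.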

@lomereiter
Contributor

<rant> Honestly, I have no idea why @pjotrp decided to include this usage scenario. </rant>
The tool was initially intended to be used with the --bcftools argument, because it is variant calling that is really computationally intensive, and the overhead therefore becomes almost negligible.

Each process of samtools reads a complete chromosome from disk, and thus it has to be provided with a big enough chunk of useful work (controlled by the --buffer-size parameter, set by default to a mere 4Mb, which is not at all suited for your use case).
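
The trade-off described here can be illustrated with a toy calculation. This is only a sketch: the startup cost and per-chunk work times below are hypothetical numbers chosen for illustration, not measurements of samtools or sambamba.

```python
def overhead_fraction(startup_s: float, work_s: float) -> float:
    """Share of a worker's run time spent on its fixed startup cost."""
    return startup_s / (startup_s + work_s)

# Hypothetical numbers, for illustration only (not measured):
startup_s = 2.0  # fixed cost paid each time a worker process starts
for work_s in (0.5, 2.0, 20.0, 200.0):  # useful work per chunk grows with chunk size
    frac = overhead_fraction(startup_s, work_s)
    print(f"work per chunk {work_s:6.1f}s -> overhead {frac:.0%}")
```

The larger the chunk of useful work handed to each process, the smaller the share of time lost to the fixed cost, which is the reasoning behind raising the buffer size.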

Also, BCF is not well supported yet, so the extra step of converting BCF into VCF is taken.

Summarizing the above, here's a list of what's to be done:

  • full BCF format support
  • clear documentation with examples
  • buffer size defaults should adjust to the use case

@jgbradley1
Author

Ok, thanks for your suggestion. I've been trying to do a successful sambamba mpileup run and the threads always get stuck. I've had to kill every job.

The steps I've taken are to first index the BAM file, then run

sambamba mpileup --nthreads=8 \
  --output-filename=ENCFF000CXI.sambamba.raw.vcf \
  --buffer-size 2000000000 \
  ../ENCFF000CXI.star.bam \
  --bcftools call -cV indels

With a 2GB buffer size, that should be more than enough I think. Sambamba creates all the threads and running "ps ux" shows that one of the threads generated by sambamba has the command

samtools mpileup /tmp/sambamba-fork-dnGgdy/33 -gu -l /tmp/sambamba-fork-dnGgdy/33.bed | bcftools call -cV indels

Initially all the threads do take up part of the CPU and MEM but quickly converge to 0% and never go away according to "ps ux". The file at /tmp/sambamba-fork-dnGgdy/33 is size 0 bytes and /tmp/sambamba-fork-dnGgdy/33.bed is 23 bytes.

I have also tried to just run the following command directly

samtools mpileup /tmp/sambamba-fork-dnGgdy/33 -gu -l /tmp/sambamba-fork-dnGgdy/33.bed

and samtools gets stuck. There seems to be a breakdown somewhere before the line of code in pileup.d that joins the threads, because they never finish. Could there be a race condition somewhere? According to test_suit.sh, mpileup is never tested. I realize it is one of the newer features of sambamba, so development may still be ongoing.
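
The thread does not say what kind of file /tmp/sambamba-fork-dnGgdy/33 is. One possibility worth ruling out, purely as an assumption, is that it is a named pipe (FIFO): `ls -l` reports a FIFO as 0 bytes, and a process that opens one for reading blocks until a writer opens the other end, which would look like samtools hanging when run by hand. A minimal Python sketch for classifying the entries of such a directory follows; `describe_path` and `scan` are hypothetical helper names, and the demo builds its own stand-in directory rather than touching real sambamba files.

```python
import os
import stat
import tempfile

def describe_path(path: str) -> str:
    """Classify a path as a named pipe, an empty regular file, or something else."""
    mode = os.stat(path).st_mode  # stat does not open the file, so it cannot block
    if stat.S_ISFIFO(mode):
        return "fifo"
    if stat.S_ISREG(mode):
        return "empty file" if os.path.getsize(path) == 0 else "regular file"
    return "other"

def scan(directory: str) -> dict:
    """Map each entry name in `directory` to its classification."""
    return {name: describe_path(os.path.join(directory, name))
            for name in sorted(os.listdir(directory))}

# Demo on a stand-in directory (POSIX only, because of os.mkfifo).
with tempfile.TemporaryDirectory() as tmp:
    os.mkfifo(os.path.join(tmp, "33"))            # a named pipe
    open(os.path.join(tmp, "34"), "w").close()    # an empty regular file
    with open(os.path.join(tmp, "33.bed"), "w") as bed:
        bed.write("1\t0\t1000\n")                 # a small non-empty file
    demo = scan(tmp)

for name, kind in demo.items():
    print(f"{name}\t{kind}")
```

Calling `scan()` on the temporary directory of a stuck run would show whether the zero-size entries are pipes waiting for a writer or genuinely empty files.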

@pjotrp
Member

pjotrp commented Mar 11, 2015

Sambamba mpileup is somewhat experimental and not well tested. It does a map-reduce using temporary files. I won't have time to look into it before April.
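
The approach is described here only as "a map-reduce using temporary files". As a generic illustration of that pattern, and not sambamba's actual implementation (which lives in pileup.d and drives samtools and bcftools processes), a minimal Python sketch might look like this. All function names are hypothetical, and the per-record work is a placeholder.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def map_chunk(index, chunk, tmpdir):
    """Map step: process one chunk and write the result to its own temp file."""
    path = os.path.join(tmpdir, f"{index}.out")
    with open(path, "w") as out:
        for record in chunk:
            out.write(record.upper() + "\n")  # placeholder for real per-record work
    return path

def reduce_files(paths, destination):
    """Reduce step: concatenate per-chunk outputs in chunk order."""
    with open(destination, "w") as dest:
        for path in paths:
            with open(path) as src:
                dest.write(src.read())

def run(records, chunk_size, n_workers, destination):
    """Split records into chunks, map them in parallel, then merge the results."""
    chunks = [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]
    with tempfile.TemporaryDirectory() as tmpdir:
        with ThreadPoolExecutor(max_workers=n_workers) as pool:
            # Executor.map yields results in input order, so chunk order is kept.
            paths = list(pool.map(map_chunk, range(len(chunks)), chunks,
                                  [tmpdir] * len(chunks)))
        reduce_files(paths, destination)

# Demo: five records, chunks of two, three workers.
with tempfile.TemporaryDirectory() as outdir:
    result_path = os.path.join(outdir, "merged.txt")
    run(["a", "b", "c", "d", "e"], chunk_size=2, n_workers=3, destination=result_path)
    with open(result_path) as fh:
        merged = fh.read().split()

print(merged)
```

In a design like this, the reduce step can only finish once every worker has written its file, so a single worker that never returns stalls the whole run.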

pjotrp added the bug label Mar 11, 2015
pjotrp self-assigned this Mar 11, 2015
lomereiter added a commit that referenced this issue Apr 11, 2015
@lomereiter
Contributor

The multithreading issue should be fixed now, please check. I've made a number of other improvements to the tool, see v0.5.3 release notes.
