MarkDup on Merge error #169

Closed
ACEnglish opened this issue Sep 28, 2015 · 10 comments
@ACEnglish

I have a pipeline that uses your combined merge/markdup feature (awesome feature, btw) from v0.5.8. I then pipe that output into calmd to repopulate the MD/NM tags.

The command I use is:

 sambamba markdup -t 4 -p --hash-table-size=65536 --tmpdir=/space1/tmp/$PBS_JOBID `cat MergeBams.txt` /dev/stdout | samtools calmd -b - hg19.fa > mergeDup.bam

However, I'm getting an error

 sambamba-markdup: Unable to write to stream

Is this a problem with piping in a way that sambamba doesn't agree with? I was able to validate the mergeDup.bam, and it appears to have all the reads in it, so I think everything executed fine; it seems it just couldn't clean up at the end.

@lomereiter
Contributor

  1. Please use -l 0 when the output goes into a pipe, to avoid unnecessary compression (I'm contemplating whether there's a simple way to do that automagically, because users forget this all the time); see the sketch after this list.
  2. Since you are running it with the -p flag, what does it output to stderr prior to exiting?
  3. Does it work without the piping?
  4. What was your reasoning behind making --hash-table-size less than the default? On a large dataset you would want to do the opposite, and also increase --overflow-list-size.
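
For illustration, here is a sketch of the pipeline with those suggestions applied: -l 0 so the piped output is uncompressed, the default hash table size, and a larger overflow list. The --overflow-list-size value is only an example, and the paths and file names are carried over from the command above.

 sambamba markdup -l 0 -t 4 -p --overflow-list-size=600000 --tmpdir=/space1/tmp/$PBS_JOBID `cat MergeBams.txt` /dev/stdout | samtools calmd -b - hg19.fa > mergeDup.bam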

@ACEnglish
Author

  1. Thanks for the tip!
  2. The stderr is muddled by the samtools calmd output, but grepping that out gives us:
   finding positions of the duplicate reads in the file
   [============================================================]
   sorting 421221672 end pairs...   done in 61004 ms
   sorting 6699488 single ends (among them 0 unmatched pairs)... done in 380 ms
   collecting indices of duplicate reads...   done in 13898 ms
   found 90859212 duplicates, sorting the list...   done in 2382 ms
   collected list of positions in 139 min 28 sec
   marking duplicates...
   sambamba-markdup: Unable to write to stream
  3. I'm re-running now, but I believe it works fine, since the bam output from even the failed job validates via picard. I'll let you know when I get those results back.
  4. I'm not sure. I'll have to ask around to see who made that decision and why.

@cviner

cviner commented Oct 10, 2015

I also recently encountered this error. It occurred without my piping anything into or out of markdup.

In my case, it appears to be caused by a large --io-buffer-size (though one well below my available RAM). In particular, both --io-buffer-size 2048 and 4096 resulted in this error. The error did not occur when the argument was omitted, nor for explicitly provided smaller values: --io-buffer-size 256, 512, and 1024 all completed without any reported errors.
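
For illustration, the pattern described above, with hypothetical input/output file names:

 sambamba markdup --io-buffer-size=2048 in.bam out.bam   # reportedly fails with "Unable to write to stream"
 sambamba markdup --io-buffer-size=512 in.bam out.bam    # reportedly completes without errors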

@ACEnglish
Author

I've tried running the commands separately and changed the parameters as suggested, but am still getting an error.

 finding positions of the duplicate reads in the file
 [===================================================================]
 sorting 530312331 end pairs...   done in 60231 ms
 sorting 18641379 single ends (among them 0 unmatched pairs)... done in 723 ms
 collecting indices of duplicate reads...   done in 15464 ms
 found 189989161 duplicates, sorting the list...   done in 4116 ms 
 collected list of positions in 642 min 3 sec
 marking duplicates...
 [                                                                              ]
 sambamba-markdup: Unable to write to stream

Here's the command

 sambamba markdup -l 0 -t 4 -p --tmpdir=/space1/tmp/ `cat MergeBams.txt` /space1/tmp//merge.bam

All of the input BAMs passed bamUtil validation, so I'm not sure what else to try. Is there any possibility that there is a problem with the merge/markdup command?

@lomereiter
Contributor

Hi, could you please tell me how many temporary files were created in /space1/tmp?

@ACEnglish
Author

I've only been able to run a single test. I saw a peak of ~600 temporary files being created by sambamba.

To watch the files, I reassigned the tmpdir to somewhere on cluster storage (TMPDIR is local node storage). What's weird is that this time it ran just fine. It seems that sambamba isn't consistently failing, so the problem is more likely on my end.

Since you were asking about temporary files: is there some sort of limit on the number of temporary files in my TMPDIR that I should be looking out for?
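
For reference, a quick way to watch the temp-file count while a job runs (the path is the --tmpdir from the command above; the 60-second interval is arbitrary):

 watch -n 60 'ls /space1/tmp/ | wc -l'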

@lomereiter
Contributor

Could it be that your local node storage simply runs out of space? With that many duplicates it's quite possible.

I was asking because the number of simultaneously open file descriptors is usually limited, see e.g. #118, but that would lead to a different error message, so it doesn't seem to be the case here.
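
For what it's worth, two quick checks that distinguish these two failure modes (the path is an assumption based on the --tmpdir used earlier in the thread):

 df -h /space1/tmp    # free space on the temp filesystem
 ulimit -n            # per-process limit on simultaneously open file descriptors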

@ACEnglish
Author

It looks like temp space may be what's holding it up. Thank you for your feedback!

@lomereiter
Contributor

@FrankFeng thanks, it turned out that the number was multiplied by the number of input files.
