markdup segmentation fault #393
Probably a similar issue here. I'm also on conda v0.6.8. For test purposes I have a very small BAM, and it seems related to the number of threads used.
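One way to pin down the thread dependence described above is to sweep markdup over a range of thread counts and record the return codes; a segfault shows up as a negative return code (SIGSEGV is -11 on Linux). This is only a sketch: it assumes `sambamba` is on PATH and uses placeholder file names (`small.bam`, `marked.bam`); `-t` is sambamba's thread-count option.

```python
import shutil
import subprocess

def markdup_cmd(threads, in_bam="small.bam", out_bam="marked.bam"):
    """Build a sambamba markdup invocation for a given thread count.
    File names are placeholders for your own test BAM."""
    return ["sambamba", "markdup", "-t", str(threads), in_bam, out_bam]

def sweep(max_threads=16):
    """Run markdup at 1..max_threads and report which counts crash.
    A segfault appears as a negative return code (e.g. -11 = SIGSEGV)."""
    results = {}
    for t in range(1, max_threads + 1):
        proc = subprocess.run(markdup_cmd(t), capture_output=True)
        results[t] = proc.returncode
    return results

# Only attempt the sweep if the binary is actually installed.
if shutil.which("sambamba"):
    for t, rc in sweep().items():
        print(f"{t:2d} threads -> return code {rc}")
```

Running the sweep a few times per thread count helps, since the crash is reported as intermittent.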
Ah, maybe this is due to a high number of duplicates.
So actually the segfaults seem more random; I changed parameters and sometimes it crashes, sometimes it runs through. So far it is unpredictable what causes this. A bioconda version/build issue?
Here is a dump.
Thanks for the dump. I am happy to chase the problem. You can see it is happening in the garbage collector. Can you try the latest binary release and see if it does the same? https://github.com/biod/sambamba/releases Also, what hardware are you on? Some Xeons have threading problems.
Different cloud servers, but the last one was an Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz. Dump from cloned master: I compiled with make debug, but it looks less informative than with the conda version.
I think that is one of the unreliable Xeons for hyperthreading. Sambamba is one of the rare tools which brings it out by utilizing all cores. See also #335. Check the list by Intel.
Mhmm, that's bad. So using fewer cores could be more stable? I also get a segfault with sambamba depth...
One option is to turn off hyperthreading. Sadly that is hard in most HPC environments. Alternatively, don't use those machines, or rerun until it gets through. You can blame Intel.
Yeah, thanks! Btw, I downgraded to v0.6.6 via conda and this seems more stable; so far I don't get any segfaults with the same data. Does this version use a different underlying library or something like that?
It is possible. What version of LLVM and ldc is it using? Underlying tools are evolving fast. Sambamba did not essentially change markdup between those versions.
It looks like the 0.6.6 conda version uses your pre-built packages from GitHub, whereas 0.6.8 is built from scratch on the conda server, as given here:
Can you try running the pre-built version for 0.6.9?
Unfortunately this doesn't help; the pre-built 0.6.9 also segfaults, and so does a locally built 0.7.0-pre1. What I observed: I rarely (never?) got segfaults with any version at 1-8 threads, and strangely also not with the local build at 9 threads; higher thread counts really often (always?) segfault.
Yeah, with your manually downloaded (not via conda) pre-built 0.6.6 it also (so far) never crashes, with whatever number of threads.
Hi folks, I ran 40 different samples and only 10 got this error, but I tried this command on a few machines and they all segfaulted. Do you have any suggestions? Thanks.
@RichardCorbett what CPU are you on? Does using an older binary of sambamba work?
CPU tested: I just had some filesystem problems overnight, so retesting is going to be a challenge. In case it may be relevant, I noticed that I was testing on a BWA aln 0.5.7 BAM file, which includes non-zero mapping qualities for unaligned reads.
Yeah, this may be another Xeon with problems. It will be interesting to see if older versions of sambamba+llvm+ldc show the same problem. @steffenheyne can you confirm you are still good with the 0.6.6 binary? And both of you, is there any way I could access one of these machines so I can debug the problem? If 0.6.6 works I could use that same build chain to compile a new binary and see if that keeps going.
sambamba 0.6.6 was built with
Another thing we could try is building sambamba with the GNU D compiler. That would fine-tune the diagnostics, showing whether the problem is with LLVM or the D runtime.
A second test run overnight duplicated the error on a machine with this CPU:
Great. I am starting to suspect the D runtime. I should try hitting one of our machines badly.
I'll try rebuilding 0.6.6 in bioconda with more recent pinnings to see if (A) it still works without segfaulting and (B) it can still be combined with other recent packages in an environment. That will allow us (@steffenheyne and me) to get around the current issue. I agree that this is likely a D runtime issue.
Thanks. I am too busy to sort it now but should have time by Easter.
Just adding my observation of this issue as well. I am using conda sambamba (0.6.8 and now 0.6.9) in a snakemake workflow and getting segfaults at the markdup stage. I set the markdup rule to call 0.6.6 from conda and things seem to be processing well.
Thanks. This is very annoying; good thing we have 0.6.6.
points at
which triggers a GC sweep and segfaults. The stack trace is informative and it looks like we have invalid pointers on the stack.
The debug version of sambamba renders
where the last line refers to another allocation
This may be what I am looking for
Just now, for the first time, sambamba ran without crashing :). The problem is out-of-order execution.
I think the problem is with scopedTask, where a task is added to the thread pool using the stack rather than the heap. In particular this section https://github.com/biod/BioD/blob/master/bio/core/bgzf/inputstream.d#L391 where a task gets created and pushed onto a roundbuf. When the garbage collector kicks in after reading the BAM file, it wants to destroy this object, but the object is in an inconsistent state (maybe the thread already got cleaned up, or it tries to clean up twice). I managed to prevent segfaulting by disabling the garbage collector, but obviously that won't do. The roundbuf is probably used to keep a task connected with the bgzf unpacking buffer; I am not sure why this is necessary. Also, I am not convinced a thread pool is that much of a benefit for bgzf unpacking, as the single-threaded routine I wrote last year is blazingly fast. Need to figure out what the best approach is...
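The D specifics (a stack-allocated scopedTask later finalized by the GC) don't map directly onto other languages, but the double-cleanup hazard can be sketched with a toy Python model. `DecompressedBlock` and its ring list here are illustrative stand-ins, not sambamba's actual types:

```python
class DecompressedBlock:
    """Toy stand-in for a decompression task/buffer pair held in a ring buffer."""
    def __init__(self, offset):
        self.offset = offset
        self.closed = False

    def close(self):
        # Guard against double cleanup. The hypothesis above is that the D
        # code has no such guard, so a second destruction walks state that
        # was already torn down.
        if self.closed:
            raise RuntimeError(f"block {self.offset} destroyed twice")
        self.closed = True

ring = [DecompressedBlock(i * 65536) for i in range(3)]
for block in ring:
    block.close()          # normal teardown by the consumer
# Later, a GC-like sweep tries to destroy the same objects again.
errors = []
for block in ring:
    try:
        block.close()
    except RuntimeError as e:
        errors.append(str(e))
print(len(errors))  # every block had already been destroyed
```

In a GC-managed language the second "destroy" happens implicitly during a sweep, which is why the crash is timing-dependent rather than deterministic.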
Adding this code to the struct whose destructor segfaults:

```d
struct DecompressedBgzfBlock {
  ~this() {
    stderr.writeln("destroy DecompressedBgzfBlock ", start_offset, ":", end_offset, " ", decompressed_data.sizeof);
  }
  ulong start_offset;
  ulong end_offset;
  ubyte[] decompressed_data;
}
```

Running a decompress block typically reads
but before a segfault we get
After making sure the start_offset is set to 0, it looks like these blocks have become invalid and the garbage collector still tries to clean them up.
Artem wrote:
My conclusion is that the 'magic' no longer works. Creating a thread on the stack and moving it to the round buffer on the heap confuses the garbage collector, which is kinda unsurprising. With markdup I can't disable the GC cleanup, so it needs some surgery.
We are having the same problem. Is the only solution at present to downgrade to 0.6.6?
It is one of those things that take a few days of work to fix. It is on my list :/
It might save some users a bit of time if you make a clearly marked 0.6.6-stable binary release with a warning and drop it at the top of the releases page. I know how much of a pain it can be to track down stuff like this. We just had a big problem due to the Spectre/Meltdown patches changing the way multithreaded interleaved system calls work. I wonder if this problem is related.
Actually the latest binary release, 0.7.0, works: https://github.com/biod/sambamba/releases/tag/v0.7.0 The problem is with later versions of the D compiler.
I got the same error, "Segmentation fault (core dumped)", using sambamba 0.7.0 installed with conda and running on an LSF cluster... :/
Conda builds with a recent ldc. That is the problem at this point.
A heads up: at this point, build sambamba with an older ldc, like the binary released on GitHub. We decided to replace the original BAM reader with a new one I wrote almost 2 years ago and have been testing. Rather than fix the GC-related issue, we are going to use the new bamreader, which is simpler and therefore (hopefully) easier to maintain. @NickRoz1, who worked as a Google Summer of Code student for me on column-based BAMs, is doing that work. One important difference is that the worker threads are no longer part of the reader itself. My theory is that performance should be similar ;)
Same segmentation fault issue here; I had installed on an HPC system with bioconda. Is there a way to know when the conda build has been updated? Alternatively, if I build from the 0.7.0 source code, how do I actually execute a tool like markdup? Thanks!
I am working on fixing this bug. Track progress here: https://thebird.nl/blog/work/rotate.html
@sjackman we have a new release of sambamba that fixes a long-standing bug with the D runtime and should now compile with all versions of ldc. See also https://travis-ci.org/biod/sambamba
Thanks for the heads up, Pjotr. Once you've tagged a new release, would you like to open a PR to bump the version of Sambamba in Brewsci/bio? You need to change only two lines, and you can do it from the GitHub web interface if you like. See https://github.com/brewsci/homebrew-bio/blob/c5b38cfea1eff4b18ae19d9dded00f990769de84/Formula/sambamba.rb#L6-L7
@sjackman it feels dirty to edit through the web interface. But hopefully it works :)
Hehe. Thanks! |
I hate to revive a closed issue, but I'm running into the segmentation fault issue with markdup in 0.8.0 as installed from conda. My build:
System info:
Does this happen reproducibly? Can you share your BAM file in that case? It is hard to fix stuff if we don't get reproducible errors.
An interesting twist on this. The file linked below reproducibly generates segmentation faults on my system when running single threaded (or with whatever the default number of threads is): https://drive.google.com/file/d/1owSZqrrWkrfbsEdf81iLrlByyOmfU4hs/view?usp=sharing
@pmagwene, thanks for the test file. This is a new issue and not related to the earlier segfaults. I can not reproduce it on my AMD Ryzen 7 3700X 8-Core Processor or on an Intel(R) Xeon(R) CPU E5-2683 v3 @ 2.00GHz. You may want to try the static binary I'll release in a bit at https://github.com/biod/sambamba/releases. If that fails, I suggest opening a new issue for the i9-9900k.
Hello, I solved my segfault with --hash-table-size=4194304! But I don't quite understand the meaning of this parameter; can you explain it to me? Thank you so much.
Hi. This parameter sets the size of a hashmap.
https://en.m.wikipedia.org/wiki/Hash_table
https://github.com/biod/sambamba/blob/8a4102f9b2f95799dd0f432e6db15c57df75f537/sambamba/markdup.d#L318
If you have 10 key-value pairs but the size of the hashmap is only 9, then some key will be mapped to the same cell as another key (a collision). The same will happen if you have 9 non-distinct keys. In the case of sambamba, I think it moves duplicate values to a separate list, and then uses temp files to store these values if the list fills up. Maybe this functionality wasn't tested well, or it didn't account for a dataset of your size or with your number of duplicates (https://github.com/biod/sambamba/blob/8a4102f9b2f95799dd0f432e6db15c57df75f537/sambamba/markdup.d#L467).
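The collision behaviour described above can be illustrated with a small Python sketch. The table sizes and key counts below are illustrative only; they do not reflect sambamba's internal hash function or defaults:

```python
from collections import defaultdict

def bucket_collisions(keys, table_size):
    """Count how many keys land in an already-occupied bucket of a
    fixed-size hash table (with chaining). This mimics the effect of
    --hash-table-size: a bigger table means fewer collisions and so
    less overflow-list and temp-file handling."""
    buckets = defaultdict(int)
    collisions = 0
    for key in keys:
        slot = hash(key) % table_size
        if buckets[slot]:
            collisions += 1
        buckets[slot] += 1
    return collisions

reads = [f"read_{i}" for i in range(100_000)]
small = bucket_collisions(reads, 262_144)     # smaller table (illustrative size)
large = bucket_collisions(reads, 4_194_304)   # --hash-table-size=4194304
print(small > large)  # the larger table produces fewer collisions
```

Note that identical keys (true duplicates) collide no matter how large the table is; growing the table only removes the accidental collisions between distinct keys.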
Thanks so much for your quick and professional reply, I get it now! Have a nice day!
sambamba 0.6.8 from bioconda
I ran markdup on a cluster and it generated the segmentation fault error and left a core.* temporary file. This error is consistent among all the computers with different CPUs (either Xeon or AMD Opteron). Any thoughts?