sga index segfault with large values of -d #131
The command
sga index -d 20000000 -t 64 hsapiens.preprocess.filter.pass.merged.fa
segfaults with -d 20000000. Reducing to -d 1000000 works. Is each BWT batch limited in size, perhaps to 2 or 4 billion nucleotides? -d 20000000 with a mean sequence size of ~300 bp should correspond to a batch size of about 6 Gbp.
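That 2-or-4-billion guess is consistent with a batch position overflowing a 32-bit integer. A minimal sketch of the arithmetic, assuming (hypothetically; this is not sga's actual code) that positions within a batch are held in 32 bits:

```cpp
#include <cstdint>
#include <iostream>

int main() {
    const int64_t readsPerBatch  = 20000000; // sga index -d 20000000
    const int64_t meanReadLength = 300;      // ~300 bp mean sequence size
    const int64_t batchSymbols   = readsPerBatch * meanReadLength;

    // 6,000,000,000 bp does not fit in 32 bits: it wraps modulo 2^32.
    const uint32_t wrapped = static_cast<uint32_t>(batchSymbols);
    std::cout << "batch size (64-bit): " << batchSymbols << " bp\n"  // 6000000000
              << "as uint32_t:         " << wrapped      << " bp\n"; // 1705032704
    // An index that silently wraps like this would point far outside the
    // batch's buffers, consistent with a segfault rather than a clean error.
    return 0;
}
```

Under that assumption, batches above 2^31 symbols break a signed 32-bit index and batches above 2^32 break an unsigned one, which matches the 2-or-4-billion range in the question.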
Can …
Did you run out of memory with -d 20000000?
Whether it is worth using …
The fm-merge FASTA file is 20 GB, so it should be possible to construct the BWT in a single pass using SAIS in roughly 200 GB of RAM. I reported this issue because of the segfault, which is 😢. I'm happy with the -d 1000000 workaround.
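As a rough sanity check on that 200 GB figure (my arithmetic, not from the thread, assuming plain SAIS with a 64-bit suffix array, since 20 Gbp exceeds the 32-bit index range):

```cpp
#include <cstdint>
#include <iostream>

int main() {
    const int64_t textBytes = 20LL * 1000 * 1000 * 1000; // 20 GB of sequence, 1 byte/symbol
    const int64_t saBytes   = 8 * textBytes;             // 64-bit suffix array, 8 bytes/symbol
    std::cout << (textBytes + saBytes) / 1e9 << " GB\n"; // 180 GB before any working space
    return 0;
}
```

That lands at 180 GB for the text plus the suffix array alone, so ~200 GB including working space is the right ballpark.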
I don't believe so. It was using 76 GB of RAM when it crashed, and the machine has 2.5 TB available.
I'm using …
Have you read Optimal In-Place Suffix Sorting? https://arxiv.org/abs/1610.08305
sga index -d 1000000 -t 64 hsapiens.preprocess.filter.pass.merged.fa
205964.05s user 3080.39s system 232% cpu 24:56:18.90 total
9111 MB
Thanks for the update. I did see that paper from @rob-p's twitter - it's on my to-read list :)
Here are the wall-clock and memory results for SGA on human HG004 data with and without …
Interesting, thanks! I wouldn't have expected the runtimes to be (nearly) the same, but it is good to see.
It was surprising to me too. Running …