
Segmentation fault #4

Open
ydLiu-HIT opened this issue Aug 27, 2019 · 8 comments

Comments

@ydLiu-HIT

Hi!

Recently I was benchmarking the performance of long spliced-read aligners (minimap2, GMAP, GraphMap2 and deSALT). When I ran GraphMap2 (v.0.6.01) with 24 threads on mouse PacBio SMRT reads, it crashed with a segmentation fault after 2.66% of the reads had been processed, as shown in the screenshot:

(screenshot not included)

The reads I used are available at https://www.ncbi.nlm.nih.gov/sra/?term=SRR6238555

How can I prevent segmentation faults?

Thanks
Michael

@mjoppich

I can report the same problem with the yeast genome. Reads used: SRR5989373.

Have you been able to resolve the issue?

If I instead call it with --ambiguity 0.5 --secondary --min-bin-perc 0.01 --bin-step 0.99 --max-regions 20 --mapq -1 --spliced --chain-min-cov 40 (which, according to the help, is equivalent), no reads align at all.

@jmaricb
Member

jmaricb commented Oct 5, 2019

Hi @ydLiu-HIT ,

which reference were you using? Could you send the link or share it?

Thanks

@mjoppich

mjoppich commented Oct 7, 2019

Hi @jmaricb

Since I ran into a very similar problem, maybe you could use my case instead:

I used the reads from SRR5989373 together with Ensembl release 94:

ftp://ftp.ensembl.org/pub/release-94/gtf/saccharomyces_cerevisiae/Saccharomyces_cerevisiae.R64-1-1.94.gtf.gz

ftp://ftp.ensembl.org/pub/release-94/fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz

Thanks for looking into this!

@ydLiu-HIT
Author

ydLiu-HIT commented Oct 11, 2019

I used GRCm38 from Ensembl release 92: ftp://ftp.ensembl.org/pub/release-92/fasta/mus_musculus/dna/Mus_musculus.GRCm38.dna.primary_assembly.fa.gz

@jmaricb
Member

jmaricb commented Nov 21, 2019

@mjoppich @ydLiu-HIT I have located the cause of this segmentation fault. It was happening in the ksw2 aligner, which would crash for very long references or very long queries. I have added handling for this case and tested it on the dataset you linked (https://www.ncbi.nlm.nih.gov/sra/?term=SRR6238555), and it did not crash.
Can you try the newest commit and let me know whether it still crashes?
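For readers hitting similar crashes, here is a minimal C++ sketch of the kind of size check described above, run before handing a query/reference pair to a dynamic-programming aligner. It assumes the work grows with query length times reference length and that sizes kept in a 32-bit int can overflow for very long inputs. The function name dp_alignment_is_safe and the max_cells budget are hypothetical; this is not the actual GraphMap2 or ksw2 code.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical guard (not GraphMap2 or ksw2 code). Assumption: a full
// dynamic-programming alignment needs on the order of query_len * ref_len
// cells, and sizes stored in a 32-bit int overflow for very long inputs.
constexpr bool dp_alignment_is_safe(std::int64_t query_len,
                                    std::int64_t ref_len,
                                    std::int64_t max_cells) {
    // Empty or negative lengths are rejected outright.
    if (query_len <= 0 || ref_len <= 0) {
        return false;
    }
    // Make sure query_len * ref_len itself cannot overflow int64_t.
    if (query_len > std::numeric_limits<std::int64_t>::max() / ref_len) {
        return false;
    }
    const std::int64_t cells = query_len * ref_len;
    // Respect the caller's memory budget and the 32-bit size limit.
    return cells <= max_cells &&
           cells <= std::numeric_limits<std::int32_t>::max();
}
```

A caller could evaluate this before invoking the aligner and skip (or report) an oversized pair instead of letting the process crash; the overflow check comes first so that the multiplication itself is always safe.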

@ydLiu-HIT
Author

Hi @jmaricb,

I just re-ran GraphMap2 (v0.6.3) with the same reads and reference as before, but it still crashes with a segmentation fault:

[11:54:03 BuildIndexes] Loading reference sequences.
[11:55:35 SetupIndex_] Building the index for shape: '11110111101111'.
[11:55:55 Create] Allocated memory for a list of 1362768835 seeds (128 bits each) (0.00003 sec, diff: 19.92923 sec).
[11:55:55 Create] Memory consumption: [currentRSS = 7801 MB, peakRSS = 7889 MB]
[11:55:55 Create] Collecting seeds.
[11:55:55 Create] Minimizer seeds will be used. Minimizer window is 5.
[12:03:44 Create] [currentRSS = 37193 MB, peakRSS = 49390 MB] Sequence: 44/44, len: 91744698, name: 'chrY'''
[12:03:50 Create] Final memory allocation after collecting seeds: [currentRSS = 37694 MB, peakRSS = 49390 MB]
[12:03:50 Create] Sorting the seeds using 24 threads.
[12:06:33 Create] Generating the hash table.
[12:07:01 Create] Calculating the distribution statistics for key counts.
[12:07:02 Create] Index statistics: average key count = 132.646856, max key count = 3457358.000000, std dev = 1632.888478, percentil (99.00%) (count cutoff) = 1181.000000
[12:07:31 Create] Memory consumption: [currentRSS = 38466 MB, peakRSS = 49390 MB]
[12:07:31 SetupIndex_] Finished building index.
[12:07:31 SetupIndex_] Storing the index to file: '/data/ydliu/Reference/mouse_GRCm38.fa.gmidx'.
[12:13:57 Index] Memory consumption: [currentRSS = 35868 MB, peakRSS = 49390 MB]
[12:13:57 Run] Hits will be thresholded at the percentil value (percentil: 99.000000%, frequency: 1181).
[12:13:57 Run] Minimizers will be used. Minimizer window length: 5
[12:13:57 Run] Reference genome is assumed to be linear.
[12:13:57 Run] One or more similarly good alignments will be output per mapped read. Will be marked secondary.
[12:13:57 ProcessReads] All reads will be loaded in memory.
[12:14:47 ProcessReads] All reads loaded in 49.92 sec (size around 3144 MB). (3213849871 bases)
[12:14:47 ProcessReads] Memory consumption: [currentRSS = 39749 MB, peakRSS = 49390 MB]

[1]+ Segmentation fault (core dumped) ~/software/graphmap2/bin/Linux-x64/graphmap2 align -x rnaseq -r /data/ydliu/Reference/mouse_GRCm38.fa -d /data2/ydliu/ONT_reads/SMRT/mouse/SRR6238555.fasta -o mouse_graphmap2.sam -t 24

@jmaricb
Member

jmaricb commented Dec 9, 2019

I am looking into this again right now. From your log, it looks like the tool crashed right after loading the reads into memory, before aligning a single read, right? That doesn't happen for me at the moment: it aligns reads slowly, but it hasn't crashed yet. I will try to find out what is happening and will let you know.
