Segmentation fault during geno step #5

Open
ldenti opened this issue Mar 14, 2019 · 1 comment

ldenti commented Mar 14, 2019

Hi,
when I try to run vargeno on the same data linked in my previous issue (#2), it crashes during the geno step.

This is the output of vargeno index:

[BloomFilter constructBfFromGenomeseq] bit vector: 1130814221/9600000000
[BloomFilter constructBfFromGenomeseq] lite bit vector: 2131757218/18400000000
[BloomFilter constructBfFromVCF] bit vector: 68265608/1120000000
SNP Dictionary
Total k-mers:        2593345952
Unambig k-mers:      2367171409
Ambig unique k-mers: 37905369
Ambig total k-mers:  226174543
Ref Dictionary
Total k-mers:        2858648351
Unambig k-mers:      2488558606
Ambig unique k-mers: 61723937
Ambig total k-mers:  370089745

and these are the files produced during the index step:

4.0K    vargeno.RMNISTHS_30xdownsample.index.chrlens
1.2G    vargeno.RMNISTHS_30xdownsample.index.ref.bf
2.2G    vargeno.RMNISTHS_30xdownsample.index.ref.bf.lite.bf
34G     vargeno.RMNISTHS_30xdownsample.index.ref.dict
134M    vargeno.RMNISTHS_30xdownsample.index.snp.bf
39G     vargeno.RMNISTHS_30xdownsample.index.snp.dict

When running the geno step, vargeno prints "Processing..." and crashes shortly thereafter:

Initializing...
Processing...
Segmentation fault (core dumped)

\time reports that the process is terminated by signal 11 (SIGSEGV), but I'm not sure where this happens. At first I thought it was due to RAM exhaustion (the machine used to test the tool has 256GB of RAM), but the same behaviour occurs on a cluster with 1TB of RAM.
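
For what it's worth, this is roughly how I could try to locate the crash from the core dump (a minimal sketch; "<geno args>" is just a placeholder for the exact arguments of the failing geno invocation, and the core file name depends on the system's core_pattern):

ulimit -c unlimited               # allow the crashing process to write a core file
./vargeno geno <geno args>        # reproduce the "Segmentation fault (core dumped)"
gdb -batch -ex bt ./vargeno core  # print a backtrace from the core dump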

Anyway, I also tried running vargeno on a smaller set of variants (I halved the input VCF) and it was able to complete the analysis.
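
(For reference, a rough sketch of how a VCF can be halved while keeping the header intact; the file names here are just placeholders, and any equivalent split would do:)

grep '^#' full.vcf > half.vcf                         # keep the complete VCF header
grep -v '^#' full.vcf | head -n 42369919 >> half.vcf  # first half of the 84739838 records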

The complete VCF contains 84739838 variants and the sample consists of 696168435 reads. The whole (unzipped) data accounts for ~240GB of disk space. If you want to reproduce this behaviour on your machine, I can share the data with you.

Luca

bbsunchen (Member) commented

Hi Luca, I am working on it, will let you know when I fix it.
