Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Finding structural varianCommand terminated by signal 11 #11

Open
morispi opened this issue Nov 6, 2020 · 29 comments
Open

Finding structural varianCommand terminated by signal 11 #11

morispi opened this issue Nov 6, 2020 · 29 comments
Assignees

Comments

@morispi
Copy link

morispi commented Nov 6, 2020

Hello,

I am attempting to run Valor on a non-human dataset for which no default Sonic file is available.

I thus followed the guidelines on Sonic's GitHub page and created my own file. I generated two Sonic files, one with empty bed / out files passed to the --dups, --reps, and --gaps parameters, and another with files containing actual data passed to the parameters. First thing that seemed strange to me is that the Sonic file generated from empty bed/out files has a greater size that the other one. Is that an expected behaviour?

I generated the segmental duplication file using SEDEF, as mentioned here calkan/sonic#11 , but since my assembly is not soft-masked, SEDEF generated an empty file. RepeatMasker ran correctly, and I ran a script of my own to find the gaps. Both gaps and repetitions files were thus correct and contained data.

Anyway, I then attempted to run Valor with these two Sonic files, to see if there would be any difference. However, Valor fails with both files, reporting that "Finding structural varianCommand terminated by signal 11". Both iterations of Valor fail on the same chromosome.

Do you know what might be causing this issue? Is there anything I'm doing wrong with Sonic files? I'm at a loss here.

Thanks in advance.

Best,
Pierre

@calkan
Copy link
Member

calkan commented Nov 6, 2020

Hi

I will leave the VALOR segmentation fault issue to @f0t1h , but I can take a look at the SONIC related part of your bug report. Larger SONIC file with empty output is definitely not expected. Is it possible for you to share your files so I can try to pinpoint the problem?

@morispi
Copy link
Author

morispi commented Nov 6, 2020

Hi,

Thanks for the super quick reply!
Here are the files I used to generate the SONIC file:
DataSonic.tar.gz

@calkan
Copy link
Member

calkan commented Nov 6, 2020

the reference genome is missing :-) can you send the FASTA as well?

@f0t1h
Copy link
Contributor

f0t1h commented Nov 6, 2020

reporting that "Finding structural varianCommand terminated by signal 11".

My guess is, sonic->chromosome_names, array is null. I will double check tomorrow.

@morispi
Copy link
Author

morispi commented Nov 6, 2020

Hi again,

Sorry for the delay, I was checking wether or not I could share the reference genome.
Unfortunately it is private and I cannot share it.
I'm testing Valor / Sonic on a subset of the reference genome and will see if I can upload this subset.

Best,
Pierre

@calkan
Copy link
Member

calkan commented Nov 6, 2020

well, I will need something to test. But a quick look at the gaps file shows a problem, it is not a valid BED file. you need something like:

chromosome1 45400 55400
...

separated by tabs. those "edges=5300..117305 left=156508 right=143120 ver=1.10 style=3" parts have to go

@morispi
Copy link
Author

morispi commented Nov 6, 2020

I asked and I have the right to share a subset of my data. Here is an archive containing everything:
ExampleDataset.tar.gz

The invalid BED file you mentioned came from the way my reference genome's chromosomes were named. I updated their name and tried to run Sonic / Valor again, but nothing changed, and I ended up with an empty Sonic file, and Valor stopping upon start.

@calkan
Copy link
Member

calkan commented Nov 6, 2020

well, at least for this set it looks fine. an you also try with the same subset?

with real files

sonic --ref reference.fasta --gaps Gaps/gaps.bed --reps Repeats/repeats.out --dups SegDups/segdups.bed --make-sonic ref.sonic
Number of chromosomes: 1
Adding gap intervals to SONIC.
Read 32 BED entries.
Writing entries for chromosome 0
Wrote 32 entries.
Adding segmental duplication intervals to SONIC.
Read 0 BED entries.
Writing entries for chromosome 0
Wrote 0 entries.
Adding 2297 repeats to SONIC.
Read 2294 BED entries.
Writing entries for chromosome 0
Wrote 2294 entries.
Adding GC profile SONIC.
SONIC file ref.sonic is ready.
Memory usage: 0.74 MB.

calkan@donut:~/tmp/sonic/DataSonic/ExampleDataset$ ls -l ref.sonic
-rw-rw-r-- 1 calkan calkan 54635 Nov 6 16:29 ref.sonic

with null files

calkan@donut:/tmp/sonic/DataSonic/ExampleDataset$ sonic --ref reference.fasta --gaps null.bed --reps null.bed --dups null.bed --make-sonic ref-null.sonic
Number of chromosomes: 1
Adding gap intervals to SONIC.
Read 0 BED entries.
Writing entries for chromosome 0
Wrote 0 entries.
Adding segmental duplication intervals to SONIC.
Read 0 BED entries.
Writing entries for chromosome 0
Wrote 0 entries.
Adding 0 repeats to SONIC.
Read 0 BED entries.
Writing entries for chromosome 0
Wrote 0 entries.
Adding GC profile SONIC.
SONIC file ref-null.sonic is ready.
Memory usage: 0.00 MB.
calkan@donut:
/tmp/sonic/DataSonic/ExampleDataset$ ls -l ref-null.sonic
-rw-rw-r-- 1 calkan calkan 26399 Nov 6 16:30 ref-null.sonic

@morispi
Copy link
Author

morispi commented Nov 6, 2020

I tried with this subset, yes.

Here is what I get:

./sonic --ref reference.fasta --gaps Gaps/gaps.bed --reps Repeats/repeats.out --dups SegDups/segdups.bed --make-sonic ref.sonic
Number of chromosomes: 1
Adding gap intervals to SONIC.
Wrote 32 entries.
Adding segmental duplication intervals to SONIC.
Wrote 0 entries.
Adding repeats to SONIC.

ll ref.sonic
-rw-rw-r-- 1 morispi morispi 0 nov. 6 14:38 ref.sonic

I tried on two different computers and got the same output.

@morispi
Copy link
Author

morispi commented Nov 6, 2020

Apparently, this could be caused by the version of Sonic, which is not the latest by default when cloning Valor. I manually pulled the latest version and managed to properly create the Sonic file. I will attempt to run Valor again, but need to update my BAM file, since my reference genome's headers contained spaces when the BAM file was generated, which seems to be incompatible with Valor.

Thanks a lot for your answers!

@calkan
Copy link
Member

calkan commented Nov 6, 2020

oh, I didn't notice it was a versioning issue. I will update the version in this repo then.

@calkan
Copy link
Member

calkan commented Nov 6, 2020

I updated the SONIC submodule in this repo. Let us know if you still have the segmentation fault problem. I won't be able to help with VALOR code itself, it will be in @f0t1h 's court if problem persists.

@morispi
Copy link
Author

morispi commented Nov 9, 2020

Hi again,

I modified my reference genome so it does not contain spaces, it thus only has simple headers such as ">0", ">1", etc.
I re-ran LongRanger to generate a new BAM file of alignment betweens my reads and this new reference genome.
I re-generated the Sonic file with the aforementioned new reference genome, and all went well.

I then tried to re-run Valor with these new files, and using 1 threads, it still reports that it ended by signal 11. Here are the last few lines of the log:

Finding structural variants in chromosome 27
Recovering split molecules..
Sorting the DNA Intervals
Recovering molecules
Global molecule depth mean: 132.279412
Global molecule depth standard deviation: 41.180752
Molecule Count: 6	Molecule mean: 18266.833333	Molecule std-dev: 10096.702409
Matching split molecules
2 candidate variations are made
0 candidate variations are left after filtering
Finding SV Clusters


Finding structural variants inCommand terminated by signal 11

	Command being timed: "/home/genouest/genscale/pmorisse/Valor/valor -i /scratch/supergene/mapping_onehap38/longranger_align/25_ShortHeaders/out/25/outs/possorted_bam.bam -o ResultsPapillon25Ref/res -s /home/genouest/genscale/pmorisse/Data/38_ShortHeaders.sonic -f INV,DUP,IDUP,TRA,ITRA,DEL -t 1"

This is not a memory issue, since I reserved 50GB, and Valor barely used 300MB. Is there anything else I am doing wrong?

Thanks in advance.
Pierre

@calkan
Copy link
Member

calkan commented Nov 9, 2020

leaving this to @f0t1h

@f0t1h
Copy link
Contributor

f0t1h commented Nov 9, 2020

Can you send me the sonic file, so I can test what is wrong?

@f0t1h
Copy link
Contributor

f0t1h commented Nov 9, 2020

Also, can you send all logs from the beginning? @morispi

@morispi
Copy link
Author

morispi commented Nov 10, 2020

Here is an archive containing the Sonic file and the logs from the run:
Issue11Data.tar.gz

I ran again on a single chromosome in order to get a smaller sonic / log.
Tell me if you need anything else.

@f0t1h
Copy link
Contributor

f0t1h commented Nov 10, 2020

Does this error happen always on first chromosome in the sample, or randomly?

@morispi
Copy link
Author

morispi commented Nov 10, 2020

Well the data I shared only contains one chromosome, so it always happens on this one.

@morispi
Copy link
Author

morispi commented Nov 10, 2020

Apart from that, on a regular file with multiple chromosomes, it seems to happen randomly.

@f0t1h
Copy link
Contributor

f0t1h commented Nov 10, 2020

#include <stdio.h>
#include <stdlib.h>
#include "sonic/sonic.h"

int main(int argc, char **argv){

    sonic *snc = sonic_load(argv[1]);
    printf("%s\n",snc->chromosome_names[0]);
    printf("%lf\n",sonic_is_segmental_duplication(snc,snc->chromosome_names[0],1000,50000));
    printf("%d\n",sonic_is_gap(snc,snc->chromosome_names[0],1000,50000));
    printf("%lf\n",sonic_is_satellite(snc,snc->chromosome_names[0],1000,50000));
    free_sonic(snc);
    return 0;
}

gcc test.c sonic/libsonic.a -lz -g -o sonic_test

and

valgrind -v ./sonic_test 38_SuperGene_ShortHeaders.sonic

I run above code loading the sonic file you provided and got no errors. Can you try the same? Sonic file looks healthy. Is it possible that there is a corruption in the bam file?

@morispi
Copy link
Author

morispi commented Nov 10, 2020

Here is the output from Valgrind: Valgrind.log

I don't think the BAM file is corrupted, since it was generated recently, with LongRanger 2.2.2, and since LongRanger did not report any error while running. Is there any way I can check if the BAM file is corrupted though?

@f0t1h
Copy link
Contributor

f0t1h commented Nov 10, 2020

You can use samtools quickcheck . If it doesn't fail there you can use samtools view to write it to another bam file (if it is not too large).

I will be offline now since it is past 2AM here. I will follow up tomorrow.

@morispi
Copy link
Author

morispi commented Nov 10, 2020

Samtools quickcheck did not report any error. I tried to use samtools view to write my bam file to another bam file, and attempted to run Valor on that new bam file again. It still ended with the same error.

No problem with following up tomorrow, thanks for your answers!

@f0t1h
Copy link
Contributor

f0t1h commented Nov 11, 2020

Can you set --contig_count or -c to number of chromosomes in your reference? By default, it assumes Human sequencing and runs first 24 (22+ X +Y) chromosomes. It may be the cause of your problem. If this fixes it, I will update the software accordingly.

@morispi
Copy link
Author

morispi commented Nov 12, 2020

Sorry for not answering yesterday, it was a holiday in France.
Running valor with the option --contig_count seems to resolve the issue!
I just realized this option is actually mentioned in the --help of Valor, but I stuck to the README only to help me launch Valor. My bad. However, maybe you should add all options to the README? I think that may help other people as well.

Anyway, thank you very much for your answers!

@morispi
Copy link
Author

morispi commented Nov 12, 2020

Edit: Well, it seemed to work at first... It still terminated with signal 11, but much later that previous runs.
I tried setting the --contig_count to a higher value but it did not seem to change anything.
Moreover, Valor seems to be running extremely fast compared to other experiments on human data I performed.
He is the complete log if you want to take a look: Valor.log

@f0t1h
Copy link
Contributor

f0t1h commented Nov 12, 2020

I am in process of cleaning the command line interface, once done I will update the README as well.

Can you let me know minimum number of bases in these contigs? Valor is designed for large variants, problem might be because of the smaller contigs. I will update the code tomorrow to check this.

@morispi
Copy link
Author

morispi commented Nov 19, 2020

Sorry, it seems like I forgot to reply.

The main contig of interest is about 3.9Mbp. I managed to run Valor successfully on this contig only, by subsampling my BAM and FASTA files. However Valor did not output any variant. Is it because the contig is too small?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants