[error] SIGSEGV: Illegal storage access. (Attempt to read from nil?) Segmentation fault (core dumped) #131

Open
egenomics opened this issue Dec 18, 2023 · 14 comments

Comments

@egenomics

Hi,
I am getting an error in the last step of ancestry calling, after successfully generating all the .somalier query files and downloading the relevant ancestry-labels-1kg.tsv file and the 1kg.somalier/*.somalier files.

Here is the command, which we have tested on two different machines with the same error:

(base) jlvillanueva@EEP10709:~/Downloads/somalier_aina$ ll
total 117916
drwxrwxr-x  4 jlvillanueva jlvillanueva     4096 Dec 18 15:45 ./
drwxr-xr-x 54 jlvillanueva jlvillanueva    40960 Dec 18 16:06 ../
drwxrwxr-x  3 jlvillanueva jlvillanueva     4096 Dec 18 15:44 1kg.somalier/
-rw-rw-r--  1 jlvillanueva jlvillanueva 82856769 Dec 18 15:44 1kg.somalier.tar.gz
-rw-rw-r--  1 jlvillanueva jlvillanueva    56028 Dec 18 15:44 ancestry-labels-1kg.tsv
drwxrwxr-x  2 jlvillanueva jlvillanueva     4096 Dec 18 15:09 cohort/
-rw-rw-r--  1 jlvillanueva jlvillanueva   265818 Dec 18 15:44 sites.hg38.vcf.gz
-rwxrwxr-x  1 jlvillanueva jlvillanueva 37500280 Dec 18 15:44 somalier*
(base) jlvillanueva@EEP10709:~/Downloads/somalier_aina$ ./somalier ancestry --labels ancestry-labels-1kg.tsv 1kg.somalier/*.somalier ++ cohort/*.somalier
somalier version: 0.2.18
SIGSEGV: Illegal storage access. (Attempt to read from nil?)
Segmentation fault (core dumped)
@brentp
Owner

brentp commented Dec 18, 2023

Hi, can you run the same command with the binary attached here (after gunzip somalier_dbg.gz && chmod +x somalier_dbg) and show the output?
somalier_dbg.gz

@egenomics
Author

Hi,
Thanks for the quick response! I get the following error:

(base) jlvillanueva@EEP10709:~/Downloads/somalier_aina$ ./somalier_dbg ancestry --labels ancestry-labels-1kg.tsv 1kg.somalier/*.somalier ++ cohort/*.somalier
somalier version: 0.2.19
/home/brentp/src/somalier/src/somalier.nim(276) somalier
/home/brentp/src/somalier/src/somalier.nim(263) main
/home/brentp/src/somalier/src/somalierpkg/ancestry.nim(137) ancestry_main
/nim-1.6.6/lib/system/fatal.nim(53) sysFatal
Error: unhandled exception: index out of bounds, the container is empty [IndexDefect]

@brentp
Owner

brentp commented Dec 18, 2023

It seems that the training matrix (1kg) is empty, so either the sites don't match or you don't have samples in that directory. What does:

ls -lh 1kg.somalier/*.somalier | head

show?
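
The same check can be sketched in Python using only the standard library. This is an illustration, not part of somalier: the helper names `count_matches` and `find_somalier_dirs` are hypothetical. It counts what a glob pattern actually matches and reports which directories under a root really contain .somalier files, which would expose an unexpected nested folder.

```python
import glob
import os


def count_matches(pattern):
    """Return how many paths match a shell-style glob pattern."""
    return len(glob.glob(pattern))


def find_somalier_dirs(root):
    """Map each directory under root to the number of .somalier files directly inside it."""
    counts = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        n = sum(1 for name in filenames if name.endswith(".somalier"))
        if n:
            counts[dirpath] = n
    return counts
```

If `count_matches("1kg.somalier/*.somalier")` returns 0, the shell passes the unexpanded pattern (or nothing) to somalier, and `find_somalier_dirs("1kg.somalier")` shows where the files actually live.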

@egenomics
Author

I feel a bit dumb... There is another folder inside 1kg.somalier. I have fixed the command. However, it still gives an error:

./somalier_dbg ancestry --labels ancestry-labels-1kg.tsv 1kg.somalier/1kg-somalier/*.somalier ++ cohort/*.somalier
somalier version: 0.2.19
Segmentation fault (core dumped)

@brentp
Owner

brentp commented Dec 18, 2023

Hmm. It's a problem that we're not getting any information beyond the segfault now.

@egenomics
Author

We have tested it on two different computers with the same error :(

@brentp
Owner

brentp commented Dec 19, 2023

Yes, I expect that it will be the same on any machine. How many samples are you looking at?
I attach here another binary with hopefully more debug info turned on. Maybe it will give us more clues.
somalier_dbg2.gz

The ancestry stuff is, as you're finding, less used and more prone to problems than the rest of somalier. You might also try python scripts/ancestry-predict.py which uses PCA -> SVM instead of a neural network. You can run that with -h to see the arguments.

@egenomics
Author

I am looking at 24 samples:

ls cohort/*.somalier | wc -l
24

I have tried the second debug binary, but I get no more information than with the previous one:

./somalier_dbg2 ancestry --labels ancestry-labels-1kg.tsv 1kg.somalier/1kg-somalier/*.somalier ++ cohort/*.somalier
somalier version: 0.2.19
Segmentation fault (core dumped)

As for the Python script, I get a strange error:

python code/somalier/scripts/ancestry-predict.py --labels ancestry-labels-1kg.tsv --backgrounds 1kg.somalier/1kg-somalier/*.somalier --samples cohort/*.somalier --plot test_plot
Traceback (most recent call last):
  File "/home/jlvillanueva/Downloads/somalier_aina/code/somalier/scripts/ancestry-predict.py", line 171, in <module>
    df_pca = df_pca.append(
  File "/home/jlvillanueva/miniconda3/lib/python3.9/site-packages/pandas/core/generic.py", line 5989, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'append'

Thanks again for your assistance Brent!

@egenomics
Author

The plot is generated though:
[Image: test_plot]

@brentp
Owner

brentp commented Dec 19, 2023

Looks like DataFrame.append is gone from pandas (it was removed in pandas 2.0). You can change lines 171-172 from:

            df_pca = df_pca.append(
                other=(pd.DataFrame(test_reduced, test_samples, labels_pc)))

to:

            df_pca = pd.concat([df_pca, pd.DataFrame(test_reduced, test_samples, labels_pc)])

I think that should work, but haven't tested it.
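
A minimal, self-contained sketch of that replacement, assuming pandas 2.x and NumPy are installed. The variable names follow the snippet above, but the values are made-up placeholders, not data from the script; `pd.DataFrame(data, index, columns)` is called positionally, as in the original line.

```python
import numpy as np
import pandas as pd

# Placeholder inputs (assumptions for illustration only).
labels_pc = ["PC1", "PC2"]
df_pca = pd.DataFrame(np.array([[0.1, 0.2], [0.3, 0.4]]), ["bg1", "bg2"], labels_pc)
test_reduced = np.array([[0.5, 0.6]])
test_samples = ["query1"]

# pd.concat stacks the frames row-wise, which is what DataFrame.append used to do.
df_pca = pd.concat([df_pca, pd.DataFrame(test_reduced, test_samples, labels_pc)])
```

The result keeps the original index labels, so the query samples end up as extra rows below the background samples.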

@brentp
Owner

brentp commented Dec 19, 2023

You can also change other things in the script. For example, on line 92 you can change n_components to 3.
You can also see the other parameters to change for the SVM: https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html

If you make all of these changes and get something that looks good, I'd be happy to get a PR that incorporates the changes.
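
For orientation, a PCA -> SVM classifier with n_components set to 3 might look roughly like the sketch below. This is not the code from scripts/ancestry-predict.py: the data are random synthetic clusters, the group labels "A", "B", "C" are placeholders, and the SVC settings shown (C=1.0, kernel="rbf") are simply scikit-learn's documented defaults written out so they are easy to vary.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for a background matrix: 3 groups, 30 samples each, 20 features.
centers = rng.normal(scale=5.0, size=(3, 20))
X_train = np.vstack([c + rng.normal(size=(30, 20)) for c in centers])
y_train = np.repeat(["A", "B", "C"], 30)

# Reduce to 3 principal components, then classify with a support vector machine.
model = make_pipeline(PCA(n_components=3), SVC(C=1.0, kernel="rbf"))
model.fit(X_train, y_train)

# Synthetic "query" samples drawn near each group center.
X_query = centers + rng.normal(size=(3, 20))
pred = model.predict(X_query)
```

Changing `n_components` or the `SVC` arguments in one place like this makes it straightforward to compare settings on the same background set.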

@egenomics
Author

egenomics commented Dec 19, 2023

With these changes it looks like it works. I tried changing n_components to 3 for the test run and, visually speaking, it looks better at assigning populations.
I will run it on many more samples to see what we get.

Do you know if there is a background dataset with more population granularity? It would be quite interesting to know the population of origin for certain patients; continental ancestry is a hint but still very general. We usually have exomes and gene panels, so most intergenic SNPs are not captured.

@brentp
Owner

brentp commented Dec 19, 2023

Thousand genomes has finer subpopulations, but then you have so few training samples that it's not as reliable. There may be other resources for this, but I haven't kept up with them.

@brentp
Owner

brentp commented Dec 19, 2023

Glad to hear it's working.
