Somalier underestimates relatedness if low sites overlap #55

Closed
fgvieira opened this issue Jun 10, 2020 · 3 comments

Comments

@fgvieira

Hi,

I am running samples from different data types (RNA vs cfDNA) through somalier, and one pair came out with a very low relatedness, even though 77 of the 78 shared sites are IBS2:

#sample_a   sample_b   relatedness  ibs0  ibs2  hom_conc hets_a  hets_b  shared_hets  hom_alts_a  hom_alts_b  shared_hom_alts  n    x_ibs0  x_ibs2  expected_relatedness
165962      778965     0.101        0     77    0.193    228     556     23           166         866         32               78   0       3       1.0

I think that, even though each sample has a lot of heterozygous sites, most of them do not actually overlap between the two samples (which is why n=78). In the RNA data this is probably because of non-expressed genes, and in the cfDNA probably because of the low coverage.

According to somalier's paper, relatedness is calculated as:

(shared-hets(i,j) - 2 * ibs0(i, j)) / min(hets(i), hets(j))
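
As a quick sanity check, plugging the columns from the row above into this formula reproduces the reported value. This is only a minimal sketch of the arithmetic; the function name `paper_relatedness` is made up for illustration and is not part of somalier:

```python
def paper_relatedness(shared_hets: int, ibs0: int, hets_a: int, hets_b: int) -> float:
    """Formula as quoted above: (shared_hets - 2 * ibs0) / min(hets_a, hets_b)."""
    return (shared_hets - 2 * ibs0) / min(hets_a, hets_b)


# Values copied from the output row for samples 165962 and 778965.
r = paper_relatedness(shared_hets=23, ibs0=0, hets_a=228, hets_b=556)
print(round(r, 3))  # 0.101, matching the "relatedness" column
```

The denominator (228) counts every het in the sample with fewer hets, while the numerator (23) can only count hets at the 78 sites both samples cover, which is what drags the estimate down.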

However, in this case, even though one sample has 228 hets and the other 556, the two samples only overlap at 78 SNPs. So, shouldn't the formula be a bit more like:

(shared-hets(i,j) - 2 * ibs0(i, j)) / min(hets_in_common_pos(i), hets_in_common_pos(j))

where hets_in_common_pos(i) stands for the number of hets in sample "i" among the positions covered in both samples ("n"). This change should have no effect when comparing samples from the same type of sequencing data (since the overlap should be quite high), and it should improve comparisons between different types of sequencing data.
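
To illustrate the difference between the two denominators, here is a small sketch on toy genotypes. The encoding (0 = hom-ref, 1 = het, 2 = hom-alt, -1 = no call) and the function name `relatedness_sketch` are assumptions made for this example; this is not how somalier stores genotypes or implements the calculation:

```python
from typing import Sequence

MISSING = -1  # assumed encoding for "site not covered in this sample"


def relatedness_sketch(gt_a: Sequence[int], gt_b: Sequence[int],
                       restrict_to_shared: bool) -> float:
    """(shared_hets - 2 * ibs0) / min(hets_a, hets_b).

    If restrict_to_shared is True, hets are counted only at sites called
    in both samples (the proposed change); otherwise at every called site.
    """
    shared_hets = ibs0 = hets_a = hets_b = 0
    for a, b in zip(gt_a, gt_b):
        both_called = a != MISSING and b != MISSING
        if both_called:
            if a == 1 and b == 1:
                shared_hets += 1
            if {a, b} == {0, 2}:  # opposite homozygotes
                ibs0 += 1
        if both_called or not restrict_to_shared:
            hets_a += a == 1
            hets_b += b == 1
    denom = min(hets_a, hets_b)
    if denom == 0:
        return float("nan")
    return (shared_hets - 2 * ibs0) / denom


# Toy data: identical genotypes wherever both samples are called,
# but half of each sample's hets fall at sites missing in the other.
gt_a = [1, 1, 1, 1, MISSING, MISSING, 0, 2]
gt_b = [1, 1, MISSING, MISSING, 1, 1, 0, 2]

print(relatedness_sketch(gt_a, gt_b, restrict_to_shared=False))  # 0.5
print(relatedness_sketch(gt_a, gt_b, restrict_to_shared=True))   # 1.0
```

With full overlap (no missing calls) both variants count the same hets, so the two results coincide.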

@brentp
Owner

brentp commented Jun 30, 2020

I see what you mean. I am looking into this.

@brentp
Owner

brentp commented Jun 30, 2020

@fgvieira you are absolutely right. Would you check this binary and verify that it fixes this case for you?

somalier.gz

brentp added a commit that referenced this issue Jun 30, 2020
@fgvieira
Author

fgvieira commented Jul 2, 2020

@brentp I just checked the new version and it fixes the issue I was seeing. For this pair of samples, relatedness went from 0.101 to 0.979 👍
