The description of the GL field says that: "If A is the allele in REF and B,C,... are the alleles as ordered in ALT, the ordering of genotypes for the likelihoods is given by: F(j/k) = (k*(k+1)/2)+j."
It is unclear to me how one is supposed to interpret this equation as defining an ordering of anything. What do j and k represent? Why are we dividing one by the other? What does the value returned by the F function represent?
I think what is happening is that j represents the number of the first allele of a diploid pair (with 0 for the ref allele), k represents the number of the second, the "/" really denotes that the argument of the function is the unphased genotype composed of those two alleles, and the result of the function is the index in the GL array at which the likelihood of that genotype is found. If that is the case, this should be described more clearly in the spec. If that is not the case, this should definitely be described more clearly in the spec.
Furthermore, for triploid or higher sites, the spec merely says genotype likelihoods should appear in "the canonical order". What order is that, exactly?
You are probably right, it should be described more clearly. The idea is actually quite simple and the examples which follow immediately after the sentence should clear any doubts for most readers: "In other words, for biallelic sites the ordering is: AA,AB,BB; for triallelic sites the ordering is: AA,AB,BB,AC,BC,CC, etc."
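To make the diploid case concrete, here is a minimal Python sketch of the reading proposed in the question: j and k are 0-based allele numbers (0 for REF) with j <= k, and F gives the position of the unphased genotype j/k in the GL array. The function names `gl_index` and `diploid_order` are hypothetical and chosen only for this illustration; they are not part of the spec or of any library.

```python
def gl_index(j: int, k: int) -> int:
    """Position in GL of the unphased diploid genotype j/k (0 = REF).

    Assumes the interpretation discussed in this thread: F(j/k) = k*(k+1)/2 + j
    with j <= k. The genotype is unphased, so the two alleles are sorted first.
    """
    if j > k:
        j, k = k, j
    return k * (k + 1) // 2 + j


def diploid_order(n_alleles: int) -> list[str]:
    """List diploid genotypes (A = REF, B, C, ... = ALT) sorted by gl_index.

    n_alleles counts REF plus all ALT alleles (an assumption of this sketch).
    """
    names = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    pairs = [(j, k) for k in range(n_alleles) for j in range(k + 1)]
    pairs.sort(key=lambda p: gl_index(*p))
    return [names[j] + names[k] for j, k in pairs]


if __name__ == "__main__":
    print(diploid_order(2))  # biallelic site
    print(diploid_order(3))  # triallelic site
```

Under this reading, the biallelic and triallelic calls reproduce the AA,AB,BB and AA,AB,BB,AC,BC,CC orderings quoted from the spec.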
The generalization of the likelihoods order can be described by the following nested loops. Here P is the ploidy and N the number of ALT alleles, so allele indices run from 0 (REF) to N inclusive:

    for a1=0..N
        for a2=0..a1
            ...
                for aP=0..a(P-1)
                    print a1/a2/../aP
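The nested loops can also be written recursively so that they work for any ploidy. The following Python sketch is one possible reading of them, not authoritative spec text. The name `genotype_order` is hypothetical, `n_alleles` is assumed to count REF plus all ALT alleles, and each genotype is emitted with its alleles in ascending order (the outermost loop variable last), which is a presentation choice of this sketch.

```python
from typing import Iterator


def genotype_order(ploidy: int, n_alleles: int) -> Iterator[tuple[int, ...]]:
    """Yield unphased genotypes in the order produced by the nested loops.

    Alleles are 0-based (0 = REF). The outermost loop picks the largest
    allele of the genotype; each inner loop is bounded by the loop outside it.
    """

    def rec(p: int, max_allele: int, suffix: tuple[int, ...]):
        for a in range(max_allele + 1):
            if p == 1:
                # Innermost loop: the genotype is complete.
                yield (a,) + suffix
            else:
                # Inner alleles may not exceed the allele chosen here.
                yield from rec(p - 1, a, (a,) + suffix)

    yield from rec(ploidy, n_alleles - 1, ())


if __name__ == "__main__":
    for gt in genotype_order(ploidy=3, n_alleles=2):
        print("/".join(map(str, gt)))
```

For ploidy 2 this enumeration agrees with the diploid formula F(j/k) = (k*(k+1)/2)+j: the genotype j/k appears at position F(j/k) in the output.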