Pangolin conflict #508

Open
carlottaolivero opened this issue Feb 15, 2023 · 1 comment
Labels
question Further information is requested

Comments

@carlottaolivero

Hi,
We have some questions regarding the Pango lineage results from https://pangolin.cog-uk.io/.

  1. For some samples, the Pangolin web tool reports a conflict:
    Lineage - Note
    BA.2 - Usher placements: BA.2(1/2) BA.2.10.1(1/2)
    BQ.1.19 - Usher placements: BQ.1.1(1/2) BQ.1.19(1/2)

In both cases the placements are split 1/2 and 1/2. Why is BA.2 called instead of BA.2.10.1 for the first sample, and BQ.1.19 instead of BQ.1.1 for the second?
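To spot these conflicts across many samples, it can help to parse the note column. The sketch below assumes the note always has exactly the form shown above (`Usher placements: LINEAGE(count/total) ...`); the helper names `parse_usher_note` and `is_tied` are hypothetical, not part of pangolin. It only flags ties and does not reproduce how pangolin chooses one lineage among tied placements.

```python
from __future__ import annotations

import re
from fractions import Fraction

# One placement token, e.g. "BA.2.10.1(1/2)": lineage name, then count/total.
PLACEMENT_RE = re.compile(r"([A-Za-z0-9.]+)\((\d+)/(\d+)\)")
PREFIX = "Usher placements:"


def parse_usher_note(note: str) -> dict[str, Fraction]:
    """Return {lineage: fraction of equally parsimonious placements}.

    Assumes the note format shown in this issue; any other note yields {}.
    """
    if not note.startswith(PREFIX):
        return {}
    return {
        lineage: Fraction(int(count), int(total))
        for lineage, count, total in PLACEMENT_RE.findall(note[len(PREFIX):])
    }


def is_tied(placements: dict[str, Fraction]) -> bool:
    """True if at least two lineages share the largest placement fraction."""
    if len(placements) < 2:
        return False
    top = max(placements.values())
    return sum(1 for f in placements.values() if f == top) >= 2


if __name__ == "__main__":
    notes = {
        "BA.2": "Usher placements: BA.2(1/2) BA.2.10.1(1/2)",
        "BQ.1.19": "Usher placements: BQ.1.1(1/2) BQ.1.19(1/2)",
    }
    for called, note in notes.items():
        placements = parse_usher_note(note)
        print(called, dict(placements), "tied" if is_tied(placements) else "unique")
```

Using `Fraction` rather than floats keeps comparisons such as 1/2 vs 2/4 exact when checking for ties.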

  2. For some samples, the lineage reported by the Pangolin website does not match the Pango lineage given by GISAID (https://gisaid.org/), even though the data version is the same. Here I am referring to GISAID's "Pango v.4.2 consensus call".
    The following table summarizes the results we are referring to.

[Table: Pangolin web vs. GISAID lineage calls for the affected samples (image not recoverable)]

What could be the reason for this difference?

Many thanks for the help and for your amazing work!
Carlotta Olivero

@AngieHinrichs
Member

Hi! For your first question: usher searches for the most parsimonious placement of your sequence in a tree that represents a random sample of the diversity within each Pango lineage as annotated on UCSC's UShER tree. For some sequences, especially those with a lot of N bases (low coverage / no-call), there are multiple branches on the tree that match your sequence equally well. Unfortunately there isn't a good way to get the details of which mutations make the sequence have equally parsimonious placements somewhere in BA.2 and somewhere in BA.2.10.1.
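A toy illustration of why N bases can produce ties: if a sample has N at the only site that distinguishes two candidate placements, both placements need the same number of extra mutations. This is a deliberately simplified sketch, not UShER's algorithm (UShER scores placements on a tree, not by comparing flat mutation sets), and the positions and bases below are made up, not real lineage-defining mutations.

```python
from __future__ import annotations

# (genome position, alternate base); positions here are hypothetical.
Mutation = tuple[int, str]


def placement_cost(sample: set[Mutation], n_positions: set[int],
                   branch: set[Mutation]) -> int:
    """Count mutation differences between sample and branch, ignoring N sites."""
    disagreements = sample ^ branch  # mutations present in only one of the two
    return sum(1 for pos, _ in disagreements if pos not in n_positions)


def best_placements(sample: set[Mutation], n_positions: set[int],
                    branches: dict[str, set[Mutation]]) -> list[str]:
    """Return all branch names that share the minimum (most parsimonious) cost."""
    costs = {name: placement_cost(sample, n_positions, muts)
             for name, muts in branches.items()}
    best = min(costs.values())
    return sorted(name for name, cost in costs.items() if cost == best)


if __name__ == "__main__":
    branches = {
        "BA.2": {(1, "A"), (2, "C")},
        "BA.2.10.1": {(1, "A"), (2, "C"), (3, "T")},  # extra defining site 3
    }
    sample = {(1, "A"), (2, "C")}
    # Site 3 is N in the sample: no evidence either way, so both placements tie.
    print(best_placements(sample, {3}, branches))
    # Site 3 is covered and matches the reference: only BA.2 is most parsimonious.
    print(best_placements(sample, set(), branches))
```

In this toy setup, calling site 3 is what breaks the tie, which mirrors the point above that low-coverage sequences are the ones most likely to have several equally parsimonious placements.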

One thing that you can try in these cases is the UShER web interface (https://usher.bio) which places your sequence in the full UShER tree of almost 15 million sequences, instead of the much smaller downsampled tree used by pangolin. There is a higher chance of finding sequences that are more similar to your sequence in the full tree, and that may help to resolve where the sequence really belongs. But again if the sequence has many Ns, or many locations where the reference sequence has been used to fill in missing sequence, or is a mixture of genomes from different lineages (e.g. from a co-infection or recombinant), then it may have multiple equally not-great matches in the full tree too.

For your second question: I'm not sure exactly what GISAID's "consensus call" means, but I think they might look at results from both UShER and pangoLEARN mode, as well as Scorpio (which can override pangoLEARN's result but not usher's in pangolin output), and use some kind of heuristic to resolve differences between them. Best to ask GISAID how they compute the consensus.


3 participants