AY.96 maybe belongs inside AY.46, or even AY.46.6 #435
Comments
I just uploaded all the AY.96 designation strain names to Usher, and (almost) all of the non-Botswana sequences are classified as AY.46.6 by Usher, whereas all the Botswana ones are AY.96. In fact, AY.96 (proper) seems to be defined by the following 3 muts:
The confusion with AY.46 happens because that lineage is defined by:
And the following AY.96 mutation seems to have arisen (homoplasically?) in AY.46.6:
So there are a number of AY.46.6 sequences (maybe 2k) that share 2 out of 3 mutations with AY.96. That'd be ok if they hadn't been mixed up in the designations. It's still not clear to me whether these are independent events or whether AY.96 is somehow related to AY.46.6, maybe through recombination. But until this is clear we should probably clean up AY.96 and remove the false designations. This should help unconfuse pangoLEARN, too.
G28461A is a reversion, ugh. A28461G is common to Delta. Those 'homoplasic reversions' in Delta (and even more so in Omicron) make building a good tree a lot harder than it would be if the genome sequences could just be perfect. I wonder if perhaps we should mask 28461 in Delta in the UCSC/UShER tree. Looks like the reversion also affects AY.122.2. For the record, here's what we mask currently:
And also for reference: lineages are annotated on nodes of the UCSC/UShER tree using mutations in this file.
The daily build process includes a matOptimize run, so when it's not super-clear in which order mutations arose, it's possible for branches to hop around a bit, and that has caused some problems for lineage designation like this. 🙁 A lineage designated using one version of the UCSC/UShER tree might be split in a later version. So yeah, cleaning up the designations sounds reasonable to me. @chrisruis any other recollections/observations?
Looking at the recent UShER tree, the Botswana AY.96 sequences cluster separately from AY.46: their clade is defined by A28254C (Orf8:I121L). C10977T (Orf1ab:A3571V) then occurs one branch into the clade, after 1 sequence has diverged. Orf8:I121L doesn't look that reliable: it changes backwards and forwards quite a bit within Delta, including reverting within AY.96, and does the same in other lineages too. So this potentially isn't an ideal marker for a lineage. I expect that if we masked Orf8:I121L, AY.96 would cluster with AY.46 due to the shared Orf1ab:A3571V. So whether we think it's a separate lineage or part of AY.46 potentially comes down to whether or not we trust Orf8:I121L, and that doesn't look like a really reliable defining mutation. I think we definitely want to update the designations for the non-Botswana AY.96 sequences. And then @AngieHinrichs, should we try masking Orf8:I121L and confirm whether the Botswana sequences also cluster within AY.46?
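One rough way to quantify "changes backwards and forwards" is to count how many times each substitution at a site appears across the branches of a tree. The sketch below assumes the per-branch mutations have already been exported as lists of strings like `A28254C`; the function name `count_changes` and the toy branch list are hypothetical and are not taken from the real UShER tree.

```python
import re
from collections import Counter

MUTATION_RE = re.compile(r"([ACGT])(\d+)([ACGT])")


def count_changes(branch_mutations, site):
    """Count each substitution (e.g. 'A>C') seen at `site` across branches."""
    counts = Counter()
    for mutations in branch_mutations:
        for mutation in mutations:
            match = MUTATION_RE.fullmatch(mutation)
            if match and int(match.group(2)) == site:
                counts[f"{match.group(1)}>{match.group(3)}"] += 1
    return counts


# Toy data only: one list of mutations per branch, invented for illustration.
toy_branches = [
    ["A28254C"],
    ["C10977T"],
    ["C28254A"],             # a reversion at the same site
    ["A28254C", "G28461A"],  # the forward change arising again
]

flips = count_changes(toy_branches, 28254)
print(dict(flips))
```

A site that shows several independent forward changes plus reversions, as in the toy data, is the kind of site that makes a poor lineage-defining marker.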
Yes, I will give this a try today.
Masking 28254 (ORF8:121) or 28461 (N:63) alone didn't merge the AY.96 Botswana samples into AY.46, but masking both of those sites did. It also seemed harmless, and maybe helpful, for other little branches in Delta with mutations (or probably false reversions) at those sites. So as of today's build, which hopefully will be ready by the end of tomorrow, I will mask both of those sites in the Delta branch.
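To illustrate why masking one site was not enough while masking both was, here is a minimal sketch that treats masking as dropping mutations at the masked positions from a sample's mutation set. This is not how UShER implements masking; `mask_sites` and the toy samples are hypothetical, and the toy mutation sets only reuse mutations named in this thread rather than the real lineage definitions.

```python
import re

MUTATION_RE = re.compile(r"([ACGT])(\d+)([ACGT])")


def position(mutation):
    """Return the 1-based genome position of a mutation such as 'A28254C'."""
    match = MUTATION_RE.fullmatch(mutation)
    if match is None:
        raise ValueError(f"unrecognised mutation: {mutation!r}")
    return int(match.group(2))


def mask_sites(mutations, masked_positions):
    """Drop every mutation that falls on a masked position."""
    return {m for m in mutations if position(m) not in masked_positions}


# Toy samples (assumptions, not real sequence data).
toy_ay96_like = {"C10977T", "A28254C", "G28461A"}
toy_ay46_like = {"C10977T"}

masked_one = mask_sites(toy_ay96_like, {28254})
masked_both = mask_sites(toy_ay96_like, {28254, 28461})

print(masked_one == toy_ay46_like)   # still differs at 28461
print(masked_both == toy_ay46_like)  # indistinguishable once both are masked
```

In the toy case the two samples only become indistinguishable once both sites are masked, which mirrors the behaviour described above.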
What's our conclusion now? Remove AY.96 entirely?
Yep, with the new masking, AY.96 is gone from the 2022-02-17 tree (GISAID+public tree is on the main site; public-only tree will be updated in a few hours). The AY.96 designated sequences now have this breakdown of UShER lineage assignment:
AY.96 has been withdrawn in v1.2.129.
When building the Nextclade reference tree, containing among others 2 randomly chosen sequences from the designation list for AY.96, I noticed that the two sequences didn't cluster together.
One sequence seems to sit inside AY.46.
Could it be that AY.96 is either not monophyletic or that it actually belongs inside AY.46? Or do we think the AY.46-defining mutation C10977T (ORF1ab:A3571V) appeared homoplasically within AY.96?
How would one investigate? Look at where all the AY.96 designation sequences get placed by Usher (and maybe also Nextclade). In addition, it'd be handy to look at the mutations that differentiate AY.96 from base-Delta-21J. Something is up there, otherwise why would pangoLEARN misclassify so many as AY.46(.6)?
Here's Usher with all sequences classified as AY.96 by pangoLEARN: https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/singleSubtreeAuspice_genome_902b_937d50.json?c=pango_lineage_usher&label=nuc%20mutations:A28254C
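Once placements are available, the check suggested above can be summarised as a per-country tally of assigned lineages. The following is a minimal sketch that assumes the placements have been exported as a mapping from strain name to lineage, and that the country is the first path component of the strain name (after an optional `hCoV-19/` prefix). The strain names and helper names are hypothetical.

```python
from collections import Counter, defaultdict


def country_of(strain):
    """Extract the country from a strain name like 'Botswana/x/2021'."""
    parts = strain.split("/")
    if parts[0].lower() == "hcov-19" and len(parts) > 1:
        return parts[1]
    return parts[0]


def tally_by_country(assignments):
    """Map each country to a count of the lineages assigned to its strains."""
    tally = defaultdict(Counter)
    for strain, lineage in assignments.items():
        tally[country_of(strain)][lineage] += 1
    return {country: dict(counts) for country, counts in tally.items()}


# Toy placements, invented for illustration.
toy_assignments = {
    "Botswana/sample1/2021": "AY.96",
    "Botswana/sample2/2021": "AY.96",
    "England/sample3/2021": "AY.46.6",
    "hCoV-19/USA/sample4/2021": "AY.46.6",
}

summary = tally_by_country(toy_assignments)
for country, counts in sorted(summary.items()):
    print(country, counts)
```

A split where one country is all AY.96 and the rest fall in another lineage would point to mixed-up designations rather than a single monophyletic lineage.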