-
Notifications
You must be signed in to change notification settings - Fork 162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bugfix when tree zooms back too far #1552
Conversation
For trees viewed with the divergence metric and with mutations defined, it is sensible (and easy) for the zoom back button ("-" button) to walk up the tree until a branch with mutations is found. This will behave nicely with the common scenario of polytomies encoded by a bifurcating tree (which often happens during coalescent reconstruction in augur refine). The original heuristic remains for those trees where mutations are not defined on the branches. There are cleverer ways to infer the correct threshold here if we want to, but I don't think it's worth it at the moment. Closes #1529
afd2db7
to
2b3086b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good to me, thanks for fixing it up!
I just left a question for my own understanding 🙏
while (isWithinBranchTolerance(node, potentialNode, distanceMeasure)) { | ||
if (potentialNode === potentialNode.parent) break; // root node of tree | ||
potentialNode = potentialNode.parent; | ||
} | ||
return potentialNode; | ||
}; | ||
|
||
function areNucleotideMutationsPresent(observedMutations) { | ||
const mutList = Object.keys(observedMutations); | ||
for (let idx=mutList.length-1; idx>=0; idx--) { // start at end, as nucs come last in the key-insertion order |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you point to where the key-insertion order can be found? (I'm still trying to piece everything together 😅 )
From what I can find, it looks like the order is dependent on keys in n.branch_attrs.mutations
auspice/src/util/treeJsonProcessing.js
Lines 145 to 156 in 07b66e8
const collectObservedMutations = (nodesArray) => { | |
const mutations = {}; | |
nodesArray.forEach((n) => { | |
if (!n.branch_attrs || !n.branch_attrs.mutations) return; | |
Object.entries(n.branch_attrs.mutations).forEach(([gene, muts]) => { | |
muts.forEach((mut) => { | |
mutations[`${gene}:${mut}`] ? mutations[`${gene}:${mut}`]++ : (mutations[`${gene}:${mut}`] = 1); | |
}); | |
}); | |
}); | |
return mutations; | |
}; |
Which I think is based on the order of insertion within augur export
?
(boo, cross-repo code snippets don't work)
def _transfer_mutations(node, raw_data):
if "aa_muts" in raw_data or "muts" in raw_data:
node["branch_attrs"]["mutations"] = {}
if "muts" in raw_data and len(raw_data["muts"]):
node["branch_attrs"]["mutations"]["nuc"] = raw_data["muts"]
if "aa_muts" in raw_data:
aa = {gene:data for gene, data in raw_data["aa_muts"].items() if len(data)}
node["branch_attrs"]["mutations"].update(aa)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't actually know - but I would presume you're right here. This means it's related to nextstrain/augur#1052, which if we move to OrderedDict will make "nuc" come first for GenBank files (created by load_features
, since "source" comes before CDSs in genbank files) and last for VCF files (as the translate code adds them explicitly). (That's from reading the code, I haven't double checked everything.)
In testing for this PR the "nuc" block consistently came last. In terms of code performance it's not overly important, but it seemed sensible to iterate this way. I'll make a note on the augur issue to revisit this auspice code in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A further note: it would be more efficient to calculate and store a "areNucleotideMutationsPresent" flag in a redux store rather than re-calculating it each time one presses the "goBack" button. I decided that the slight performance hit (not noticeable in my testing) was preferable to the overhead of creating this state & keeping it in sync as trees change etc.
How can I test this on a dataset where I've encountered this bug? None of them are included as test datasets. Can I drag and drop? Would be nice if test builds were like auspice.us :) Or is it easy to push so that any test dataset appears there? What's the bucket? |
This prompted #1558 - apologies that testing wasn't easier here. I'm going to merge & release now as I've just re-tested things. |
Current behavior (from #1529 (comment)):
We want to zoom back to the parent branch, unless that branch is part of some polytomy-like structure, in which case we want to keep walking up the tree until we're beyond the polytomy. Determining whether a branch length is so small it's in a polytomy is the root of the problem here. We use a heuristic which takes the length of the branch from the root node to the rightmost node in the current view (in the example here when zoomed into B.1.1 this is 563, but the actual numerical value can differ by many orders of magnitude depending on the dataset and what zoom you're using); if a branch is less than 1/50th of this then we keep walking up the tree. In the example here, the branch for B.1.1 has length 1 and 1<563/50 so we try the next branch, B.1, which is 12 and 12>563/50 so we stop there.
New behavior:
For trees viewed with the divergence metric and with mutations defined, it is sensible (and easy) for the zoom back button ("-" button) to walk up the tree until a branch with mutations is found. This will behave nicely with the common scenario of polytomies encoded by a bifurcating tree (which often happens during coalescent reconstruction in augur refine).
The original heuristic remains for those trees where mutations are not defined on the branches. There are cleverer ways to infer the correct threshold here if we want to, but I don't think it's worth it at the moment.
Closes #1529