Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix when tree zooms back too far #1552

Merged
merged 1 commit into from
Oct 5, 2022
Merged

Bugfix when tree zooms back too far #1552

merged 1 commit into from
Oct 5, 2022

Conversation

jameshadfield
Copy link
Member

Current behavior (from #1529 (comment)):

We want to zoom back to the parent branch, unless that branch is part of some polytomy-like structure, in which case we want to keep walking up the tree until we're beyond the polytomy. Determining whether a branch length is so small it's in a polytomy is the root of the problem here. We use a heuristic which takes the length of the branch from the root node to the rightmost node in the current view (in the example here when zoomed into B.1.1 this is 563, but the actual numerical value can differ by many orders of magnitude depending on the dataset and what zoom you're using); if a branch is less than 1/50th of this then we keep walking up the tree. In the example here, the branch for B.1.1 has length 1 and 1<563/50 so we try the next branch, B.1, which is 12 and 12>563/50 so we stop there.

New behavior:

For trees viewed with the divergence metric and with mutations defined, it is sensible (and easy) for the zoom back button ("-" button) to walk up the tree until a branch with mutations is found. This will behave nicely with the common scenario of polytomies encoded by a bifurcating tree (which often happens during coalescent reconstruction in augur refine).

The original heuristic remains for those trees where mutations are not defined on the branches. There are cleverer ways to infer the correct threshold here if we want to, but I don't think it's worth it at the moment.

Closes #1529

@nextstrain-bot nextstrain-bot temporarily deployed to auspice-bug-zoom-out-u9mvzhd2j September 21, 2022 02:27 Inactive
For trees viewed with the divergence metric and with mutations defined,
it is sensible (and easy) for the zoom back button ("-" button) to
walk up the tree until a branch with mutations is found. This will
behave nicely with the common scenario of polytomies encoded by
a bifurcating tree (which often happens during coalescent reconstruction
in augur refine).

The original heuristic remains for those trees where mutations are not
defined on the branches. There are cleverer ways to infer the correct
threshold here if we want to, but I don't think it's worth it at the
moment.

Closes #1529
@nextstrain-bot nextstrain-bot temporarily deployed to auspice-bug-zoom-out-u9mvzhd2j September 22, 2022 02:35 Inactive
Copy link
Contributor

@joverlee521 joverlee521 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good to me, thanks for fixing it up!
I just left a question for my own understanding 🙏

while (isWithinBranchTolerance(node, potentialNode, distanceMeasure)) {
if (potentialNode === potentialNode.parent) break; // root node of tree
potentialNode = potentialNode.parent;
}
return potentialNode;
};

function areNucleotideMutationsPresent(observedMutations) {
const mutList = Object.keys(observedMutations);
for (let idx=mutList.length-1; idx>=0; idx--) { // start at end, as nucs come last in the key-insertion order
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you point to where the key-insertion order can be found? (I'm still trying to piece everything together 😅 )

From what I can find, it looks like the order is dependent on keys in n.branch_attrs.mutations

const collectObservedMutations = (nodesArray) => {
const mutations = {};
nodesArray.forEach((n) => {
if (!n.branch_attrs || !n.branch_attrs.mutations) return;
Object.entries(n.branch_attrs.mutations).forEach(([gene, muts]) => {
muts.forEach((mut) => {
mutations[`${gene}:${mut}`] ? mutations[`${gene}:${mut}`]++ : (mutations[`${gene}:${mut}`] = 1);
});
});
});
return mutations;
};

Which I think is based on the order of insertion within augur export?
(boo, cross-repo code snippets don't work)

    def _transfer_mutations(node, raw_data):
        if "aa_muts" in raw_data or "muts" in raw_data:
            node["branch_attrs"]["mutations"] = {}
            if "muts" in raw_data and len(raw_data["muts"]):
                node["branch_attrs"]["mutations"]["nuc"] = raw_data["muts"]
            if "aa_muts" in raw_data:
                aa = {gene:data for gene, data in raw_data["aa_muts"].items() if len(data)}
                node["branch_attrs"]["mutations"].update(aa)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't actually know - but I would presume you're right here. This means it's related to nextstrain/augur#1052, which if we move to OrderedDict will make "nuc" come first for GenBank files (created by load_features, since "source" comes before CDSs in genbank files) and last for VCF files (as the translate code adds them explicitly). (That's from reading the code, I haven't double checked everything.)

In testing for this PR the "nuc" block consistently came last. In terms of code performance it's not overly important, but it seemed sensible to iterate this way. I'll make a note on the augur issue to revisit this auspice code in the future.

Copy link
Member Author

@jameshadfield jameshadfield Sep 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A further note: it would be more efficient to calculate and store a "areNucleotideMutationsPresent" flag in a redux store rather than re-calculating it each time one presses the "goBack" button. I decided that the slight performance hit (not noticeable in my testing) was preferable to the overhead of creating this state & keeping it in sync as trees change etc.

@corneliusroemer
Copy link
Member

How can I test this on a dataset where I've encountered this bug? None of them are included as test datasets. Can I drag and drop? Would be nice if test builds were like auspice.us :)

Or is it easy to push so that any test dataset appears there? What's the bucket?

@jameshadfield
Copy link
Member Author

How can I test this on a dataset where I've encountered this bug?

This prompted #1558 - apologies that testing wasn't easier here. I'm going to merge & release now as I've just re-tested things.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

BUG: - lens sometimes zooms out 2 steps rather than 1
4 participants