[refine] prune outgroup #1751

Merged 5 commits on Feb 13, 2025

jameshadfield (Member) commented Feb 11, 2025

Refactors augur refine to root trees before TreeTime is called, where possible. This allows us to easily add a --remove-outgroup flag which works only when the tree is rooted on a single strain, and the pruning of the root is performed before any temporal inference (if applicable). This approach is the one recommended by @rneher, @emmahodcroft and @huddlej in #340.
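
As a rough illustration of why pruning is only well defined when the tree is rooted on a single strain, here is a minimal standard-library sketch. It is not the code in augur/refine.py: the function name `root_and_prune`, the edge-list input and the branch-length-free Newick output are all assumptions made for the example.

```python
from collections import defaultdict


def root_and_prune(edges, outgroup):
    """Root an unrooted tree on a single outgroup tip, then drop that tip.

    `edges` is a list of (node_a, node_b) pairs describing an unrooted tree.
    Returns the remaining ingroup as a Newick string without branch lengths.
    (Hypothetical helper for illustration only.)
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    if outgroup not in neighbours:
        raise ValueError(f"outgroup {outgroup!r} is not in the tree")
    if len(neighbours[outgroup]) != 1:
        raise ValueError("outgroup must be a single tip")

    # Rooting on a tip places the root on the branch leading to that tip,
    # so once the tip is pruned the ingroup root is its only neighbour.
    (ingroup_root,) = neighbours[outgroup]

    def newick(node, parent):
        # Walk away from the parent, skipping the pruned outgroup tip.
        children = sorted(
            n for n in neighbours[node] if n != parent and n != outgroup
        )
        if not children:
            return node
        return "(" + ",".join(newick(child, node) for child in children) + ")"

    return newick(ingroup_root, None) + ";"


edges = [("n1", "A"), ("n1", "B"), ("n1", "n2"), ("n2", "C"), ("n2", "OUT")]
print(root_and_prune(edges, "OUT"))  # (C,(A,B));
```

Because the outgroup is removed before anything else runs, any later temporal inference would only ever see the ingroup.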

@joverlee521 I explored patching seasonal-flu to incorporate this; however, the structure there is a little complicated. Specifically, if the prune_reference rule is dropped in favour of --remove-outgroup in the refine step, then the reference sequences may/will be removed by the sanitize_trees step (which runs before refine), as the references (often? always?) differ across segments.

Closes #340

codecov bot commented Feb 11, 2025

Codecov Report

Attention: Patch coverage is 72.09302% with 12 lines in your changes missing coverage. Please review.

Project coverage is 73.25%. Comparing base (e6df775) to head (3d2ef06).
Report is 11 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| augur/refine.py | 72.09% | 8 Missing and 4 partials ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1751      +/-   ##
==========================================
+ Coverage   73.11%   73.25%   +0.14%     
==========================================
  Files          79       79              
  Lines        8350     8376      +26     
  Branches     1704     1706       +2     
==========================================
+ Hits         6105     6136      +31     
+ Misses       1958     1954       -4     
+ Partials      287      286       -1     


jameshadfield force-pushed the james/refine-prune-outgroup branch from 701d6ee to a914d3f on February 11, 2025 20:44

huddlej (Contributor) commented Feb 11, 2025

Thank you for making the ideas in #340 a reality here, @jameshadfield! I see what you mean about the complexity in seasonal-flu. You're right that the reference ids differ between segments right now, because we use sequence accession as the id.

Even if we renamed the reference ids to the strain name (which will be the same for all segments within a lineage), a better strategy might be to force sanitize_trees.py to keep the reference node through a new command line argument much like we prevent the earlier outlier flagging step from removing specific strains. The sanitize step will then produce outputs with the same strains across segments except for the reference strains which will get removed after rooting by refine. I would expect the tree.nwk after refine in this new workflow to be the same for downstream steps as the tree produced by the current workflow.

The other potential issue is that the flag_outliers script assumes that the input tree has already been rooted, but we wouldn't be rooting the tree until late at refine. This script has a --reroot flag, though, which we could add to the existing rule without reorganizing any other rules. Would it be helpful for me to mock up what the workflow would look like based on what I described above?
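
To make the "force the sanitize step to keep the reference" idea concrete, here is a minimal sketch. It assumes the sanitize step keeps only strains present in every segment; the function name `sanitize_strains` and its `keep` parameter are hypothetical and are not existing arguments of sanitize_trees.py.

```python
def sanitize_strains(strains_by_segment, keep=frozenset()):
    """Keep strains shared by all segments, plus any force-kept strains.

    `strains_by_segment` maps a segment name to its set of strain ids.
    `keep` lists ids (e.g. per-segment references) that must survive even
    though they are absent from the other segments. (Hypothetical sketch.)
    """
    shared = set.intersection(*(set(s) for s in strains_by_segment.values()))
    return {
        segment: shared | (set(strains) & set(keep))
        for segment, strains in strains_by_segment.items()
    }


segments = {
    "ha": {"A", "B", "ref_ha"},
    "na": {"A", "B", "C", "ref_na"},
}
kept = sanitize_strains(segments, keep={"ref_ha", "ref_na"})
print(sorted(kept["ha"]), sorted(kept["na"]))
```

Under these assumptions each segment retains its own reference for rooting, and the segments agree on every other strain once refine has pruned the outgroup.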

jameshadfield (Member, Author) commented

I've shifted the seasonal-flu specific conversation to nextstrain/seasonal-flu#212

j23414 (Contributor) commented Feb 12, 2025

Testing with WNV

Awesome work! From a user's perspective, I tested this PR by attempting to root the WNV/lineage-1A tree using a lineage-1B outgroup and pruning it.

I included a lineage-1B outgroup strain (KX394399), which was later pruned out using the new --remove-outgroup refine flag. The result: rooting for the WNV/lineage-1A tree looks significantly improved! And it only required minimal workflow changes.

Thank you! No blockers from me from a user perspective.
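
For readers who want to try something similar, the workflow change described above might look roughly like the following. The file paths and the helper name `build_refine_command` are assumptions for illustration; only the strain KX394399 and the `--root` / `--remove-outgroup` flags come from this thread, and a real workflow would likely pass further refine arguments.

```python
import shlex


def build_refine_command(tree, output_tree, outgroup):
    """Assemble an `augur refine` call that roots on `outgroup` and prunes it.

    Paths are placeholders; other refine options are omitted for brevity.
    """
    return [
        "augur", "refine",
        "--tree", tree,
        "--output-tree", output_tree,
        "--root", outgroup,
        "--remove-outgroup",
    ]


cmd = build_refine_command("results/tree_raw.nwk", "results/tree.nwk", "KX394399")
print(shlex.join(cmd))
```

The list form can be handed to `subprocess.run(cmd, check=True)` if augur is installed; the sketch only prints the command.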

joverlee521 (Contributor) left a comment

Mainly reviewed for my own learning since I've been looking at the refine code recently with @kimandrews. Only left non-blocking comments.

We now root the tree outside of treetime (if possible) before
instantiating a `TreeTime` / `TreeAnc` object which reduces code
duplication and simplifies the logic.

There should be no behavioural changes with this commit except for
improved error messages.
jameshadfield added a commit that referenced this pull request Feb 12, 2025
InvalidUseOfRemoveOutgroup error handling suggested by @joverlee521 in
<#1751 (comment)>
jameshadfield force-pushed the james/refine-prune-outgroup branch from a914d3f to 4c12c96 on February 12, 2025 19:56
AugurError doesn't result in a traceback, which is what we want here.
Also took the chance to improve the text slightly
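
The pattern these commit messages describe can be sketched as follows. This is an assumption-laden illustration rather than the merged code: `AugurError` here is a local stand-in class, the list of rooting-method keywords is assumed, and `check_remove_outgroup` and `run` are hypothetical names.

```python
import sys


class AugurError(Exception):
    """Local stand-in for augur's user-facing error type (not imported)."""


# Keywords assumed to select a rooting method rather than name a strain.
ROOTING_METHODS = {"best", "least-squares", "min_dev", "oldest", "mid_point"}


def check_remove_outgroup(root, remove_outgroup):
    """Raise AugurError unless the tree is rooted on exactly one strain."""
    if not remove_outgroup:
        return
    if len(root) != 1 or root[0] in ROOTING_METHODS:
        raise AugurError(
            "--remove-outgroup requires --root to name exactly one strain."
        )


def run(root, remove_outgroup):
    """Return an exit code, printing a one-line message instead of a traceback."""
    try:
        check_remove_outgroup(root, remove_outgroup)
    except AugurError as err:
        print(f"ERROR: {err}", file=sys.stderr)
        return 2
    return 0
```

Catching the error at the top level is what keeps the user from seeing a traceback for what is really a usage mistake.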
jameshadfield force-pushed the james/refine-prune-outgroup branch from 4c12c96 to 826859d on February 12, 2025 20:02
jameshadfield marked this pull request as ready for review on February 12, 2025 20:04
Successfully merging this pull request may close these issues.

Include option to prune outgroup in augur tree or augur refine