-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make nextclade tree #28
Conversation
Creates a nextclade directory and copies relevant files from phylogenetic directory to nextclade directory. Subsequent commits will change these for the Nextclade Dataset tree workflow.
This is wonderful @kimandrews! Cleanly constructed in terms of the workflow and the result looks great in terms of consistency of MeV Genotypes and amount of data used. I think this will work well as a Nextclade reference tree. I just made a few changes to the Auspice config for the Nextclade tree in the above commit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working through this Kim! The overall workflow looks good to me, I only left some minor comments.
A couple high level thoughts (not blocking this PR):
- Will you be updating the workflow to create the full Nextclade dataset in the future? E.g. mpox's nextclade workflow packages the whole dataset.
- Since there are a lot of shared files between phylogenetic and nextclade workflows, we could consider using a shared directory to hold these files, so we don't need to duplicate the files across two directories.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extreme nitpick...
nextclade/rules/export.smk
Outdated
""" | ||
|
||
rule export: | ||
"""Exporting data files for for auspice""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"""Exporting data files for for auspice""" | |
"""Exporting data files for auspice""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in df7c7f0
Packaging up into a Nextclade dataset will be super useful. Would be nice to have this on the clades.nextstrain.org list.
I personally don't mind the code duplication in this particular case. Easier for me to think about these as separate things. But it's not a strong preference. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work @kim, only a few minor requests from me. I didn't run this locally, but tested out the tree that it produces by using the use-nextclade-dataset branch which can also be loaded into the nextclade web UI and it's looking great.
* Add rule to align sequences to N450 reference using Nextclade, following edbefd5 * Filter to produce a sample set appropriate for a Nextclade Dataset tree
Remove construct_phylogeny.smk wildcards that were needed for the Phylogenetic workflow but are not needed for the Nextclade Dataset tree workflow.
* Use `augur ancestral` option `--root-sequence` to add mutations from the reference sequence to the root of the tree, which is required for Nextclade Dataset trees, as described here: https://github.com/nextstrain/nextclade_data/blob/577696e4ee0f0dff925d3911d27351a7395fab3d/docs/dataset-creation-guide.md * Use `augur clades` to assign genotypes to nodes * Remove annotate_phylogeny.smk wildcards that were needed for the Phylogenetic workflow but are not needed for the Nextclade Dataset tree workflow.
* Include `augur clades` genotype assignments in exported tree and in coloring options * Remove export.smk wildcards that were needed for the Phylogenetic workflow but are not needed for the Nextclade Dataset tree workflow.
This makes a few various updates to the Auspice config for the Nextclade tree including: 1. Making "clade_membership" default coloring. For this tree, we really care about MeV Genotype more than other colorings. 2. Call this "MeV Genotype (Nextstrain)". I think important to specify MeV Genotype to distinguish from the more common SNP-level genotype. 3. Clarify with "MeV Genotype (GenBank metadata)". The previous "(NCBI)" seemed like this could be a MeV Genotype call from an NCBI tool. 4. Order as Region then Country to follow ncov 5. Include clade_membership as filter option
cabf98d
to
07bf737
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Latest changes look good to me 👍
Looks like this may be ready to merge? @jameshadfield: Could you confirm? |
I talked with @jameshadfield yesterday and we are good to merge. |
Description of proposed changes
This workflow creates a tree that can be used as part of a Nextclade dataset to assign genotypes to measles samples based on criteria outlined by the WHO
The tree created by this workflow includes N450 sequences for 28 reference strains defined by the WHO, along with other representative strains for each genotype that are obtained from the ingest output, using genotype assignments reported on NCBI. Genotypes are then assigned to each node with
augur clades
, using clade-defining mutations indefaults/clades.tsv
.Details regarding the workflow are described in a README.md
Related issue(s)
Checklist