Make nextclade tree #28

kimandrews · 2024-04-30T21:33:07Z

Description of proposed changes

This workflow creates a tree that can be used as part of a Nextclade dataset to assign genotypes to measles samples based on criteria outlined by the WHO

The tree created by this workflow includes N450 sequences for 28 reference strains defined by the WHO, along with other representative strains for each genotype that are obtained from the ingest output, using genotype assignments reported on NCBI. Genotypes are then assigned to each node with augur clades, using clade-defining mutations in defaults/clades.tsv.

Details regarding the workflow are described in a README.md

Related issue(s)

Checklist

Checks pass

Creates a nextclade directory and copies relevant files from phylogenetic directory to nextclade directory. Subsequent commits will change these for the Nextclade Dataset tree workflow.

trvrb · 2024-05-01T23:31:23Z

This is wonderful @kimandrews! Cleanly constructed in terms of the workflow and the result looks great in terms of consistency of MeV Genotypes and amount of data used. I think this will work well as a Nextclade reference tree. I just made a few changes to the Auspice config for the Nextclade tree in the above commit.

joverlee521

Thanks for working through this Kim! The overall workflow looks good to me, I only left some minor comments.

A couple high level thoughts (not blocking this PR):

Will you be updating the workflow to create the full Nextclade dataset in the future? E.g. mpox's nextclade workflow packages the whole dataset.
Since there are a lot of shared files between phylogenetic and nextclade workflows, we could consider using a shared directory to hold these files, so we don't need to duplicate the files across two directories.

nextclade/rules/prepare_sequences.smk

nextclade/rules/annotate_phylogeny.smk

genehack

Extreme nitpick...

genehack · 2024-05-01T23:49:00Z

nextclade/rules/export.smk

+"""
+
+rule export:
+    """Exporting data files for for auspice"""


Suggested change

"""Exporting data files for for auspice"""

"""Exporting data files for auspice"""

Fixed in df7c7f0

trvrb · 2024-05-01T23:54:37Z

Will you be updating the workflow to create the full Nextclade dataset in the future? E.g. mpox's nextclade workflow packages the whole dataset.

Packaging up into a Nextclade dataset will be super useful. Would be nice to have this on the clades.nextstrain.org list.

Since there are a lot of shared files between phylogenetic and nextclade workflows, we could consider using a shared directory to hold these files, so we don't need to duplicate the files across two directories.

I personally don't mind the code duplication in this particular case. Easier for me to think about these as separate things. But it's not a strong preference.

jameshadfield

Nice work @kim, only a few minor requests from me. I didn't run this locally, but tested out the tree that it produces by using the use-nextclade-dataset branch which can also be loaded into the nextclade web UI and it's looking great.

nextclade/defaults/config.yaml

nextclade/defaults/colors.tsv

nextclade/rules/prepare_sequences.smk

nextclade/rules/annotate_phylogeny.smk

* Add rule to align sequences to N450 reference using Nextclade, following edbefd5 * Filter to produce a sample set appropriate for a Nextclade Dataset tree

Remove construct_phylogeny.smk wildcards that were needed for the Phylogenetic workflow but are not needed for the Nextclade Dataset tree workflow.

* Use `augur ancestral` option `--root-sequence` to add mutations from the reference sequence to the root of the tree, which is required for Nextclade Dataset trees, as described here: https://github.com/nextstrain/nextclade_data/blob/577696e4ee0f0dff925d3911d27351a7395fab3d/docs/dataset-creation-guide.md * Use `augur clades` to assign genotypes to nodes * Remove annotate_phylogeny.smk wildcards that were needed for the Phylogenetic workflow but are not needed for the Nextclade Dataset tree workflow.

* Include `augur clades` genotype assignments in exported tree and in coloring options * Remove export.smk wildcards that were needed for the Phylogenetic workflow but are not needed for the Nextclade Dataset tree workflow.

This makes a few various updates to the Auspice config for the Nextclade tree including: 1. Making "clade_membership" default coloring. For this tree, we really care about MeV Genotype more than other colorings. 2. Call this "MeV Genotype (Nextstrain)". I think important to specify MeV Genotype to distinguish from the more common SNP-level genotype. 3. Clarify with "MeV Genotype (GenBank metadata)". The previous "(NCBI)" seemed like this could be a MeV Genotype call from an NCBI tool. 4. Order as Region then Country to follow ncov 5. Include clade_membership as filter option

joverlee521

Latest changes look good to me 👍

trvrb · 2024-05-09T20:17:21Z

Looks like this may be ready to merge? @jameshadfield: Could you confirm?

kimandrews · 2024-05-09T21:34:43Z

Looks like this may be ready to merge? @jameshadfield: Could you confirm?

I talked with @jameshadfield yesterday and we are good to merge.

Copy files from phylogenetic to nextclade

6b9a688

Creates a nextclade directory and copies relevant files from phylogenetic directory to nextclade directory. Subsequent commits will change these for the Nextclade Dataset tree workflow.

kimandrews requested a review from a team April 30, 2024 21:57

joverlee521 self-requested a review May 1, 2024 21:01

trvrb approved these changes May 1, 2024

View reviewed changes

joverlee521 approved these changes May 1, 2024

View reviewed changes

nextclade/rules/prepare_sequences.smk Outdated Show resolved Hide resolved

nextclade/rules/prepare_sequences.smk Outdated Show resolved Hide resolved

nextclade/rules/prepare_sequences.smk Show resolved Hide resolved

nextclade/rules/annotate_phylogeny.smk Show resolved Hide resolved

genehack reviewed May 1, 2024

View reviewed changes

jameshadfield requested changes May 2, 2024

View reviewed changes

nextclade/defaults/config.yaml Show resolved Hide resolved

nextclade/defaults/colors.tsv Show resolved Hide resolved

nextclade/rules/prepare_sequences.smk Show resolved Hide resolved

nextclade/rules/annotate_phylogeny.smk Show resolved Hide resolved

kimandrews mentioned this pull request May 3, 2024

Package the Nextclade dataset #31

Open

kimandrews and others added 6 commits May 6, 2024 17:05

Prepare sequences for Nextclade Dataset tree

449aec0

* Add rule to align sequences to N450 reference using Nextclade, following edbefd5 * Filter to produce a sample set appropriate for a Nextclade Dataset tree

Construct phylogeny for Nextclade Dataset tree

5e76b31

Remove construct_phylogeny.smk wildcards that were needed for the Phylogenetic workflow but are not needed for the Nextclade Dataset tree workflow.

Export auspice json for Nextclade Dataset tree

df7c7f0

* Include `augur clades` genotype assignments in exported tree and in coloring options * Remove export.smk wildcards that were needed for the Phylogenetic workflow but are not needed for the Nextclade Dataset tree workflow.

Add README.md

ba4c9be

kimandrews force-pushed the make-nextclade-tree branch from cabf98d to 07bf737 Compare May 7, 2024 21:30

joverlee521 approved these changes May 8, 2024

View reviewed changes

Update Changelog

1fd8aec

kimandrews merged commit 21d5a9c into main May 9, 2024
32 checks passed

kimandrews deleted the make-nextclade-tree branch May 9, 2024 21:35

kimandrews mentioned this pull request Jun 10, 2024

Fixup: Add date annotations for rare genotypes #38

Open

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make nextclade tree #28

Make nextclade tree #28

kimandrews commented Apr 30, 2024

trvrb commented May 1, 2024 •

edited

Loading

joverlee521 left a comment

genehack left a comment

genehack May 1, 2024

kimandrews May 7, 2024

trvrb commented May 1, 2024 •

edited

Loading

jameshadfield left a comment

joverlee521 left a comment

trvrb commented May 9, 2024

kimandrews commented May 9, 2024

	"""Exporting data files for for auspice"""
	"""Exporting data files for auspice"""

Make nextclade tree #28

Make nextclade tree #28

Conversation

kimandrews commented Apr 30, 2024

Description of proposed changes

Related issue(s)

Checklist

trvrb commented May 1, 2024 • edited Loading

joverlee521 left a comment

Choose a reason for hiding this comment

genehack left a comment

Choose a reason for hiding this comment

genehack May 1, 2024

Choose a reason for hiding this comment

kimandrews May 7, 2024

Choose a reason for hiding this comment

trvrb commented May 1, 2024 • edited Loading

jameshadfield left a comment

Choose a reason for hiding this comment

joverlee521 left a comment

Choose a reason for hiding this comment

trvrb commented May 9, 2024

kimandrews commented May 9, 2024

trvrb commented May 1, 2024 •

edited

Loading

trvrb commented May 1, 2024 •

edited

Loading