Revisions part 2 #4

Merged
merged 41 commits into from
Feb 8, 2023

huddlej
Collaborator

@huddlej huddlej commented Feb 1, 2023

No description provided.

Replaces manually assigned clades with automated clade assignment based
on mutations at sites previously associated with antigenic drift (Koel
et al. 2013). This approach mimics how clades are assigned in practice,
assigning new clade names when at least one epitope mutation occurs in a
clade that has circulated at some minimum global frequency. Clade
assignments almost always rely exclusively on genetic data, since the
corresponding antigenic data require much more time to produce.

This automated approach allows clades to be systematically reproduced
and also tuned by adjusting the list of epitope sites and clade minimum
frequency in the workflow configuration file.
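
A minimal sketch of the rule described above, not the workflow's actual script. It assumes a toy tree where each node records the amino acid mutations on its branch and the highest global frequency its clade reached. The `Node` class, `assign_clades` function, site numbers, frequency threshold, and the "ancestral" label are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node; not an Augur or Bio.Phylo type."""
    name: str
    mutations: list          # amino acid mutations on the branch, e.g. ["K189N"]
    max_frequency: float     # highest global frequency reached by this clade
    children: list = field(default_factory=list)


def site_of(mutation):
    """Return the position from a mutation string such as 'K189N'."""
    return int(mutation[1:-1])


def assign_clades(node, epitope_sites, min_frequency,
                  parent_clade="ancestral", labels=None):
    """Label every node, starting a new clade only when a branch carries at
    least one epitope mutation and the clade reached the minimum frequency."""
    labels = {} if labels is None else labels
    hits = sorted(
        (m for m in node.mutations if site_of(m) in epitope_sites),
        key=site_of,
    )
    if hits and node.max_frequency >= min_frequency:
        # Name the clade by its derived epitope alleles (assumed naming scheme).
        clade = "/".join(f"{site_of(m)}{m[-1]}" for m in hits)
    else:
        clade = parent_clade
    labels[node.name] = clade
    for child in node.children:
        assign_clades(child, epitope_sites, min_frequency, clade, labels)
    return labels


tree = Node("root", [], 1.0, [
    Node("A", ["K189N"], 0.30, [Node("A1", ["T10A"], 0.30)]),
    Node("B", ["K158N"], 0.01),
])
labels = assign_clades(tree, epitope_sites={158, 189}, min_frequency=0.10)
```

In this toy tree, node B carries an epitope mutation but never reaches the minimum frequency, so it keeps its parent's label. Changing `epitope_sites` or `min_frequency` changes the assignments, which is the kind of tuning the configuration file is meant to expose.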

We don't filter sequences by date, so remove the date filters from the
configuration file.

Corrects the date fields for two early samples that were missing year
information even though we know which years those samples were
collected. Updates the data curation guide to reflect how to automate
these corrections.

Append an incremental integer to recurrent clade names, so we can
distinguish them in Auspice.
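
One way to sketch this numbering, assuming clade names arrive as a list with one entry per independent origin in the tree. The function name and the `-N` suffix format are hypothetical, as is the choice to number the first occurrence too.

```python
from collections import Counter


def number_recurrent_clades(clade_names):
    """Append an incrementing integer to any clade name that occurs more
    than once, leaving unique names unchanged."""
    totals = Counter(clade_names)
    seen = Counter()
    numbered = []
    for name in clade_names:
        if totals[name] > 1:
            seen[name] += 1
            numbered.append(f"{name}-{seen[name]}")
        else:
            numbered.append(name)
    return numbered


numbered = number_recurrent_clades(["189N", "158N", "189N"])
```

Each recurrent name then maps to a distinct label, so the separate origins can receive separate colors and legend entries.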
Adds logic to prevent tips of the tree from being assigned their own
clade label.
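
A guard like the following could express that check. This is a sketch under assumed data structures: nodes are plain dicts, and the `children`, `mutations`, and `frequency` keys are hypothetical names.

```python
def can_start_clade(node, epitope_sites, min_frequency):
    """Return True only for internal nodes that carry an epitope mutation
    and meet the minimum frequency; tips never start a clade."""
    is_tip = not node.get("children")
    has_epitope_mutation = any(
        int(mutation[1:-1]) in epitope_sites
        for mutation in node.get("mutations", [])
    )
    return (
        not is_tip
        and has_epitope_mutation
        and node.get("frequency", 0.0) >= min_frequency
    )


tip = {"mutations": ["K189N"], "frequency": 0.5, "children": []}
internal = {"mutations": ["K189N"], "frequency": 0.5, "children": [tip]}
```
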
Expands the x-axis label for the measurements panel to clarify that the
values displayed in the panel are normalized distances from the
reference strain rather than standard titer values.

Pushes the new tree with corrected rooting after fixing metadata for two
samples. This tree and measurements panels also reflect new clade
assignments where logic was refined to exclude tips in the tree from
being considered their own clades.
Also, updates the x-axis labels for Fig 1C and D to more clearly
describe the data being plotted.

Updates Fig 1C and D's vertical lines to match how Auspice renders these
lines as solid for all thresholds.

Modifies the workflow to use the new `--threshold` interface in `augur
measurements export` to add vertical lines at x=0 and x=2 in the
measurements JSON.
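
For illustration, the fragment below shows roughly where those two values would sit in the exported JSON. It assumes each collection stores them under a `thresholds` key; the collection key is a hypothetical placeholder, and the other fields a real collection needs (groupings, axis label, measurements) are omitted.

```python
import json

# Hypothetical, partial collection: only the fields relevant to thresholds.
collection = {
    "key": "example_collection",   # placeholder name, not from the workflow
    "thresholds": [0.0, 2.0],      # vertical lines at x=0 and x=2
}

measurements_json = json.dumps({"collections": [collection]}, indent=2)
```
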
Describes the specific effect of normalization in the methods. Also,
removes a line about antigenic data in clade assignment that isn't
relevant to the section.
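
As a rough illustration of what a normalized distance could look like, the sketch below assumes the normalization is a log2 fold difference between the reference strain's titer against itself and its titer against a test strain. That formula is an assumption for this example, not something stated in this pull request, and the function and parameter names are hypothetical.

```python
import math


def log2_distance(reference_titer, test_titer):
    """Assumed normalization: log2 fold drop of the test titer relative to
    the reference strain's own titer. 0 means no drop; 2 is a 4-fold drop."""
    return math.log2(reference_titer / test_titer)


same = log2_distance(640, 640)
fourfold = log2_distance(1280, 320)
```

Under this assumption, a test strain with the same titer as the reference sits at x=0 and a four-fold lower titer sits at x=2.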
huddlej and others added 14 commits February 3, 2023 11:17
Updates figures to reflect new clade assignments and new features of the
measurements panel including the multiple vertical lines and the
grouping of points by their coloring value.

Both vertical lines in panels C and D are solid lines now.

Adds a custom script and workflow rule that creates a measurements
configuration JSON with predefined orderings for each grouping column
such that grouping values are sorted by the clade of the reference
strain in descending order of the earliest reference strain. This logic
effectively orders reference strains (and all other grouping labels) by
the order that clades appear in the tree such that the latest clades
appear at the top of the measurements panel. Within each clade, grouping
labels are ordered by the reference strain collection date in descending
order so the latest references in the latest clade appear at the top.
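
The ordering can be sketched as a two-level sort. This is not the custom script itself: the record layout (`strain`, `clade`, `date` keys), the function name, and the example strains are hypothetical.

```python
from datetime import date


def order_references(references):
    """Order reference strains so clades with the latest first reference
    come first, and within each clade the latest references come first."""
    earliest_by_clade = {}
    for ref in references:
        clade = ref["clade"]
        if clade not in earliest_by_clade or ref["date"] < earliest_by_clade[clade]:
            earliest_by_clade[clade] = ref["date"]

    ordered = sorted(
        references,
        # Sort by the clade's earliest reference date, then clade name to keep
        # clades with equal dates together, then the reference's own date.
        key=lambda ref: (earliest_by_clade[ref["clade"]], ref["clade"], ref["date"]),
        reverse=True,
    )
    return [ref["strain"] for ref in ordered]


references = [
    {"strain": "A", "clade": "X", "date": date(2000, 1, 1)},
    {"strain": "B", "clade": "X", "date": date(2003, 1, 1)},
    {"strain": "C", "clade": "Y", "date": date(2005, 1, 1)},
    {"strain": "D", "clade": "Y", "date": date(2010, 1, 1)},
    {"strain": "E", "clade": "X", "date": date(2012, 1, 1)},
]
order = order_references(references)
```

Strain E is the most recent reference overall, but it still sorts below clade Y's references because clade X first appears earlier. The resulting list is the kind of predefined order that would be written into the configuration for each grouping column.
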
Updates the text for case study 2 to reflect our new analysis with
automated clade definitions and a focus on relatively newer data from
clade 158N/189K. Although the details of alleles and measurements have
changed for this case study, the original benefits of inspecting raw
measurements and coloring by viral attributes like genotype remain.

This still wasn't as clear as it could have been, so these changes make
the prior visualizations less ambiguous by specifying them.

Adds a description of how we order the grouping labels in the analysis
for the paper with a configuration file and rewords the description of
ordering in Results and Figure 2 caption to refer to the measurements
export command's configuration file instead of the sidecar JSON output
itself. This change minimizes the reference to "sidecar JSON" which is a
bit of Nextstrain jargon.

Adds lines to Figure 4 between reference strains in the tree and the
measurements panels on the right, to annotate the phylogenetic context
of these strains. Updates the caption for Figure 4 to reflect the new
clades and results represented in the figure.

Updates the results text for case study 1 and the corresponding caption
in Figure 3 to reflect the new clades and measurements for those clades.

@huddlej huddlej marked this pull request as ready for review February 7, 2023 05:36

Updates Augur to reflect latest measurements functionality and removes a
keyword argument to `read_node_data` that no longer exists.

Adds the version of Auspice that reflects all the features we mention in
the paper.

@huddlej huddlej merged commit dc433a2 into main Feb 8, 2023
@huddlej huddlej deleted the revisions-part-2 branch February 8, 2023 17:49