Fixup: Add date annotations for rare genotypes #38

Open
wants to merge 1 commit into
base: main
24 changes: 24 additions & 0 deletions ingest/defaults/annotations.tsv
@@ -34,6 +34,7 @@ AF266286 genotype_ncbi A
# WHO genotype reference strains
# Information from https://www.who.int/publications/i/item/WER8709
# Dates are retrieved from epi-weeks reported within strain names
# Dates are defined as the first day of the epi-week
AF045212 is_reference TRUE
AF045217 is_reference TRUE
AF079555 is_reference TRUE
@@ -146,3 +147,26 @@ U64582 date 1988-XX-XX
X84865 date 1994-XX-XX
X84872 date 1990-XX-XX
X84879 date 1971-XX-XX
#
# Strains with rare genotypes
# Dates are retrieved from epi-weeks reported within strain names on NCBI
# Dates are defined as the first day of the epi-week
Contributor

non-blocking

"first day" is somewhat ambiguous: it could be Sunday or it could be Monday. Better to be explicit.

Suggested change
- # Dates are defined as the first day of the epi-week
+ # Dates are defined as the Monday of the epi-week

Contributor Author

I agree this needs to be more explicit. There are many different definitions of epi-weeks, so the most precise wording for what I did would be "Dates are defined as the first day of the ISO epi-week, which is always a Monday". I can add this to the annotations.tsv file. It may also be worth discussing whether there is a better approach for defining dates from the epi-weeks reported in measles strain names; I started a discussion about this in Slack.
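To make the ambiguity concrete, here is a minimal sketch comparing two epi-week conventions. The helper names `iso_week_start` and `mmwr_week_start` are hypothetical and not part of this repository. The Sunday-start rule is written from my understanding of the US CDC MMWR convention (week 1 is the Sunday-to-Saturday week containing January 4) and should be treated as an assumption, not as something this PR uses.

```python
from datetime import date, timedelta


def iso_week_start(year: int, week: int) -> date:
    """Monday of the given ISO 8601 week (the convention used in this PR)."""
    return date.fromisocalendar(year, week, 1)


def mmwr_week_start(year: int, week: int) -> date:
    """Sunday of the given week under an assumed MMWR-style convention."""
    jan4 = date(year, 1, 4)
    # isoweekday(): Monday=1 ... Sunday=7, so "% 7" gives days since Sunday.
    week1_sunday = jan4 - timedelta(days=jan4.isoweekday() % 7)
    return week1_sunday + timedelta(weeks=week - 1)


for year, week in [(1987, 11), (2000, 24)]:
    print(year, week, iso_week_start(year, week), mmwr_week_start(year, week))
```

The two conventions do not differ by a fixed offset: for week 24 of 2000 the start dates are one day apart, while for week 11 of 1987 they are six days apart, which is one reason to state the convention explicitly in the file.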

# These are force-included in the nextclade tree to boost representation of rare genotypes
AF410989 genotype_ncbi E
AY037009 genotype_ncbi G2
AY037043 genotype_ncbi H2
AY037026 genotype_ncbi H2
AY037028 genotype_ncbi D2
FJ668380 genotype_ncbi D10
AF410989 strain MVi/Montreal.CAN/11.87
AY037009 strain MVs/California.USA/24.00[G2]
AY037043 strain MVi/Alaska.USA/16.00[H2]
AY037026 strain MVi/Minnesota.USA/13.97[H2]
AY037028 strain MVi/New York.USA/11.00[D2]
FJ668380 strain MVi/London.GBR/7.03[D10]
AF410989 date 1987-03-09
AY037009 date 2000-06-12
AY037043 date 2000-04-17
AY037026 date 1997-03-24
AY037028 date 2000-03-13
FJ668380 date 2003-02-10
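
The strain and date rows above can be cross-checked with a short script. This is a sketch under stated assumptions, not the method used to produce the annotations: the function `date_from_strain`, the regular expression, and the two-digit-year `pivot` of 50 are all hypothetical choices made for illustration. It assumes strain names end in `/<week>.<yy>` with an optional `[genotype]` suffix, and it uses the Monday of the ISO week as described in the review thread.

```python
import re
from datetime import date

# Assumed pattern: ".../<epi-week>.<two-digit year>" plus optional "[genotype]".
STRAIN_WEEK_YEAR = re.compile(r"/(\d{1,2})\.(\d{2})(?:\[[^\]]*\])?$")


def date_from_strain(strain, pivot=50):
    """Return the ISO-week Monday (YYYY-MM-DD) encoded in a strain name, or None."""
    match = STRAIN_WEEK_YEAR.search(strain)
    if match is None:
        return None
    week, yy = int(match.group(1)), int(match.group(2))
    # Assumption: two-digit years >= pivot are 19xx, the rest are 20xx.
    year = 1900 + yy if yy >= pivot else 2000 + yy
    # Raises ValueError if the week number does not exist in that ISO year.
    return date.fromisocalendar(year, week, 1).isoformat()


# Accession -> (strain, annotated date), copied from the rows in this diff.
annotations = {
    "AF410989": ("MVi/Montreal.CAN/11.87", "1987-03-09"),
    "AY037009": ("MVs/California.USA/24.00[G2]", "2000-06-12"),
    "AY037043": ("MVi/Alaska.USA/16.00[H2]", "2000-04-17"),
    "AY037026": ("MVi/Minnesota.USA/13.97[H2]", "1997-03-24"),
    "AY037028": ("MVi/New York.USA/11.00[D2]", "2000-03-13"),
    "FJ668380": ("MVi/London.GBR/7.03[D10]", "2003-02-10"),
}

for accession, (strain, annotated) in annotations.items():
    derived = date_from_strain(strain)
    status = "ok" if derived == annotated else f"mismatch (derived {derived})"
    print(f"{accession}\t{annotated}\t{status}")
```

Under these assumptions, all six annotated dates agree with the Monday of the ISO week given in the strain name.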