Releases: nextstrain/nextclade_data

2022-08-09

10 Aug 13:16

New dataset version (tag 2022-08-09T12:00:00Z)

All Monkeypox datasets

The datasets now include hMPXV-1 lineages B.1.1 to B.1.5. See details in nextstrain/mpox#95

Sequences released to Genbank up to 2022-08-08 have been included in the new trees.

A B.1.5 sequence from Genbank has been added to the example sequences

MPXV (All clades)

Sequence KJ642615 (W-Nigeria/1971) has been excluded because it appears to be a recombinant of clade 2 and clade 3. See details in nextstrain/mpox#102. This sequence is not present in the other datasets, so they are unaffected.

Experimental SARS-CoV-2 dataset relative to BA.2 (sars-cov-2-21L)

This release includes a new type of SARS-CoV-2 dataset that is recommended for web use only.

It uses the Wuhan reference but with the SNPs that occur in BA.2.

This way, the mutation view is less overloaded and individual Spike mutations are easier to spot by eye.
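The effect of swapping the reference can be illustrated with a minimal sketch on toy sequences. The helper names `apply_snps` and `list_substitutions` and all sequences below are hypothetical and only serve the illustration; this is not how Nextclade builds or uses the dataset.

```python
def apply_snps(reference: str, snps: dict[int, str]) -> str:
    """Return a copy of `reference` with the given 1-based positions replaced.

    Only substitutions are applied, so the length (and therefore the
    coordinate system) stays the same as in the original reference.
    """
    bases = list(reference)
    for pos, base in snps.items():
        bases[pos - 1] = base
    return "".join(bases)


def list_substitutions(reference: str, query: str) -> list[str]:
    """List substitutions such as 'C2T' between two equal-length, aligned sequences."""
    if len(reference) != len(query):
        raise ValueError("sequences must have equal length")
    return [
        f"{r}{i}{q}"
        for i, (r, q) in enumerate(zip(reference, query), start=1)
        if r != q
    ]


# Toy data, not real genomes.
original_ref = "ACGTACGTAC"
lineage_snps = {2: "T", 5: "G", 9: "C"}      # SNPs shared by the whole lineage
lineage_ref = apply_snps(original_ref, lineage_snps)

# A lineage member that carries one additional mutation.
query = apply_snps(lineage_ref, {7: "A"})

print(list_substitutions(original_ref, query))  # lineage SNPs plus the extra one
print(list_substitutions(lineage_ref, query))   # only the extra one
```

Against the original reference the query shows every lineage-defining SNP; against the lineage-adjusted reference only the additional mutation remains, which is the decluttering described above.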

Only lineages that descend from BA.2, BA.4 or BA.5 are included in this dataset.

Please do not use this dataset for tools that rely on data continuity, as the dataset is comparatively new and brittle and may not be maintained indefinitely.

The current version has the tag 2022-07-26T12:00:00 and name sars-cov-2-21L

2022-07-27

26 Jul 23:01

Influenza Yamagata HA

Bug fix release (tag 2022-07-27T12:00:00Z)

Fix: The old tree used an incorrect genemap which caused Nextclade to crash. Now it works again.

Beware that Nextclade v2.0.0 through v2.3.0 have a bug that causes Nextclade to crash with this dataset.

You will have to upgrade to Nextclade v2.3.1 or use Nextclade v1 to use this dataset.
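A pipeline that pins a Nextclade version could guard against the affected range with a check along these lines. This is a minimal sketch: `parse_version` and `is_affected` are hypothetical helpers, the range is taken from the note above, and pre-release suffixes are not handled.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn 'v2.3.0' or '2.3.0' into (2, 3, 0)."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def is_affected(version: str) -> bool:
    """True if the version lies in the range v2.0.0 to v2.3.0, inclusive."""
    return (2, 0, 0) <= parse_version(version) <= (2, 3, 0)


for candidate in ["1.11.0", "v2.0.0", "v2.3.0", "v2.3.1"]:
    status = "affected" if is_affected(candidate) else "ok"
    print(candidate, status)
```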

2022-07-26

26 Jul 13:06

SARS-CoV-2 and SARS-CoV-2-no-recomb

Bug fix release (tag 2022-07-26T12:00:00Z)

Fix: Ancestral reconstruction of mutations was wrong due to recombinants attaching directly to the root and causing the root mutations to be different from Wuhan.

This caused:

  • Mutations displayed in Auspice to be wrong for all tips since around the time recombinants were first included in the tree (2022-03-24T12:00:00Z).
  • Some of the reconstructed mutations on recombinants to be wrong, affecting nearest-neighbor placement of some recombinants.

The fix will change the placement of a few recombinants and improve QC values of some recombinants, but should not have large effects overall.
The biggest perceived impact will be that mutations displayed by Auspice will now be correct.

2022-07-22

22 Jul 22:03

SARS-CoV-2 and SARS-CoV-2-no-recomb

New dataset version (tag 2022-07-22T12:00:00Z)

  • Clades: BA.2.75 has been given the Nextstrain clade name 22D. The reasoning behind naming this lineage is explained in nextstrain/ncov#984
  • Data update: New pango lineages are included up to commit cov-lineages/pango-designation@65cb2e0...4213460
  • Fix: BA.2.38 no longer has 6091T as a defining mutation and should therefore catch many more Indian BA.2.38 sequences (report by @silcn in nextstrain/nextclade#935)
  • Fix: Genemap format now correct, compliant with GFF3, see #33 (report by @huddlej)
  • virus_properties.json has been updated, including clade 22D

2022-07-12

22 Jul 22:02

SARS-CoV-2

New dataset version (tag 2022-07-12T12:00:00Z)

  • Fix: BA.2.75 lacked the characteristic S:R493Q reversion in the previous release; this is now fixed. This is the only change. Otherwise this dataset is identical to 2022-07-11T12:00:00Z.

2022-07-11

12 Jul 14:49

SARS-CoV-2

New dataset version (tag 2022-07-11T12:00:00Z)

  • Pango lineages: In this release, Nextclade can assign Pango lineages up to BA.2.75
  • Alignment params: Retry reverse complement flag is now set to true, so that reverse complement is tried if seed matching fails.
  • Fixes: Some synthetic pango lineage sequences had wrong mutations. This is now fixed through a manually curated override file.
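The retry logic behind the reverse complement flag can be sketched as follows. This is a toy illustration under stated assumptions, not Nextclade's implementation: `count_seed_matches` is a hypothetical stand-in for real seed matching, and the `min_matches` threshold and seed length are made up.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a nucleotide sequence; other characters are kept as-is."""
    return seq.translate(COMPLEMENT)[::-1]


def count_seed_matches(reference: str, query: str, seed_len: int = 5) -> int:
    """Toy seed matching: count non-overlapping query k-mers found in the reference."""
    seeds = [
        query[i:i + seed_len]
        for i in range(0, len(query) - seed_len + 1, seed_len)
    ]
    return sum(seed in reference for seed in seeds)


def orient(reference, query, min_matches=2, retry_reverse_complement=True):
    """Return (sequence, was_reversed), or None if seed matching fails.

    The query is tried as given first. Only if that fails, and the retry
    flag is set, is the reverse complement tried.
    """
    if count_seed_matches(reference, query) >= min_matches:
        return query, False
    if retry_reverse_complement:
        flipped = reverse_complement(query)
        if count_seed_matches(reference, flipped) >= min_matches:
            return flipped, True
    return None
```

With the flag off, a read submitted in the wrong orientation simply fails seed matching; with the flag on, it is flipped and reported as reversed.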

2022-06-29

MPXV B.1

New dataset version (tag 2022-06-29T12:00:00Z)

  • Increased number of B.1 samples from ~100 to ~200 to improve phylogenetic placement of analyzed 2022 outbreak sequences

2022-06-14

16 Jun 16:51
ba3688a

3 Monkeypox (MPXV) datasets introduced

Three MPXV datasets are added, each with a different zoom level:

  • MPXV (All clades)
  • hMPXV-1 (part of clade 3, source of 2017/2018/2022 outbreaks)
  • hMPXV-1 B.1 (2022 outbreak lineage)

All 3 use the coordinate system of the recently designated NCBI Monkeypox reference sequence NC_063383 (MPXV-M5312_HM12_Rivers).

However, SNPs from two different ref sequences are added to the "all clades" and B.1 datasets to reduce the number of total mutations.

The B.1 dataset uses SNPs of ON563414.3 (MPXV_USA_2022_MA001) on top of a NC_063383 backbone.

The "all clades" build uses the SNPs of a reconstructed ancestral MPXV sequence that is the inferred most recent common ancestor of clades 1, 2 and 3, rooted with a Cowpox outgroup.

Only the MPXV (All clades) dataset can assign all clades 1, 2 and 3.
The hMPXV-1 dataset can be used if all viruses are from hMPXV-1.
The B.1 dataset is useful for 2022 outbreak sequences but will not be able to assign anything but B.1 lineages.

Gene annotations follow the annotation used by NC_063383, with gene names of the form OPG001 (for OrthoPox Gene 001).
Since the alignment reference is always in NC_063383 coordinates, nucleotide and protein mutation positions should usually be identical in alignments done with all three datasets.
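Downstream scripts that handle these gene names may want to format and validate them. The sketch below assumes every gene name is "OPG" followed by exactly three zero-padded digits; the notes above only show OPG001, so the fixed width is an assumption, and `format_opg` and `parse_opg` are hypothetical helpers.

```python
import re

# Assumption: "OPG" followed by exactly three digits, as in OPG001.
OPG_PATTERN = re.compile(r"^OPG(\d{3})$")


def format_opg(number: int) -> str:
    """Format a gene number as an OPG name, e.g. 1 -> 'OPG001'."""
    return f"OPG{number:03d}"


def parse_opg(name: str) -> int:
    """Extract the gene number from an OPG name, e.g. 'OPG001' -> 1."""
    match = OPG_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not an OPG gene name: {name!r}")
    return int(match.group(1))


print(format_opg(1))
print(parse_opg("OPG001"))
```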

Quality control parameters are subject to change, especially since "known" frame shifts and stop codons have not been annotated. For example, clade 1 sequences will always show around 7 frame shifts, yet these do not indicate quality problems.

New dataset version (tag 2022-06-14T12:00:00Z)

SARS-CoV-2

  • Pango lineages: New lineages added up till pango-designation release v1.9 and beyond are now included, including among others BA.5.1-BA.5.3, BA.2.35-BA.2.48 and XV-XY

2022-04-28

28 Apr 20:10

New dataset version (tag 2022-04-28T12:00:00Z)

SARS-CoV-2 (with and without recombinants)

  • Pango lineages: New lineages added up till pango-designation release v1.8 are now included, including among others BA.3.1, BA.2.14-BA.2.34 and XT-XU (in the default build, excluded from special "without recombinants" dataset).
  • Clades: New Nextstrain clades included. BA.4 is 22A (Omicron), BA.5 is 22B (Omicron) and BA.2.12.1 is 22C (Omicron).

2022-04-08

12 Apr 13:06

New dataset version (tag 2022-04-08T12:00:00Z)

SARS-CoV-2 (with and without recombinants)

  • Pango lineages: New lineages added up till pango-designation release v1.4 are now included, including among others BA.4-5, BA.2.9-BA.2.13 and XM-XS (in the default build, excluded from special "without recombinants" dataset). For now, BA.4-5 are included in Nextstrain clade 21L, together with BA.2 which is the most similar Omicron.
  • Reference tree: The first 100 and last 200 sites (with respect to Wuhan reference) are now masked in the reference tree to reduce noise due to sites like 21 that were artifactually polymorphic.
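The end masking described for the reference tree can be sketched like this. It is a minimal illustration, assuming masked sites are replaced with "N"; the actual tree-building workflow may mask or exclude sites differently, and `mask_ends` is a hypothetical helper.

```python
def mask_ends(seq: str, n_start: int = 100, n_end: int = 200, mask_char: str = "N") -> str:
    """Replace the first `n_start` and last `n_end` sites with `mask_char`.

    The sequence length is preserved, so positions in the middle keep
    their coordinates relative to the reference.
    """
    if n_start + n_end >= len(seq):
        return mask_char * len(seq)
    middle = seq[n_start:len(seq) - n_end]
    return mask_char * n_start + middle + mask_char * n_end


# Toy example with small numbers instead of 100 and 200.
print(mask_ends("ACGTACGTAC", n_start=2, n_end=3))
```

Masking instead of trimming keeps site numbering intact, so a noisy position such as 21 is hidden without shifting any other coordinate.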

2022-03-31

31 Mar 14:33

New dataset version (tag 2022-03-31T12:00:00Z)

SARS-CoV-2 (with and without recombinants)

  • Pango lineages: New lineages added up till pango-designation release v1.2.137 are now included, including among others BA.1.18-19, BA.2.4-BA.2.8 and XG-XK (in the default build, excluded from special "without recombinants" dataset).
  • Dataset: The sampling of sequences has changed slightly. Previously, every Nextstrain clade got around 30 random sequences belonging to that clade, which caused quite a bit of movement between releases. This is no longer the case, so the tree is slightly smaller. The change is most noticeable for small Nextstrain clades like 20F.