Merge remote-tracking branch 'origin/master' into feat/hadv-a
ivan-aksamentov committed Feb 1, 2022
2 parents fe81e75 + b2793fc commit 42bfe20
Showing 96 changed files with 2,555 additions and 780 deletions.
80 changes: 75 additions & 5 deletions CHANGELOG.md
@@ -1,3 +1,77 @@
## Nextclade Web 1.13.1, Nextclade CLI 1.10.2, Nextalign CLI 1.10.2 (2022-02-01)

This is a bug fix release.

### [Fix] Exclude reversions of deletions from consideration in "SNP clusters" QC rule

Since the introduction of reversions in Nextclade Web 1.13.0 and Nextclade CLI 1.10.0, the "SNP clusters" QC rule has been including reversions of deletions when counting clustered private mutations. This was unintended and produced false positives for some sequences. To fix that, we removed reversions of deletions from consideration in this QC rule, so that it behaves as it did previously.
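
The effect of this fix can be illustrated with a minimal sketch. This is not Nextclade's actual implementation: the `PrivateMutation` type, the `kind` labels, and the default `window_size` and `cutoff` values are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class PrivateMutation:
    pos: int   # nucleotide position of the private mutation
    kind: str  # hypothetical label: "substitution", "reversion" or "deletion_reversion"


def positions_for_snp_clusters(mutations: Iterable[PrivateMutation]) -> List[int]:
    """Keep only the mutations that should count towards SNP clusters."""
    # Reversions of deletions are excluded, which is the behavior this fix restores.
    return sorted(m.pos for m in mutations if m.kind != "deletion_reversion")


def max_mutations_in_window(positions: List[int], window_size: int) -> int:
    """Largest number of positions that fall into any window of `window_size` bases."""
    positions = sorted(positions)
    best = 0
    left = 0
    for right, pos in enumerate(positions):
        # Move the left edge until the window spans fewer than `window_size` bases.
        while pos - positions[left] >= window_size:
            left += 1
        best = max(best, right - left + 1)
    return best


def has_snp_cluster(mutations: Iterable[PrivateMutation],
                    window_size: int = 100, cutoff: int = 6) -> bool:
    # `window_size` and `cutoff` are placeholder values, not read from a real QC config.
    positions = positions_for_snp_clusters(mutations)
    return max_mutations_in_window(positions, window_size) > cutoff
```

With this filter in place, a stretch of reverted deletions next to a handful of ordinary substitutions no longer pushes a sequence over the cluster cutoff.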

### [Fix] Center markers in sequence view in Nextclade Web

In this version we improved the display of various colored markers (mutations, ranges etc.) in sequence and peptide views on the Results page of Nextclade Web. The individual markers are now centered around their position in the sequence (previously left-aligned). Although markers have moved by just a few pixels, this makes positioning more consistent, and ensures that different types of markers are correctly aligned across table rows.


## Nextclade CLI 1.10.1 (2022-01-26)

### [Fix] Improve error message when the virus properties file is missing [#704](https://github.com/nextstrain/nextclade/pull/704)

Since version 1.10.0, Nextclade CLI requires a new input file, `virus_properties.json`, and [datasets](https://github.com/nextstrain/nextclade_data/blob/master/CHANGELOG.md) and [documentation](https://docs.nextstrain.org/projects/nextclade/) were updated to match. However, users who don't use datasets might have encountered breakage due to the missing file: when running Nextclade CLI with neither the `--input-dataset` nor the `--input-virus-properties` flag provided, it would stop with an unclear error message. In this release we improve the error message so that it explains the problem and offers a solution.

This does not affect Nextclade Web or Nextalign CLI.

To facilitate upgrades, we recommend that most users:

- download the latest dataset before each Nextclade CLI session (e.g. at the beginning of an automated workflow, or when starting a batch of experiments manually) using the `nextclade dataset get` command
- use the `--input-dataset` flag instead of individual `--input-*` flags for dataset files when issuing the `nextclade run` command
- if necessary, override some of the individual input files using the corresponding `--input-*` flags
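
The recommended workflow can be sketched as a pair of helpers that assemble the two command lines. This is only an illustration: the helper names and the file paths are hypothetical, and the exact set of flags should be checked against `nextclade dataset get --help` and `nextclade run --help` for the installed version.

```python
from typing import Dict, List, Optional


def dataset_get_args(name: str, dataset_dir: str) -> List[str]:
    """Command line that downloads the latest dataset into `dataset_dir`."""
    return ["nextclade", "dataset", "get", "--name", name, "--output-dir", dataset_dir]


def run_args(fasta: str, dataset_dir: str, output_tsv: str,
             overrides: Optional[Dict[str, str]] = None) -> List[str]:
    """Command line that runs the analysis using the whole dataset directory.

    `overrides` maps individual `--input-*` flags to replacement files,
    e.g. {"--input-qc-config": "my_qc.json"} (the file name is made up).
    """
    args = [
        "nextclade", "run",
        "--input-fasta", fasta,
        "--input-dataset", dataset_dir,
        "--output-tsv", output_tsv,
    ]
    for flag, path in (overrides or {}).items():
        args += [flag, path]
    return args
```

Each returned list can be passed to `subprocess.run(args, check=True)`, running the download step first so that the analysis always sees an up-to-date dataset.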


### [Fix] Add information about `virus_properties.json` or `--input-virus-properties` to changelog

In the excitement of bringing the new features, we forgot to mention `virus_properties.json` or `--input-virus-properties` in the changelog when Nextclade CLI 1.10.0 was released. We have now added this information retroactively.


## Nextclade Web 1.13.0, Nextclade CLI 1.10.0, Nextalign CLI 1.10.0 (2022-01-24)

### 💥 [BREAKING CHANGE] Nextclade: new required input file: `virus_properties.json` [#689](https://github.com/nextstrain/nextclade/pull/689)

This version introduces a new required input file for Nextclade, called `virus_properties.json`. This file contains additional information necessary for the "Detailed split of private mutations" feature (see below). [The new versions of Nextclade datasets](https://github.com/nextstrain/nextclade_data/blob/master/CHANGELOG.md) were released to account for this change.

How it affects different tools in the Nextclade family and how to upgrade:

- Nextclade Web - requires the new file. Migration path: no action is needed. Nextclade Web always uses the latest dataset automatically.

- Nextclade CLI - requires the new file. Migration path:

- Download the latest dataset with `nextclade dataset get` command (dataset tagged `2022-01-18T12:00:00Z` or more recent is required)
- If using `--input-dataset` flag: the new file will be picked up automatically from the latest dataset. No further action is needed.
- If not using `--input-dataset` flag: add `--input-virus-properties` flag to point to `virus_properties.json` file from the dataset.

- Nextalign CLI - not affected: it does not use `virus_properties.json`. Migration path: no action is needed.
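
For automated pipelines, the Nextclade CLI migration rule above can be turned into a small pre-flight check on the arguments before the tool is launched. The function name and the message text below are assumptions made for this sketch. They are not the error message printed by Nextclade itself.

```python
from typing import List, Optional


def _has_flag(args: List[str], flag: str) -> bool:
    # Accept both "--flag value" and "--flag=value" spellings.
    return any(a == flag or a.startswith(flag + "=") for a in args)


def virus_properties_problem(args: List[str]) -> Optional[str]:
    """Return a human-readable problem description, or None if the arguments look fine."""
    if _has_flag(args, "--input-dataset") or _has_flag(args, "--input-virus-properties"):
        return None
    return (
        "virus_properties.json is required since Nextclade CLI 1.10.0: "
        "pass --input-dataset, or point --input-virus-properties "
        "to the file from the dataset"
    )
```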


### [Feature] Detailed split of private mutations (Nextclade) [#689](https://github.com/nextstrain/nextclade/pull/689)

Private mutations (differences between a query sequence and nearest neighbour in reference tree) are now split into three categories:

1. Reversion to reference genotype
2. (SARS-CoV-2 only for now) Mutations to a genotype common in at least one clade, which get labeled with that clade
3. Mutations that are neither reversions nor labeled (called "unlabeled")

Which category a mutation belongs to is visible by hovering over the "Mut." column in Nextclade Web and in various "privateNucMutations" fields in [csv/tsv/json outputs](https://docs.nextstrain.org/projects/nextclade/en/stable/user/output-files.html#tabular-csv-tsv-results).
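
The three categories can be sketched as a simple decision function. This is an assumption-laden illustration: the `label_map` structure (a lookup from position and nucleotide to clade names), the clade name used in the usage example, and the function name are all hypothetical, and the real `virus_properties.json` layout may differ.

```python
from typing import Dict, List, Tuple

# Hypothetical lookup: (position, nucleotide) -> clades in which this genotype is common.
LabelMap = Dict[Tuple[int, str], List[str]]


def classify_private_mutation(pos: int, query_nuc: str, reference: str,
                              label_map: LabelMap) -> Tuple[str, List[str]]:
    """Classify one private mutation of the query relative to its nearest tree node."""
    # 1. The query carries the reference nucleotide again: a reversion.
    if reference[pos] == query_nuc:
        return ("reversion", [])
    # 2. The genotype is common in at least one clade: labeled with those clades.
    clades = label_map.get((pos, query_nuc), [])
    if clades:
        return ("labeled", list(clades))
    # 3. Everything else is unlabeled.
    return ("unlabeled", [])
```

For example, with `reference = "ACGT"` and `label_map = {(2, "T"): ["clade_X"]}`, a private mutation to `C` at position 1 is a reversion, a mutation to `T` at position 2 is labeled with `clade_X`, and a mutation to `A` at position 3 is unlabeled.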

### [Change] "Private mutations" QC rule now accounts for reversions and labeled mutations

Reversions and labeled mutations (see feature above) are particularly common in contaminated samples, coinfections and recombination. To draw the user's attention to such sequences, both types of private mutation now get higher weights in the "Private mutations" QC rule (denoted as "P" in Nextclade Web, and `qc.privateMutations` in output files).
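
A rough sketch of how such weighting could feed into a score is shown here. The weights, the `typical` and `cutoff` thresholds, and the shape of the formula are placeholder assumptions chosen for illustration. The actual values come from the QC configuration of the dataset.

```python
def private_mutations_score(n_reversions: int, n_labeled: int, n_unlabeled: int, *,
                            weight_reversions: float = 6.0,
                            weight_labeled: float = 4.0,
                            weight_unlabeled: float = 1.0,
                            typical: float = 8.0,
                            cutoff: float = 24.0) -> float:
    """Weighted "Private mutations" score; higher means more suspicious."""
    weighted_total = (
        weight_reversions * n_reversions
        + weight_labeled * n_labeled
        + weight_unlabeled * n_unlabeled
    )
    # Nothing is penalized until the weighted count exceeds the typical amount.
    return max(0.0, weighted_total - typical) * 100.0 / cutoff
```

Under these placeholder values, two unlabeled mutations score 0, while two reversions already produce a non-zero score, which is the intended "draw attention" effect.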

### [Feature] Insertions now also available as amino acids [#692](https://github.com/nextstrain/nextclade/pull/692)

Aminoacid insertions in the query peptides relative to the corresponding reference peptide are now displayed in the "Ins." column in Nextclade Web and are emitted as "aaInsertions" and "totalAminoacidInsertions" fields in Nextalign and Nextclade output files. Note that, similarly to nucleotide insertions, aminoacid insertions are stripped from the output alignment.

### [Fix] Gaps in query sequences are now stripped correctly [#696](https://github.com/nextstrain/nextclade/pull/696)

When query sequences contained gaps (`-`), e.g. when inputting aligned sequences, the gaps were not stripped correctly since v1.7.0 (web v1.10.0), which could lead to `-` showing up in insertions.
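
Conceptually, the fix amounts to removing gap characters from the query before it is aligned. A minimal sketch of that preprocessing step, with a hypothetical function name:

```python
def strip_gaps(seq: str, gap: str = "-") -> str:
    """Remove alignment gap characters from a (possibly pre-aligned) query sequence."""
    return seq.replace(gap, "")
```

For instance, `strip_gaps("AC--GT-A")` yields `"ACGTA"`, so no `-` can later be reported as part of an insertion.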

## Nextclade Web 1.12.0, Nextclade CLI 1.9.0, Nextalign CLI 1.9.0 (2022-01-11)

### [Feature] Handle "-" strand gene translation
@@ -24,14 +98,12 @@ The alignment algorithm in Nextclade CLI and Nextalign CLI could sometimes produ

In rare cases, Nextclade and Nextalign algorithms could read past the end of arrays, which previously went undetected. This is now fixed.


## Nextclade Web 1.11.1, Nextclade CLI 1.8.1 (2022-01-07)

### [Hotfix] Nextclade CLI crashes on macOS when reading JSON tree (#680)

Fixes crash `Error: [json.exception.invalid_iterator.214] cannot get value |` when reading JSON tree on macOS


## Nextclade Web 1.11.0, Nextclade CLI 1.8.0 (2022-01-04)

### [Feature] Better dataset selector
@@ -44,7 +116,7 @@ Nextclade CLI and Nextclade Web now can assign multiple clade-like attributes to

If input reference tree JSON contains an array of attribute keys attached to the

```js
meta.extensions.nextclade.clade_node_attrs_keys = ["my_clades", "other_clades"]
```
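
As an illustration, the keys can be read from a parsed tree JSON as follows. The path `meta.extensions.nextclade.clade_node_attrs_keys` comes from the text above. The assumption that each custom attribute is stored on a node Auspice-style, as `node_attrs[key].value`, and the helper names are not confirmed by this changelog.

```python
import json
from typing import Any, Dict, List


def clade_node_attr_keys(tree: Dict[str, Any]) -> List[str]:
    """Read the list of custom clade-like attribute keys, or [] if none are declared."""
    nextclade_ext = tree.get("meta", {}).get("extensions", {}).get("nextclade", {})
    return list(nextclade_ext.get("clade_node_attrs_keys", []))


def custom_clade_attrs(node: Dict[str, Any], keys: List[str]) -> Dict[str, Any]:
    """Collect the declared attributes present on one tree node (assumed layout)."""
    node_attrs = node.get("node_attrs", {})
    return {
        key: node_attrs[key]["value"]
        for key in keys
        if isinstance(node_attrs.get(key), dict) and "value" in node_attrs[key]
    }


# Usage with a tiny in-memory tree; real trees are loaded from the tree JSON file.
example_tree = json.loads(
    '{"meta": {"extensions": {"nextclade": '
    '{"clade_node_attrs_keys": ["my_clades", "other_clades"]}}}}'
)
example_keys = clade_node_attr_keys(example_tree)
```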

@@ -65,10 +137,8 @@ The new optimized FASTA parser makes Nextclade CLI up to 60% faster and Nextalig

This is an internal fix of a problem that might have led to a crash in rare cases, when the coordinate map array was accessed beyond its size.


## Nextclade Web 1.9.0, Nextclade CLI 1.6.0 (2021-12-07)


### [BREAKING CHANGE] [Fix] Remove unused CLI flags for aminoacid seed alignment

Seed matching step was removed in Nextalign and Nextclade CLI 1.5.0, however the command-line parameters previously providing configuration options for this step were not. In this version, the now unused family of `--aa-*` CLI flags is removed. Migration path: remove these flags from Nextclade CLI invocation.
2 changes: 1 addition & 1 deletion README.md
@@ -191,7 +191,7 @@ by Nextstrain team

If you use results obtained with Nextclade in a publication, please

- cite our preprint:
- cite our paper:

> Aksamentov, I., Roemer, C., Hodcroft, E. B., & Neher, R. A., (2021). Nextclade: clade assignment, mutation calling and quality control for viral genomes. Journal of Open Source Software, 6(67), 3773, https://doi.org/10.21105/joss.03773
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
1.9.0
1.10.2
9 changes: 6 additions & 3 deletions docs/dev/old-versions.md
@@ -8,7 +8,11 @@ This is for historical context and none of these should be used for anything ser
## Versions 1.x

| release date | version | URL of Vercel preview |
| ------------ | ------- | -------------------------------------------------- |
| 2022-01-11 | 1.12.0 | https://nextclade-jo1iacqrs-nextstrain.vercel.app/ |
| 2022-01-07 | 1.11.1 | https://nextclade-5aj183szo-nextstrain.vercel.app/ |
| 2022-01-04 | 1.11.0 | https://nextclade-7knw3p805-nextstrain.vercel.app/ |
| 2021-12-07 | 1.9.0 | https://nextclade-gi9y6qzvr-nextstrain.vercel.app/ |
| 2021-11-29 | 1.8.1 | https://nextclade-d0eft3j74-nextstrain.vercel.app/ |
| 2021-11-25 | 1.8.0 | https://nextclade-ppgn50zv9-nextstrain.vercel.app/ |
| 2021-11-16 | 1.7.4 | https://nextclade-hq9z31tch-nextstrain.vercel.app/ |
@@ -22,11 +26,10 @@ This is for historical context and none of these should be used for anything ser
| 2021-07-11 | 1.5.2 | https://nextclade-1bpp6eq1m-nextstrain.vercel.app/ |
| 2021-07-08 | 1.5.1 | https://nextclade-lb6e55u36-nextstrain.vercel.app/ |


## Versions 0.x

| release date | version | URL of Vercel preview |
| ------------ | ------- | -------------------------------------------------- |
| 2021-06-07 | 0.14.4 | https://nextclade-4u8v9zs7i-nextstrain.vercel.app/ |
| 2021-05-20 | 0.14.3 | https://nextclade-9d810h320-nextstrain.vercel.app/ |
| 2021-03-30 | 0.14.2 | https://nextclade-kocumke81-nextstrain.vercel.app/ |
Binary file added docs/user/assets/web_download-options.png
Binary file modified docs/user/assets/web_mut-tooltip.png
Binary file added docs/user/assets/web_select-virus.png
Binary file modified docs/user/assets/web_show-example.png
18 changes: 9 additions & 9 deletions docs/user/datasets.md
@@ -7,6 +7,7 @@ Nextclade dataset is a set of input data files required for Nextclade to run the
- quality control configuration
- gene map
- PCR primers
- virus properties (since CLI `v1.10.0` / web `v1.13.0`)

Dataset might also include example sequence data (to be analyzed).

@@ -16,9 +17,9 @@ An instance of a dataset is a directory containing the dataset files.

There are 3 concepts that are important to understand in order to work with Nextclade datasets:

1. **Dataset name** - identifies dataset purpose. Typically indicates name of the pathogen. Examples: `sars-cov-2`, `flu_h3n2_ha`. Each dataset is specific to a given virus. For example, a dataset for H1N1 flu is not suitable for analysing SARS-CoV-2 sequences and vice versa. Mixing incompatible datasets and sequences will produce incorrect results.
1. **Dataset name** - identifies dataset purpose. Typically indicates name of the pathogen. Examples: `sars-cov-2`, `flu_h3n2_ha`. Each dataset is specific to a given virus. For example, a dataset for H1N1 flu is not suitable for analyzing SARS-CoV-2 sequences and vice versa. Mixing incompatible datasets and sequences will produce incorrect results.

2. **Dataset's reference sequence**: each dataset can have multiple flavors, depending on the reference sequence it is based on. For example, one `sars-cov-2` reference dataset can be based on `MN908947 (Wuhan-Hu-1/2019)` or other reference sequences, and `flu_h3n2_ha` can be based on `CY034116 (A/Wisconsin/67/2005)` or other reference sequences. For each dataset name, among all available reference sequences, there is a default reference sequence defined (by dataset maintainers). It is used when no concrete reference sequence is specified. The dataset reference is specified using the corresponding accession ID.
2. **Dataset's reference sequence**: each dataset can have multiple flavors, depending on the reference sequence it is based on. For example, one `sars-cov-2` reference dataset can be based on `MN908947 (Wuhan-Hu-1/2019)` or other reference sequences, and `flu_h3n2_ha` can be based on `CY034116 (A/Wisconsin/67/2005)` or other reference sequences. For each dataset name, among all available reference sequences, there is a default reference sequence defined by the dataset maintainers. It is used when no concrete reference sequence is specified. The dataset reference is specified using the corresponding accession ID.

3. **Dataset version and version tag**: each reference dataset can have multiple versions. New versions are produced during dataset updates. Datasets are versioned to ensure correctness when running with different versions of Nextclade as well as reproducibility of results. For each reference in dataset there is exactly one latest version. It is used as a default when no version is specified. Version tag is the name unique to a given version.

@@ -30,18 +31,17 @@ A combination of (1) name, (2) reference sequence accession, (3) version tag uni

Nextclade Web loads the latest compatible datasets automatically. Users can choose one of the datasets before starting the analysis using the dataset selector.


<!-- The datasets page &#40;`https://clades.nextstrain.org/data`&#41; displays all the available datasets and allows to download them &#40;individual files or grouped inside a zip archive&#41;. These downloaded datasets can be used with Nextclade Web in advanced mode or with Nextclade CLI. They can also serve as a starting point for creating your own datasets. -->
The [datasets page](https://github.com/nextstrain/nextclade_data/tree/release/data/datasets) displays all the available datasets and allows you to download them. These downloaded datasets can be used with Nextclade Web in advanced mode or with Nextclade CLI. They can also serve as a starting point for creating your own datasets.

### Datasets in Nextclade CLI

Nextclade CLI implements subcommands for listing and downloading datasets. This functionality requires internet connection.
Nextclade CLI implements subcommands for listing and downloading datasets. This functionality requires an internet connection.

#### List available datasets

The datasets can be listed with the `dataset list` subcommand:

```bash
nextclade dataset list --name sars-cov-2
```

@@ -130,9 +130,9 @@ If the `--input-dataset` flag is not used, the individual `--input-*` flags are

## Dataset versioning and compatibility

When Nextclade software implements new features (for example new QC checks) it might require dataset changes that are incompatible with the previous versions of Nextclade.
When Nextclade software implements new features (for example new QC checks) it might require dataset changes that are incompatible with previous versions of Nextclade.

Each dataset defines multiple versions, each containing a range of compatible Nextclade versions (separately for Nextclade Web and Nextclade CLI). A particular version of Nextclade can only use a dataset that has matching compatibility range.
Each dataset defines multiple versions, each containing a range of compatible Nextclade versions (separately for Nextclade Web and Nextclade CLI). A particular version of Nextclade can only use a dataset that has a matching compatibility range.

Compatibility checks are ensured by default in Nextclade Web and Nextclade CLI when downloading datasets. However, Nextclade CLI users can additionally list and download any dataset version using advanced command-line flags (see `nextclade dataset --help`).
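
The compatibility rule can be sketched as a range check on parsed version numbers. The function names and the convention of using `None` for an open-ended bound are assumptions of this example. The actual index format of the dataset repository is not described here.

```python
from typing import Optional, Tuple

Version = Tuple[int, int, int]


def parse_version(text: str) -> Version:
    """Parse "major.minor.patch" into a tuple so that 1.10.0 sorts after 1.9.0."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def is_compatible(nextclade_version: str,
                  min_version: Optional[str],
                  max_version: Optional[str]) -> bool:
    """True if `nextclade_version` lies within the inclusive [min, max] range."""
    version = parse_version(nextclade_version)
    if min_version is not None and version < parse_version(min_version):
        return False
    if max_version is not None and version > parse_version(max_version):
        return False
    return True
```

Comparing parsed tuples rather than raw strings matters here: as strings, `"1.9.0"` would sort after `"1.10.0"` and the check would give the wrong answer.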

@@ -148,7 +148,7 @@ Nextclade team hosts a public file server containing all the dataset file themse

At this time we do not support the usage of the dataset repository outside of Nextclade. We cannot guarantee stability of the index file format or of the filesystem structure. They can change without notice.

The code and source data for datasets generation is in the GitHub repository: [https://github.com/nextstrain/nextclade_data](https://github.com/nextstrain/nextclade_data)
The code and source data for datasets generation is in the GitHub repository: [nextstrain/nextclade_data_workflows](https://github.com/nextstrain/nextclade_data_workflows)

## Dataset updates

4 changes: 2 additions & 2 deletions docs/user/faq.md
@@ -21,12 +21,12 @@ The rules on missing data, private mutations, and SNP clusters mimic the exclusi

### How can I contact the maintainers of Nextclade?

The Nextstrain team maintains a discussion forum at https://discussion.nextstrain.org. You can post your questions there. For software bugs, feature requests, ideas, technical questions please open an issue on [GitHub](https://github.com/nextstrain/nextclade/issues/new/choose) (requires account registration).
The Nextstrain team maintains a discussion forum at <https://discussion.nextstrain.org>. You can post your questions there. For software bugs, feature requests, ideas, technical questions please open an issue on [GitHub](https://github.com/nextstrain/nextclade/issues/new/choose) (requires account registration).

### Can I use my own reference tree?

Yes, you can specify your own tree, reference sequence, QC configuration and other parameters in the advanced mode. Your phylogenetic tree can be generated using Augur ([docs](https://docs.nextstrain.org/), [GitHub](https://github.com/nextstrain/augur)).

### Is Nextclade available for other pathogens and microorganisms, too?

Nextclade works for other viruses, but you have to specify your own reference sequences, trees, and annotations. Only SARS-CoV-2 data is currently provided as a default. We plan to support other pathogens in the future.
Nextclade works for other viruses, but you have to specify your own reference sequences, trees, and annotations. Only SARS-CoV-2 and Influenza A/B HA (H3N2, H1N1pdm, Vic, Yam) data is currently provided as a default. We plan to support other pathogens in the future.