Releases: nextstrain/nextclade

2.1.0

29 Jun 19:30

Nextclade CLI 2.1.0

  • Fix #907: If --output-basename contains dots, the last component is no longer omitted (report: @KatSteinke, fix: @ivan-aksamentov)

  • Fix #908: Files passed as --input-virus-properties were interpreted as if passed to --input-pcr-primers, and vice versa (report: @BCArg, fix: @corneliusroemer)

Commit history

  • [9e98a42] docs: fix download links [skip ci]

  • [46a8a53] chore: update dependencies, rust to 1.61

  • [2290adc] chore: upgrade Cargo.toml's

  • [bb9f276] Merge pull request #905 from nextstrain/update-deps

  • [6935a9a] Fix link to bioconda package

  • [cff9f9f] Merge pull request #906 from nextstrain/victorlin/fix-doc-link

  • [29a92cb] fix: mixed up input-prc and input-virus-properties

  • [dbb0ca9] chore: add bug fix to changelog

  • [e9df843] Merge pull request #911 from nextstrain/fix-mangled-input-files

fix: mixed up input-prc and input-virus-properties

  • [d0d550a] fix(cli): prevent truncation of components if basename contains dots

Resolves: #907

This rolls an in-house add_extension() function, which always adds an extension to a PathBuf. This is different from PathBuf::with_extension(), which may either replace or add an extension depending on what the path is.

This solves a problem with basenames containing dots, as described in the issue: PathBuf::with_extension() treated the part after the last dot as an extension and replaced it. But we always want to add, not replace.

  • [d5ef6bb] docs: add new cli changes to changelog

  • [249adda] docs: add recent web changes to changelog

  • [75d0ecd] Merge pull request #913 from nextstrain/fix/cli-basename-dots

  • [95f20df] Merge remote-tracking branch 'origin/master' into docs/web-changelog

  • [daa5c02] docs: fix md syntax

  • [74e7334] Merge branch 'docs/web-changelog'

  • [91e77c6] chore: release cli 2.1.0
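
The difference between replacing and appending an extension, which is the core of the #907 fix described in the commit history above, can be sketched in Rust as follows. This is a minimal illustration and not the code from the Nextclade repository: only the name add_extension() and the use of PathBuf::with_extension() come from the commit message, while the function body and its signature are assumptions.

```rust
use std::ffi::OsString;
use std::path::{Path, PathBuf};

/// Always appends `.{ext}` to the path and never replaces an existing suffix.
/// Hypothetical helper for illustration only; the real implementation may differ.
fn add_extension(path: &Path, ext: &str) -> PathBuf {
    // Work on the raw OS string so that dots in the file name are left untouched.
    let mut s: OsString = path.as_os_str().to_owned();
    s.push(".");
    s.push(ext);
    PathBuf::from(s)
}

/// The standard-library behavior, for comparison: the part after the last dot
/// of the file name is treated as an extension and replaced.
fn replace_extension(path: &Path, ext: &str) -> PathBuf {
    path.with_extension(ext)
}
```

For a basename such as out/run.2022.06, add_extension() with "tsv" gives out/run.2022.06.tsv, whereas replace_extension() gives out/run.2022.tsv, which is the truncation reported in the issue.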

2.0.0

28 Jun 11:03

Nextclade 2.0.0

Rust

Nextclade core algorithms and command-line interface were reimplemented in Rust (replacing the C++ implementation).

Rust is a modern, high-performance programming language that is pleasant to read and write. Rust programs have runtime performance comparable to C++, while being easier to write. It should provide a serious productivity boost for the dev team.

Also, it is now much simpler to contribute to Nextclade. If you wanted to contribute, or to simply review and understand the codebase, but were scared off by the complexity of C++, then give it another try - the Rust version is much more enjoyable! Check our developer guide to get started. We are always open to contributions, reviews and ideas!

Alignment algorithm rewritten with adaptive bands

  • Feature: Previously, the alignment band width was constant throughout a given sequence. Now, band width is adaptive: narrow where seed matches indicate no indels, wide where seed matches indicate indels.

  • Performance is improved for sequences with indels

  • Fix: Terminal alignment errors, particularly common in BA.2, are fixed thanks to a wider default band width between terminal seed matches and sequence ends

  • Fix: More robust seed matching allows some previously unalignable sequences to be aligned

  • Fix: Terminal indels for amino acid alignments are only free if the nucleotide alignment indicates a gap. Otherwise, they are penalized like internal gaps. This leads to more parsimonious alignment results.

  • Feature: Additional alignment parameters can now be tuned:

    • "Excess band width" parameter controls the extra band width that is necessary for correct alignment if both deletions and insertions occur between two seed matches.

    • "Terminal band width" controls the extra band width that is necessary for correct alignment if terminal indels occur.

  • Feature: "Min match rate" parameter is added, which sets the required rate of seed matches in a sequence (number of matched seeds divided by total number of attempted seeds). If the measured rate is below the required value, alignment will not be attempted, because such sequences are likely to have infeasible memory and computational requirements. The default value is 0.3.

  • Fix: 3' terminal insertions are now properly detected

  • Feature: "Retry reverse complement" alignment parameter is added. When enabled, an additional attempt at seed matching is made after the initial attempt fails. The second attempt is performed on the reverse-complemented sequence.

    As a consequence:

    • the output alignment, peptides and analysis results correspond to this modified sequence and not to the original
    • sequence name gets a suffix appended to it for all output files (fasta, seqName column, node name on the tree etc.)
    • in output files, there is a new field/column: isReverseComplement, which contains true if the corresponding sequence underwent reverse-complement transformation

    This functionality is opt-in and the default behavior is unchanged: the sequence is skipped and a warning is emitted.
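
Two of the ideas in this section, the "min match rate" check and the reverse-complement transformation, can be sketched in Rust as follows. This is a simplified illustration under stated assumptions and not Nextclade's actual code: the function names are hypothetical, seed matching itself is not shown, and characters other than A, C, G and T are left unchanged here, which may differ from how Nextclade treats ambiguous nucleotides.

```rust
/// Fraction of attempted seeds that matched (0.0 when nothing was attempted).
fn seed_match_rate(matched: usize, attempted: usize) -> f64 {
    if attempted == 0 {
        return 0.0;
    }
    matched as f64 / attempted as f64
}

/// Returns true when the measured rate reaches the required minimum,
/// i.e. when alignment would be attempted.
fn passes_min_match_rate(matched: usize, attempted: usize, min_match_rate: f64) -> bool {
    seed_match_rate(matched, attempted) >= min_match_rate
}

/// Reverse-complements a nucleotide sequence.
/// Assumption: only uppercase A, C, G, T are complemented; everything else
/// (e.g. 'N' or '-') is kept as-is.
fn reverse_complement(seq: &str) -> String {
    seq.chars()
        .rev()
        .map(|c| match c {
            'A' => 'T',
            'T' => 'A',
            'C' => 'G',
            'G' => 'C',
            other => other,
        })
        .collect()
}
```

With the default min match rate of 0.3, a sequence with 2 matched seeds out of 10 attempted would not be aligned, while one with 4 out of 10 would. When the retry is enabled, the second seed matching attempt would run on the output of a transformation like reverse_complement().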

Genes on reverse (negative) strand

Nextclade now correctly handles genes on the reverse (negative) strand, which is particularly important for Monkeypox virus.

Nextclade Web

  • Feature: Nextclade Web is now substantially faster, both at startup and when analysing sequences, due to general algorithmic improvements.

  • Feature: Drag&drop box for fasta files now supports multiple files. The files are concatenated in this case.

  • Feature: Sequence view and peptide views now show insertions. They are denoted as purple triangles.

  • Fix: Tree view no longer shows duplicate clade annotations

Input files

  • Fix: gene map GFF3 file now correctly accepts "gene" and "locus_tag" attributes. This should allow genome annotations from GenBank to be used with little or no modification.

  • Feature: Nextclade now reads virus-specific alignment parameters from the virus_properties.json file in the dataset. This is equivalent to passing alignment tweaks using command-line flags, but is more convenient. If a parameter is provided both in virus_properties.json and as a flag, then the flag takes precedence.
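
The precedence rule above (a command-line flag overrides the value from virus_properties.json) can be sketched as a merge of two optional layers. This is only an illustration: the struct, its field names and the function name are hypothetical and are not taken from the Nextclade codebase or from the actual JSON schema.

```rust
/// Hypothetical subset of alignment parameters. `None` means "not provided".
#[derive(Debug, Clone, Copy, PartialEq, Default)]
struct AlignmentTweaks {
    excess_bandwidth: Option<usize>,
    terminal_bandwidth: Option<usize>,
    min_match_rate: Option<f64>,
}

/// Merges two layers of parameters: a value given as a flag wins,
/// otherwise the value from the dataset file (if any) is used.
fn merge_tweaks(flags: AlignmentTweaks, dataset: AlignmentTweaks) -> AlignmentTweaks {
    AlignmentTweaks {
        excess_bandwidth: flags.excess_bandwidth.or(dataset.excess_bandwidth),
        terminal_bandwidth: flags.terminal_bandwidth.or(dataset.terminal_bandwidth),
        min_match_rate: flags.min_match_rate.or(dataset.min_match_rate),
    }
}
```

Any parameter left as None after the merge would fall back to the built-in default, which is outside the scope of this sketch.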

Nextclade CLI

  • Feature: BREAKING CHANGE Command-line interface was redesigned to make it more consistent and ergonomic. The following invocation should be sufficient for most users:

    nextclade run --input-dataset=dataset/ --output-all=out/ sequences.fasta

    short version:

    nextclade run -D dataset/  -O out/ sequences.fasta
    • Nextalign CLI and Nextclade CLI now require a command as the first argument. To reproduce the behavior of Nextclade v1, use nextalign run instead of nextalign and nextclade run instead of nextclade. See nextalign --help or nextclade --help for the full list of commands. Each command has its own --help menu, e.g. nextclade run --help.

    • --input-fasta flag is removed in favor of providing input sequence file names as positional arguments. Multiple input fasta files can be provided. Different compression formats are allowed:

      nextclade run -D dataset/ -O out/ 1.fasta 2.fasta.gz 3.fasta.xz 4.fasta.bz2 5.fasta.zst
    • If no fasta files are provided, sequences are read from standard input (stdin). Reading from stdin does not support compression.

    • If a special filename (-) is provided for one of the individual output file flags (--output-*), the corresponding output will be printed to standard output (stdout). This allows integration into Unix-style pipelines. For example:

      curl $fasta_gz_url | gzip -cd | nextclade run -D dataset/ --output-tsv=- | my_nextclade_tsv_processor
      
      xzcat *.fasta.xz | nextalign run -r ref.fasta -m genemap.gff -o - | process_aligned_fasta
    • The flag --output-all (-O) replaces the --output-dir flag and makes it possible to output all files with a single flag.

    • The new flag --output-selection restricts which files are written by the --output-all flag.

    • If the --output-basename flag is not provided, the base name of output files will default to "nextclade" or "nextalign" respectively for Nextclade CLI and Nextalign CLI. They will no longer attempt to guess base file name from the input fasta.

    • The new flag --output-translations is a dedicated flag to provide a file path template which will be used to output translated gene fasta files. This flag accepts a template string with a template variable {gene}, which will be substituted with a gene name. Each gene therefore receives its own path. Additionally, the translations are now independent of the output directory and can be omitted if they are not necessary.

    Example:

    If the following is provided:

    --output-translations='output_dir/gene_{gene}.translation.fasta'

    then for SARS-CoV-2 Nextclade will write the following files:

    output_dir/gene_ORF1a.translation.fasta
    output_dir/gene_ORF1b.translation.fasta
    ...
    output_dir/gene_S.translation.fasta
    

    Make sure you properly quote and/or escape the curly braces in the variable {gene}, so that your shell, programming language or pipeline manager does not attempt to substitute the variable.
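
    The substitution described above amounts to a plain string replacement of the {gene} variable. The following Rust sketch illustrates the idea; the function names are hypothetical and the actual template handling in Nextclade may differ.

```rust
/// Expands every `{gene}` placeholder in `template` with `gene_name`.
/// Hypothetical helper for illustration only.
fn translation_path(template: &str, gene_name: &str) -> String {
    template.replace("{gene}", gene_name)
}

/// Produces one output path per gene, in the order the genes are given.
fn translation_paths(template: &str, genes: &[&str]) -> Vec<String> {
    genes
        .iter()
        .map(|gene| translation_path(template, gene))
        .collect()
}
```

    For the template output_dir/gene_{gene}.translation.fasta and the gene ORF1a, translation_path() returns output_dir/gene_ORF1a.translation.fasta, matching the file names listed above.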

  • Feature: New --excess-bandwidth, --terminal-bandwidth, --min-match-rate, --retry-reverse-complement arguments are added (see "Alignment algorithm rewritten with adaptive bands" section for details)

  • Feature: Nextclade CLI and Nextalign CLI now accept compressed input files. If a compressed fasta file is provided, it will be transparently decompressed. Supported compression formats: gz, bz2, xz, zstd. Decompressor is chosen based on file extension.

  • Feature: Nextclade CLI and Nextalign CLI can now write compressed output files. If output path contains one of the supported file extensions, it will be transparently compressed. Supported compression formats: gz, bz2, xz, zstd.

  • Feature: Nextclade can now write outputs in newline-delimited JSON (NDJSON) format. Use the --output-ndjson flag for that. NDJSON output is equivalent to JSON output, but is not hierarchical, so it can be easily streamed and parsed one entry at a time.

  • Feature: Nextclade dataset get and dataset list commands now can fetch dataset index from a custom server. The root URL of the dataset server can be set using --server=<URL> flag.

  • Feature: Nextclade dataset get command can output downloaded dataset in the form of a zip archive, using --output-zip flag. The dataset zip is simply the dataset directory, but compressed, and it can be used as a replacement in the --input-dataset flag of the run command.

  • Feature: Nextalign CLI and Nextclade CLI provide a command for generating shell completions: see nextclade completions --help for details.

  • Feature: Verbosity can be tuned using either the --verbosity=<severity> flag or one or multiple occurrences of the -v and -q flags. By default Nextclade and Nextalign show messages with severity "warn" or above (i.e. only warnings and errors). Flag -v increases and flag -q decreases verbosity by one step, -vv and -qq by two steps, etc.
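
The stepping behavior of the -v and -q flags can be sketched as moving an index along an ordered list of severity levels, starting at "warn". The list of levels below is an assumption modelled on common logging conventions; the release notes only confirm the default "warn" and the one-step-per-flag behavior. The function name is hypothetical.

```rust
/// Severity levels ordered from quietest to most verbose.
/// Assumption: the exact set and order of levels is illustrative.
const LEVELS: [&str; 6] = ["off", "error", "warn", "info", "debug", "trace"];

/// Index of the default level ("warn") in `LEVELS`.
const DEFAULT_LEVEL_INDEX: i64 = 2;

/// Computes the effective level from the number of `-v` and `-q` occurrences.
/// Each `-v` moves one step towards "trace", each `-q` one step towards "off";
/// the result is clamped to the ends of the list.
fn effective_level(num_v: u8, num_q: u8) -> &'static str {
    let idx = DEFAULT_LEVEL_INDEX + i64::from(num_v) - i64::from(num_q);
    let clamped = idx.clamp(0, LEVELS.len() as i64 - 1);
    LEVELS[clamped as usize]
}
```

Under these assumptions, no flags gives "warn", -v gives "info", -vv gives "debug" and -q gives "error".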

Feedback

If you found a bug or have a suggestion, feel free to:


2.0.0-beta.9

25 Jun 16:27
Pre-release
⚠️ This is a pre-release. It can contain bugs and significant changes which are not yet finalized. Changes may appear without notice. We recommend trying the pre-releases to learn about upcoming features. For important projects, use stable releases.

Commit history

  • [bc9839c] feat: retry with reverse complement when seed matching fails

Adds flag --retry-reverse-complement which enables additional attempt of seed matching when initial attempt fails. The second attempt is performed on reverse-complemented sequence.

As a consequence, the output alignment, peptides and analysis results correspond to this modified sequence and not to the original.

This functionality is opt-in and the default behavior is to skip sequence with a warning.

  • [879f780] feat: append suffix to sequence if reverse complemented

  • [fa8f275] feat(cli): issue a warning when a sequence was reverse-complemented

  • [fb271be] Merge remote-tracking branch 'origin/master' into feat/reverse-if-seed-fails

  • [2605cca] feat: add warning to errors.csv when sequence gets reverse-complemented

  • [18bdc94] feat: add "isReverseComplement" columt to csv and tsv outputs

  • [7316d01] feat(cli): default basename to a consistent hardcoded value

Currently, if --output-basename is not provided, the basename for files written to --output-all is the same as for the input fasta. However, if multiple fasta files are provided, it switches to a hardcoded "nextclade" or "nextalign".

This is not something that other CLI tools typically do and it might be confusing, especially for use cases where a certain filename is expected (i.e. in scripts and pipelines), and when the number of input fasta files is not known in advance or changes between two runs.

This PR proposes to always use a hardcoded name for consistency, so that there is no surprise.

  • [8a3f109] Merge pull request #891 from nextstrain/feat/default-basename-hardcoded

  • [78d2156] Merge pull request #887 from nextstrain/feat/reverse-if-seed-fails

  • [9efdbf9] docs: add recent changes to changelog

  • [bd692c7] chore: release cli 2.0.0-beta.9

2.0.0-beta.8

25 Jun 01:23
Pre-release

Commit history

  • [ed65bf1] feat: add sample data for hAdv-A

  • [6cf49c0] Merge remote-tracking branch 'origin/master' into feat/hadv-a

  • [fe81e75] Merge remote-tracking branch 'origin/master' into feat/hadv-a

  • [42bfe20] Merge remote-tracking branch 'origin/master' into feat/hadv-a

  • [6435118] chore: release web v2.2.0

  • [a9c6c30] feat: sort mutations, deletions and insertions

This adds sorting of mutations, deletions and insertions right after they are extracted. This should ensure that they are sorted in the output files, which improves readability.

  • [c52e5c2] refactor: fix comment

  • [3bb8cfd] Merge pull request #886 from nextstrain/feat/sort-muts

feat: sort mutations, deletions and insertions

  • [f79230e] Merge branch 'feat/hadv-a'

  • [6df91b5] chore: speedup dev and test binaries

This enables optimizations even in dev and test mode, to some of the third-party packages that are known to be slow. This should hopefully make dev experience a bit better.

  • [3806b82] Merge pull request #889 from nextstrain/chore/speedup-dev-and-test

  • [34c788d] docs: cleanup changelog

  • [e356f13] docs: add min match rate to changelog

  • [f45ac24] fix(cli): typo

  • [f283a5f] Merge pull request #890 from nextstrain/fix/typo

  • [7ec6b81] feat(cli): make output compression faster

This:

  • reduces default output file compression levels for all formats to 2, which roughly corresponds to "fast" or "low" preset. This should ensure that outputs are not limited by compression speed in most cases.

  • allows setting compression levels per format with environment variables:

    • GZ_COMPRESSION
    • BZ2_COMPRESSION
    • XZ_COMPRESSION
    • ZST_COMPRESSION
  • [12c36ca] Merge pull request #892 from nextstrain/feat/faster-compression

  • [e0c7f17] chore: release cli 2.0.0-beta.8
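
Reading a per-format compression level from one of the environment variables listed in commit 7ec6b81 above, with a fallback to the default level 2, could look roughly like this in Rust. This is a sketch under assumptions: the function names are hypothetical, and it is assumed that the variables hold a plain non-negative integer, which the commit message does not state explicitly.

```rust
use std::env;

/// Default compression level for all formats, as stated in the commit message.
const DEFAULT_COMPRESSION_LEVEL: u32 = 2;

/// Parses a level from an optional raw string.
/// Assumption: missing or unparsable values fall back to the default.
fn parse_level(raw: Option<&str>) -> u32 {
    raw.and_then(|s| s.trim().parse::<u32>().ok())
        .unwrap_or(DEFAULT_COMPRESSION_LEVEL)
}

/// Reads the level for one format from the environment,
/// e.g. `level_from_env("GZ_COMPRESSION")`.
fn level_from_env(var_name: &str) -> u32 {
    parse_level(env::var(var_name).ok().as_deref())
}
```

Keeping the parsing separate from the environment lookup makes the fallback logic easy to test without touching the process environment.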

2.0.0-beta.7

23 Jun 12:27
Pre-release

Commit history

  • [a2f174d] feat(cli): allow to replace unknown nucs with Ns

This adds a flag --replace-unknown to the run command, which allows replacing unknown nucleotide characters with 'N'.

By default, sequences containing unknown nucleotide characters are skipped with a warning: they are not aligned and not included in the results. If this flag is provided, then all unknown characters are replaced with 'N' before the alignment. This replacement makes it possible to align and analyze these sequences.

The following characters are considered known:

-ABCDGHKMNRSTVWY
  • [cae1f51] Merge remote-tracking branch 'origin/master' into feat/cli-replace-unknown

  • [d8f5f28] feat(cli): organize verbosity flags towards the end of help message

Currently --verbosity, --silent, -v and -q args are missorted in the --help message text.

Here I inline the clap-verbosity-flag crate (only 1 file) and modify it, adding our custom flags and display_order annotations, such that the args are shown in the very end of the message, just before the --help arg.

  • [d98312a] Merge pull request #883 from nextstrain/feat/cli-organize-verbosity-flags

  • [fc4c089] feat(cli): add headings for help sections

There are many arguments for run command, so let's organize them in named sections.

It required splitting args into separate structs and adding next_help_heading annotations.

I could not figure out how to change the "OPTIONS" heading where the default --help argument stays. So I added a fake indentation for all headings, as if they were nested under "OPTIONS".

  • [a1ff027] Merge pull request #884 from nextstrain/feat/cli-add-help-headings

  • [1aa6b0f] feat: add minimum seed matching rate

  • [d437b49] Merge pull request #885 from nextstrain/fix/avoid-large-allocations

fix: avoid large allocations during alignment

  • [7ffc959] Merge remote-tracking branch 'origin/master' into feat/cli-replace-unknown

  • [286f7f3] Merge pull request #877 from nextstrain/feat/cli-replace-unknown

  • [f7c30f4] chore: release cli 2.0.0-beta.7
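
The effect of the --replace-unknown flag described in commit a2f174d above can be sketched as a character-wise filter against the list of known characters. This is an illustration only: the function names are hypothetical, and treating lowercase letters as unknown is an assumption of this sketch, not a statement about Nextclade's behavior.

```rust
/// Characters considered known, copied from the commit message.
const KNOWN_NUCS: &str = "-ABCDGHKMNRSTVWY";

/// Returns true if the character is in the known set.
/// Assumption: the comparison is case-sensitive.
fn is_known(c: char) -> bool {
    KNOWN_NUCS.contains(c)
}

/// Returns true if the sequence has at least one unknown character,
/// i.e. it would be skipped with a warning by default.
fn has_unknown(seq: &str) -> bool {
    seq.chars().any(|c| !is_known(c))
}

/// Replaces every unknown character with 'N', as the flag does before alignment.
fn replace_unknown(seq: &str) -> String {
    seq.chars()
        .map(|c| if is_known(c) { c } else { 'N' })
        .collect()
}
```

For example, the sequence ACGU contains the character U, which is not in the known set, so has_unknown() reports it and replace_unknown() turns it into ACGN.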

2.0.0-beta.6

22 Jun 15:25
Pre-release

Commit history

  • [aa31635] chore(ci): trigger ci

  • [e27dc48] chore: add script for testing on different gnu linux distros [skip ci]

  • [3dfb00d] fix(web): rectify incorrect filed name

This was sometimes causing a crash of the web app when filtering by aminoacids. The field queryAa was not spelled correctly and the lodash intersectionWith() function's typings did not catch the type mismatch.

  • [444ceaa] fix(cli): don't crash on unknown nucleotide characters

Nextclade v2 crashes when it encounters unknown nucleotide letters while converting a fasta string to the internal sequence representation.

This PR aligns behavior with v1: sequences with unknown nucleotide letters are now ignored, excluded from results, added to errors.csv and the run continues.

  • [f4627bb] Merge pull request #875 from nextstrain/fix/web-crash-on-filtering

  • [bad2f65] Merge pull request #876 from nextstrain/fix/cli-crash-on-unknown-nuc

  • [9462d8d] fix: add missing custom node attrs to tree json

This adds custom node attrs (e.g. pango lineages) to the tree json, that were previously missing.

This should ensure that they are shown on the tree viz as it was in Nextclade v1.

  • [6bc986d] fix: add missing qc status to tree json

This adds qc status to the tree json, previously missing.

This should ensure that the QC status is shown on the tree viz as it was in Nextclade v1.

  • [ae0bc80] Merge pull request #878 from nextstrain/fix/tree-missing-custom-attrs

  • [0e48641] Merge remote-tracking branch 'origin/master' into fix/tree-missing-qc-status

  • [1573e62] Merge pull request #879 from nextstrain/fix/tree-missing-qc-status

fix: add missing qc status to tree json

  • [4b9fa41] refactor: rename file to clarify intent

It will contain both compression and decompression functions

  • [78d0cd3] feat(cli): add output file compression

Adds compression support for output files: if the output filename contains one of the supported extensions, the outputs will be transparently compressed. Example: --output-fasta=aligned.fasta.xz. Supported formats are the same as for input decompression: gz, bz2, xz, zstd. Default compression levels are used.

To make it compile, I had to additionally:

  • change some of the lifetime parameters in CSV and NDJSON writer, because they were unnecessarily limiting

  • remove the very tricky into_inner() methods in the CSV and NDJSON writers, which required the Sync trait on the inner writer, while the zstd writer did not support that. In the few places that obtained the inner writer in order to get a string out of it, I managed to use a vec as the inner writer instead. So the into_inner() method was no longer needed, and neither was the Sync trait bound.

  • [b8abb99] Merge pull request #880 from nextstrain/feat/cli-output-compression

  • [f029502] feat(cli): improve help messages for input fasta arg

  • [5d45010] feat(cli): mention compression of outputs in cli help messages

  • [ef842bb] Merge pull request #881 from nextstrain/feat/cli-improve-help

  • [7ed87ff] docs: mention output file compression in changelog

  • [7af69da] fix(cli): remove stdin from description of --output-translations arg

Translations cannot be written to stdout because there are many files

  • [a18e311] Merge pull request #882 from nextstrain/fix/cli-help-translations

  • [2cba8ff] chore: release cli 2.0.0-beta.6
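
Choosing a compressor from the output file name, as described in commit 78d0cd3 above, can be sketched as a match on the last extension of the path. The enum and function names are hypothetical, and the mapping of the zstd format to the .zst extension is inferred from the file names used in the 2.0.0 release notes; the real implementation may recognize more spellings.

```rust
use std::path::Path;

/// Compression formats named in the commit message, plus "no compression".
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Compression {
    Gzip,
    Bzip2,
    Xz,
    Zstd,
    Uncompressed,
}

/// Picks the compression format based on the last extension of `path`.
/// Assumption: matching is case-sensitive and only these four extensions count.
fn compression_for(path: &Path) -> Compression {
    match path.extension().and_then(|ext| ext.to_str()) {
        Some("gz") => Compression::Gzip,
        Some("bz2") => Compression::Bzip2,
        Some("xz") => Compression::Xz,
        Some("zst") => Compression::Zstd,
        _ => Compression::Uncompressed,
    }
}
```

With this sketch, aligned.fasta.xz maps to Compression::Xz, while nextclade.tsv maps to Compression::Uncompressed and would be written as plain text.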

2.0.0-beta.5

20 Jun 21:38
Pre-release

Commit history

  • [f44cfc9] chore(ci): add docker-dev script for consideration for checksum

  • [bfd4773] chore(ci): tag pre-release images with version tag too

  • [696ea69] chore(ci): reset ci caches

  • [5a45b5a] chore(ci): fix debian 8 build by using clang 8

This is the last version available for debian 8 (jessie) on https://apt.llvm.org/

  • [1dc5992] chore(ci): fix arm linux gnu build

By using a more recent version of Debian base image

  • [44cb9bb] chore(ci): improve compatibility of linux gnu binaries further

Let's try to build on debian 7 (wheezy)

  • [5ba8e7e] chore: release cli 2.0.0-beta.5

2.0.0-beta.4

20 Jun 11:57
Pre-release

Commit history

  • [2e6f6ee] chore(ci): ensure even better compatibility of linux gnu binaries

Followup of #873

Let's downgrade base image for CI builds to Debian 8, for even better compatibility

  • [ad15870] Merge pull request #874 from nextstrain/chore/ci-improve-linux-gnu-compat-2

  • [a13be61] chore: release cli 2.0.0-beta.4

2.0.0-beta.3

20 Jun 08:40
Pre-release

Commit history

  • [0dcb8dd] chore(ci): ensure better compatibility of linux gnu binaries

Let's build aarch64-unknown-linux-gnu and x86_64-unknown-linux-gnu binaries on debian 9, so that they link against older shared libs. This should allow running on a wider spectrum of Linux distros with various versions of glibc and libgcc.

  • [110f484] chore(ci): add aarch64-unknown-linux-musl binaries

Let's add ARM linux musl binaries, and use musl gcc from the official musl website for both ARM and x86_64 builds for consistency

  • [afe3140] Merge pull request #873 from nextstrain/chore/ci-improve-linux-gnu-compat

  • [db1115f] chore: release cli 2.0.0-beta.3

2.0.0-beta.2

17 Jun 10:04
Pre-release

See the changelog: https://github.com/nextstrain/nextclade/blob/master/CHANGELOG.md#nextclade-200

Commit history

  • [1ec1c2b] feat(web): add warning for unsupported browsers

  • [c91d006] Merge pull request #866 from nextstrain/feat/web-unsipported-browser-warning

  • [0288991] chore: release web v2.1.0

  • [9d056b1] chore(ci): ensure correct full domain var is set for web app builds

  • [c2d1459] fix(web): ensure init errors are not hidden

Nextclade Web has been hiding some of the errors that occur during initialization. Notably, if the dataset server is not reachable or the dataset index fetch fails for any reason, then Nextclade would just show a loading spinner indefinitely.

This PR ensures that the error is properly handled and that an error message is shown in these cases.

  • [36f0df7] Merge pull request #867 from nextstrain/fix/web-init-hidden-errors

  • [21c8568] feat(cli): improve error message when old, removed cli args are used

  • [b4f0559] Merge pull request #868 from nextstrain/feat/cli-better-errors-on-removed-args

  • [013eed6] chore: release cli 2.0.0-beta.2