Canu v2.1

@brianwalenz brianwalenz released this 24 Aug 18:07
· 6 commits to v2.1-maintenance since this release

These are release notes for Canu version 2.1, which was released on August 21st, 2020. Canu is specialized for assembly of single-molecule high-noise sequences. Full documentation can be found at http://canu.readthedocs.org/.

This release provides a stable, tested, and documented version of the software. The binary distributions should work on any relatively recent version of the respective OS and are the recommended way to install Canu. The source code distribution contains everything you need to create a binary distribution for your own specific OS.

Citation

Minimum Requirements

  • 8GB minimum memory; 16GB strongly suggested
  • GCC 4.5 (for compilation only); GCC 7 or newer strongly recommended
  • Perl 5.12.0, or File::Path 2.08
  • Java SE 8
  • macOS 10.10 Yosemite (for macOS/Darwin binaries only)
  • gnuplot 5.2 (optional, for generating diagnostic graphs)
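
The minimums above can be checked before installing. The following is a minimal sketch, not part of Canu: the helper names `version_ge` and `check_min` are hypothetical, and it assumes a `sort` that supports `-V` (GNU coreutils and recent BSD/macOS versions). On macOS, `gcc` may be an alias for clang, whose reported version is not comparable to GCC's.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing dotted version strings against
# the minimums listed in the release notes. Assumes `sort -V` is available.

# version_ge HAVE WANT: succeeds when HAVE >= WANT.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# check_min NAME HAVE WANT: report whether a detected version meets the minimum.
check_min() {
    if version_ge "$2" "$3"; then
        echo "$1 $2: ok (>= $3)"
    else
        echo "$1 $2: too old (need >= $3)"
    fi
}

# Example probes; each one is skipped when the tool is not installed.
if command -v gcc >/dev/null 2>&1; then
    check_min gcc "$(gcc -dumpversion)" 4.5
fi
if command -v perl >/dev/null 2>&1; then
    check_min perl "$(perl -e 'printf "%vd", $^V')" 5.12.0
fi
```

The same `check_min` call can be repeated for the other tools in the list once their version strings have been extracted.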

Installation

Users can download Canu as source code or as pre-compiled binaries. The binary distribution is the recommended install method, assuming it is available for your platform. The source code package needs to be compiled and installed before it can be used.

Note that the installation directory has changed compared to previous releases.

To install from a binary distribution (recommended):

tar -xJf canu-2.1.*.tar.xz

Canu will be installed at canu-2.1/bin/canu.

To install from source code (DO NOT download the "Source code" files provided by GitHub, as these will not compile; use canu-2.1.tar.gz instead):

gunzip -dc canu-2.1.tar.gz | tar -xf -
cd canu-2.1/src
make -j 8
cd ..

Canu will be installed at canu-2.1/build/bin/canu.
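
Because the executable ends up in a different place depending on the install method, wrapper scripts may need to look in both locations. The following is a minimal sketch under that assumption; `find_canu` is a hypothetical helper, not something shipped with Canu.

```shell
#!/usr/bin/env bash
# find_canu ROOT: print the path of the canu executable under ROOT.
# Checks the binary-distribution layout (ROOT/bin/canu) first, then the
# source-build layout (ROOT/build/bin/canu). Hypothetical helper.
find_canu() {
    local root="$1" candidate
    for candidate in "$root/bin/canu" "$root/build/bin/canu"; do
        if [ -x "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    echo "canu not found under $root" >&2
    return 1
}

# Usage (assumes the archive was unpacked in the current directory):
#   canu="$(find_canu canu-2.1)" && "$canu" -version
```

Running the located executable with `-version` should print the release, which is a quick way to confirm that the install worked.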

Changes

Canu v2.1 IS NOT compatible with assemblies started with any previous version.

  • Contigs are more correct, but generally smaller, owing to better identification of bad reads, bubbles, and ambiguous repeats.
    ** Avoid labeling true repeats as bubbles. Some contigs we previously flagged as bubbles are now flagged as repeats and are allowed to break contigs.
    ** Improve sensitivity of bubble detection. Some contigs we didn't flag before are now flagged as bubbles and will not break contigs.
    ** Break repeats at the read end suspected to be incorrectly assembled, instead of at the boundary of the repeat.
    ** Merge unambiguous small contigs into larger contigs correctly in tandem repeat regions.
  • Auto-increase maximum allowed overlap error when defaults are too restrictive. This applies to all datatypes but is particularly prevalent in HiFi datasets.
    ** Fix an esoteric error in picking the best overlap between a pair of reads that would sometimes fail to pick the longest overlap when all overlaps are at 100% identity.
  • Improve detection of circular contigs and output the coordinates of the non-redundant contig in the FASTA header line.
  • Add a report of the quality of overlaps used when building contigs to 'asm.report'.
  • Improve consensus quality in repetitive regions.
  • Remove support for having read files in spec files; it only worked in limited cases, and would be hard to fix.
  • Remove OSTYPE-MACHINETYPE (e.g., Linux-amd64) from the installation path. This quirk has been present since (almost) the first release of Celera Assembler. It was needed to support runs on a heterogeneous grid consisting of Intel 32-bit compute nodes (with 2 CPUs and 2 GB memory) and a "high memory" DEC Alpha node with 4 CPUs and 32 GB.
  • Change ovlStore file names to be POSIX compliant. Old names should be silently updated. Issue #1732.

Bug Fixes

  • Fix "Modification of non-creatable array value attempted" crash after "Meryl finished successfully." Issue #1632.
  • Fix crash in splitReads "Assertion w->clrBgn >= w->iniBgn failed." Issue #1655.
  • Fix failure running meryl-configure.sh on PBSPro. Issue #1740.
  • Fix underestimate of memory needed for consensus. Issue #1750.

Known Issues

See the issues page for up-to-date open issues, or to report a problem.

  • Large memory usage and runtime for long reads (e.g., Nanopore) when using the overlapper=ovl algorithm, and during Overlap Error Adjustment. The -fast option enables a significantly faster algorithm, but may produce slightly less contiguous assemblies on genomes larger than 1 Gbp in size. It is recommended for nanopore genomes smaller than 1 Gbp.
  • No support for trio binning of HiFi data. As a workaround, specify the HiFi data as -pacbio-raw and run only the haplotyping step (-haplotype) followed by assembly of the partitioned reads.
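
The -fast guidance above can be captured in a small wrapper. The following is a sketch only: `canu_nanopore_cmd` is a hypothetical helper that prints a command rather than running it, the prefix and read file names are placeholders, and the 1 Gbp threshold is taken directly from the note above. It assumes the usual `-p`, `-d`, `genomeSize=` and `-nanopore` arguments described in the Canu documentation.

```shell
#!/usr/bin/env bash
# canu_nanopore_cmd PREFIX GENOME_SIZE_MBP READS
# Print (do not run) a canu command line for Nanopore reads, adding -fast
# only when the genome is smaller than 1 Gbp (1000 Mbp). Hypothetical helper.
canu_nanopore_cmd() {
    local prefix="$1" size_mbp="$2" reads="$3" fast=""
    if [ "$size_mbp" -lt 1000 ]; then
        fast=" -fast"
    fi
    echo "canu -p $prefix -d $prefix genomeSize=${size_mbp}m$fast -nanopore $reads"
}

# Usage: inspect the command first, then run it once it looks right.
#   canu_nanopore_cmd ecoli 5 reads.fastq.gz
```

Printing the command instead of executing it keeps the decision visible and lets the output be pasted into a job script unchanged.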

See the FAQ for many suggestions, including suggestions for specific data types, e.g., Nanopore r9 reads.

Legal

Canu is derived from Celera Assembler and includes code from many other projects. Most, but not all, of the code is GPL licensed. See the README.licenses file and individual source code files for details.