Canu v2.2
These are release notes for Canu version 2.2, which was released on August 26th, 2021. Canu is specialized for assembly of single-molecule high-noise sequences. Full documentation can be found at http://canu.readthedocs.org/.
This release provides a stable, tested, and documented version of the software. The binary distributions should work on any relatively recent version of the respective OS and are the recommended way to install Canu. The source code distribution contains everything you need to create a binary distribution for your own specific OS.
Citation
- Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Research. (2017).
- Koren S, Rhie A, Walenz BP, Dilthey AT, Bickhart DM, Kingan SB, Hiendleder S, Williams JL, Smith TPL, Phillippy AM. De novo assembly of haplotype-resolved genomes with trio binning. Nature Biotechnology. (2018).
- Nurk S, Walenz BP, Rhie A, Vollger MR, Logsdon GA, Grothe R, Miga KH, Eichler EE, Phillippy AM, Koren S. HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads. Genome Research. (2020).
Minimum Requirements
- 8GB minimum memory; 16GB strongly suggested
- GCC 4.5 (for compilation only); GCC 7 or newer strongly recommended
- Perl 5.12.0, or File::Path 2.08
- Java SE 8
- macOS 10.10 Yosemite (for macOS/Darwin binaries only)
- gnuplot 5.2 (optional, for generating diagnostic graphs)
Installation
Users can download Canu as source code or as pre-compiled binaries. The binary distribution is the recommended install method, assuming it is available for your platform. The source code package needs to be compiled and installed before it can be used.
Note that the installation directory has changed compared to previous releases.
To install from a binary distribution (recommended):
curl -L https://github.com/marbl/canu/releases/download/v2.2/canu-2.2.<OS>-amd64.tar.xz --output canu-2.2.<OS>.tar.xz
tar -xJf canu-2.2.*.tar.xz
replacing <OS> with Darwin or Linux, depending on your platform. Confirm that the MD5 checksum of the downloaded file matches the expected value:
6bd937d31bb9f5f46bf0f9839889c00f canu-2.2.Darwin.tar.xz
63219165fc45b3dbbeb73ed920a23db5 canu-2.2.Linux.tar.xz
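One way to check the download against the checksums listed above is sketched below. The helper name verify_md5 is hypothetical and not part of Canu; the sketch assumes either md5sum (GNU coreutils, typical on Linux) or md5 (typical on macOS) is installed.

```shell
# Compare a file's MD5 digest with an expected value.
# verify_md5 is a hypothetical helper, not something shipped with Canu.
verify_md5() {
  file=$1
  expected=$2
  if command -v md5sum >/dev/null 2>&1; then
    # GNU coreutils: the digest is the first field of the output.
    actual=$(md5sum "$file" | awk '{print $1}')
  else
    # BSD/macOS md5: -q prints only the digest.
    actual=$(md5 -q "$file")
  fi
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (got $actual, expected $expected)" >&2
    return 1
  fi
}

# Example for the Linux tarball, using the checksum from the list above:
# verify_md5 canu-2.2.Linux.tar.xz 63219165fc45b3dbbeb73ed920a23db5
```

If the function reports a mismatch, download the file again before unpacking it.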
For recent versions of macOS (10.15+) you may see an error similar to: "sqStoreCreate" cannot be opened because the developer cannot be verified. If this happens, you can remove the quarantine flags from Canu:
xattr -d com.apple.quarantine ./canu-2.2/bin/*
xattr -d com.apple.quarantine ./canu-2.2/lib/*
Canu will be installed at canu-2.2/bin/canu.
To install from source code (DO NOT download the "Source code" files provided by GitHub, as these will not compile; use canu-2.2.tar.xz instead):
curl -L https://github.com/marbl/canu/releases/download/v2.2/canu-2.2.tar.xz --output canu-2.2.tar.xz
tar -xJf canu-2.2.tar.xz
cd canu-2.2/src
make -j 8
cd ..
Canu will be installed at canu-2.2/build/bin/canu.
Changes
Canu v2.2 IS (expected to be) compatible with assemblies started with Canu v2.1 (and v2.1.1) but NOT with any earlier version. However, we DO NOT recommend mixing versions.
- Tweaks to Overlap Error Adjustment to identify real differences near heterozygous alleles, to ignore differences near read ends, and others, mostly for HiFi data. 1ac9dc3 through cb94432
- Tweaks to Overlap Based Trimming to use only evidence overlaps that have different spans; that is, overlaps that do not pile-up on themselves. e540977
- Read Correction:
- Decrease corErrorRate from 0.50 to 0.30 for Nanopore and from 0.30 to 0.25 for PacBio. For Nanopore data, this results in around a 2/3 reduction in 'falconsense' time. See https://canu.readthedocs.io/en/latest/parameter-reference.html#corerrorrate for details. 741911c
- Pass mhap output (*.mhap files) directly to mhapConvert (.ovb files) using a named pipe, instead of a large intermediate file. Option mhapPipe can be used to switch back to using intermediate files. 4fada27
- Do not convert or load short overlaps into the overlap store during correction. d6b7a1f and b982642
- Pass global filter coverage to generateCorrectionLayouts. When corOutCoverage is changed from the default 40x, the number of reads that can be used to correct another read changes correspondingly. e192966 and 07c0481
- Trim low-quality ends from read-to-template alignments before using them for generating corrected reads. ea2b03d
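The Read Correction parameters named above are passed to canu as key=value pairs on the command line. The sketch below shows how they might be overridden. The prefix, directories, genome size, and read file are placeholders, the dry-run wrapper is a hypothetical convenience, and the value shown for mhapPipe assumes it is a boolean option, which these notes do not state.

```shell
# Hypothetical dry-run wrapper: prints each command unless DRY_RUN=0 is set.
DRY_RUN=${DRY_RUN:-1}
run() {
  if [ "$DRY_RUN" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Restore the previous Nanopore corErrorRate (0.50) instead of the new 0.30.
# All names and sizes here are placeholders.
run canu -p asm -d asm-nanopore genomeSize=4.8m \
    corErrorRate=0.50 \
    -nanopore reads.fastq.gz

# Change corOutCoverage from its default of 40x; the number of reads used to
# correct each read changes correspondingly.
run canu -p asm -d asm-cov genomeSize=4.8m \
    corOutCoverage=100 \
    -nanopore reads.fastq.gz

# Switch back to intermediate mhap files.
# Assumption: mhapPipe accepts a boolean value such as "false".
run canu -p asm -d asm-nopipe genomeSize=4.8m \
    mhapPipe=false \
    -nanopore reads.fastq.gz
```

Run with DRY_RUN=0 to execute the commands instead of printing them.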
Bug Fixes
- Filter HiFi reads by their homopolymer compressed length. 258941d
- Show HiFi read length histograms using their uncompressed length. f1eadb3
- Fix crash trying to compute the error profile of unitigs with billions of overlaps. Issue #1355. 69e22c9
- Fix 'Assertion 'mincoord < maxcoord' failed' in findPotentialOrphans(). Issues #1872 and #1831. 2f73439
- Improve detection of grid resources specified in environment variables. Issue #1912. 404540a
- Fix rare crash when placing reads in abnormally short tigs. 2b70735
Known Issues
See the issues page for up-to-date open issues, or to report a problem.
- Large memory usage and runtime for long reads (e.g., Nanopore) when using the overlapper=ovl algorithm, and during Overlap Error Adjustment. The -fast option enables a significantly faster algorithm, especially for Nanopore data, but may produce slightly less contiguous assemblies.
- No support for trio binning of HiFi data. As a workaround, specify the HiFi data as -pacbio-raw and run only the haplotyping step (-haplotype) followed by assembly of the partitioned reads.
See the FAQ for many suggestions, including suggestions for specific data types, e.g., Nanopore r9 reads.
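The two workarounds in the list above could be invoked roughly as follows. This is a sketch under stated assumptions: all file names, prefixes, and genome sizes are placeholders, the haplotype names Pat and Mat are arbitrary, the dry-run wrapper is a hypothetical convenience, and the location of the partitioned reads is assumed rather than taken from these notes, so check your own output directory.

```shell
# Hypothetical dry-run wrapper: prints each command unless DRY_RUN=0 is set.
DRY_RUN=${DRY_RUN:-1}
run() {
  if [ "$DRY_RUN" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Faster algorithm for Nanopore reads, at some possible cost in contiguity.
run canu -fast -p asm -d asm-fast genomeSize=4.8m -nanopore reads.fastq.gz

# Trio workaround, step 1: run only the haplotyping step, passing the HiFi
# reads as -pacbio-raw. Parental read files are placeholders.
run canu -haplotype -p trio -d trio-dir genomeSize=3g \
    -haplotypePat pat.fastq.gz \
    -haplotypeMat mat.fastq.gz \
    -pacbio-raw child.hifi.fastq.gz

# Step 2: assemble each partition as HiFi data.
# Assumption: partitioned reads are written under trio-dir/haplotype/.
for hap in Pat Mat; do
  run canu -p "asm-$hap" -d "asm-$hap" genomeSize=3g \
      -pacbio-hifi "trio-dir/haplotype/haplotype-$hap.fasta.gz"
done
```

Run with DRY_RUN=0 to execute the commands instead of printing them.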
Legal
Canu is derived from Celera Assembler and includes code from many other projects. Most, but not all, of the code is GPL licensed. See the README.licenses file and individual source code files for details.