Canu v1.4

@brianwalenz brianwalenz released this 13 Dec 21:18

These are release notes for Canu version 1.4, which was released on December 13, 2016. Canu is specialized for assembly of single-molecule high-noise sequences. Full documentation can be found at http://canu.readthedocs.org/.

This release provides a stable, tested, and documented version of the software. The binary distributions should work on any relatively recent version of the respective OS. The source code distribution contains everything you need (even the Perl modules!) to create a binary distribution for your own specific OS.

Citation

Minimum Requirements

  • Perl 5.12.0, or File::Path 2.08
  • Java SE 8
  • GCC 4.5 (for compilation only)
  • OS X 10.10 (for binaries only)
  • Gnuplot (optional, for generating diagnostic graphs)
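
The requirements above can be checked quickly before building. The following is a minimal sketch, not part of the Canu distribution: the helper name check_tool is hypothetical, and apart from the Perl version test it only reports whether each tool is on the PATH, not whether its version is sufficient.

```shell
#!/bin/sh
# Report whether a tool is on PATH (presence only, no version check).
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

for tool in perl java gcc gnuplot; do
    check_tool "$tool"
done

# Perl can verify its own minimum version: 'use 5.012' fails on older interpreters.
if command -v perl >/dev/null 2>&1; then
    perl -e 'use 5.012; print "perl: version ok\n"' 2>/dev/null \
        || echo "perl: older than 5.12"
fi
```

Java and GCC versions still need to be compared by hand against the list above, for example with java -version and gcc -dumpversion.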

Installation

Users can download Canu as source code or as pre-compiled binaries. The source code package needs to be compiled and installed before it can be used. The binary distributions need only be unpacked, but they are not available for all platforms.

To install from source code:

gunzip -dc v1.4.tar.gz |tar -xf -
cd canu-1.4/src
make -j8
cd ..

To install from a binary distribution:

xz -dc canu-1.4.*.tar.xz |tar -xf -

In both cases, canu is installed in an OS- and architecture-specific directory under canu-1.4/, for example, canu-1.4/Linux-amd64. You can run the assembler with:

canu-1.4/*/bin/canu
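
The wildcard in the path above matches whichever platform directory exists. If several are present, the directory can be resolved explicitly. This is a minimal sketch that assumes the directory name is built from uname -s and uname -m, with x86_64 machines labeled amd64; that naming is inferred from the Linux-amd64 example and is not confirmed by these notes. The function name canu_platform_dir is hypothetical.

```shell
#!/bin/sh
# Guess the platform directory name from the running system.
# Assumption: the name is "<OS>-<ARCH>", as in canu-1.4/Linux-amd64,
# with x86_64 reported as amd64.
canu_platform_dir() {
    os=$(uname -s)
    arch=$(uname -m)
    if [ "$arch" = "x86_64" ]; then
        arch=amd64
    fi
    printf '%s-%s\n' "$os" "$arch"
}

canu_bin="canu-1.4/$(canu_platform_dir)/bin/canu"

if [ -x "$canu_bin" ]; then
    echo "found $canu_bin"
else
    echo "canu not found at $canu_bin; list canu-1.4/ to see what was installed"
fi
```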

Changes

  • Removed dependency on Filesys::Df.
  • Reduced size of overlap stores by 33 1/3%.
  • Added inline Snappy compression of overlaps, instead of a separate gzip process. This greatly reduces the resources required for building large overlap stores.
  • Memory mapped files are no longer used. Performance on distributed file systems should be improved. Virtual memory usage is greatly reduced.
  • Fixed a variety of issues in GFA output on unitigs, and added GFA output on contigs.
  • Added options onSuccess and onFailure to run a command when Canu terminates successfully or fails unexpectedly.
  • Added support for PBSPro.
  • Fixed the usual assortment of random bugs.
  • Added other minor improvements.
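
The onSuccess and onFailure options listed above each take a command to run. Below is a minimal sketch of a hook that such a command could call. The function name log_canu_result and the variable CANU_HOOK_LOG are hypothetical, and the sketch assumes Canu passes the assembly prefix to the command as its first argument, which should be checked against the documentation.

```shell
#!/bin/sh
# Hypothetical hook: append one line per finished run to a log file.
# Assumption: the assembly prefix arrives as an argument from Canu.
log_canu_result() {
    result=$1                                   # "success" or "failure"
    prefix=${2:-unknown}                        # assembly prefix, if provided
    logfile=${CANU_HOOK_LOG:-canu-hooks.log}    # hypothetical variable name
    printf '%s\t%s\n' "$prefix" "$result" >> "$logfile"
}

# Example wiring, shown as comments because it needs a Canu install and reads
# (prefix, directory, genome size, and script paths are placeholders):
#   canu -p asm -d asm-run genomeSize=4.8m -pacbio-raw reads.fastq \
#        onSuccess=/path/to/ok.sh onFailure=/path/to/fail.sh
# where ok.sh would call:    log_canu_result success "$1"
# and fail.sh would call:    log_canu_result failure "$1"
```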

Known Issues

See the issues page for up-to-date open issues. The currently known issues are:

  • For AT/GC rich eukaryotic genomes, it is beneficial to increase the filtering stringency over the default. Specifying corMaxEvidenceErate=0.15 (from the default of 0.2) is generally sufficient.
  • As a computational optimization, you can decrease the error rate (errorRate=0.013), especially for inbred strains, on Oxford Nanopore R9 2D data and high-coverage P6 PacBio data.
  • LSF support has had limited testing.
  • Memory usage is high during unitig consensus calling on unitigs over 100 Mb in size (a 140 Mb contig uses approximately 75 GB).
  • Bubbles are not captured in the contig graph, but are included in the unitig graph. No attempt at marking bubbles is made.
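
The two parameter overrides mentioned in the list above are passed on the canu command line as name=value pairs. The sketch below only prints the resulting command lines rather than running them. The prefix, directory, genome size, and read file are placeholders, and build_canu_cmd is a hypothetical helper.

```shell
#!/bin/sh
# Build (but do not run) a canu command line with one extra override.
# Placeholder inputs: prefix "asm", directory "asm-run", genome size 120m,
# and a PacBio read file named reads.fastq.
build_canu_cmd() {
    extra=$1
    printf 'canu -p asm -d asm-run genomeSize=120m %s -pacbio-raw reads.fastq\n' "$extra"
}

# AT/GC-rich eukaryote: stricter evidence filtering than the 0.2 default.
build_canu_cmd "corMaxEvidenceErate=0.15"

# Inbred strain with high-coverage data: lower error rate to save compute.
build_canu_cmd "errorRate=0.013"
```

For Oxford Nanopore reads the input option would differ from -pacbio-raw; the parameter overrides are written the same way.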

Legal

As Canu is derived from the Celera Assembler, most of the code is GPL licensed. This distribution includes code from Boost, pbdagcon, pbutgcns, and Falcon. For a copyright summary see the README.licenses file as well as individual component licenses included in the repository (boost, falcon, pbdagcon). For more details, see the header in each source file which details its history.