Nanocall is an alternative, open source, MIT licensed, basecaller for Oxford Nanopore Technologies (ONT) sequencing data. Published in Bioinformatics, 2016.
For the official ONT basecaller, see Metrichor.
To understand the usefulness of Nanocall compared to Metrichor, some background is in order.
Before summer 2016: Before the summer of 2016, and the release of the R9 sequencing pore:
- Metrichor was the only available basecaller for ONT data.
- Metrichor’s source code was closed.
- Metrichor was only available as a cloud service.
This state of affairs prompted us to develop Nanocall as an open-source local basecaller alternative to Metrichor.
After summer 2016: The summer of 2016 has brought along several significant developments from ONT:
- A new sequencing pore R9 was released: (ONT Press Release, May 2016).
- The Metrichor source code was opened (under a development license).
- ONT provided an official option for local basecalling: (ONT Press Release, Aug 2016).
As a result, Nanocall’s usefulness is now limited to:
- a platform for developing new basecalling ideas, and
- situations where, for various reasons, you do not have access to the official ONT basecaller(s).
If you want to use Nanocall on R9 data, Nanocall does support it directly, but its accuracy is significantly lower than that of Metrichor (unlike the case of R7.3, where the two had similar accuracy). The reason for the discrepancy is that Metrichor on R9 uses a more elaborate RNN-based approach, compared to the simple HMM-based one in Nanocall.
Most people are only used to dealing with DNA bases. However, to understand where Nanocall fits in, we observe that there are 3 levels of ONT sequencing data:
- Raw samples. These are direct (picoamp) current measurements, taken at preset intervals as the DNA molecule is threaded through the pore. This data is passed through the USB cable from the MinION to the controlling laptop running MinKNOW. These are stored in
fast5
files at paths such as/Raw/Reads/Read_29/Signal
. - Events. Each event is an aggregation of multiple consecutive raw samples, (ideally) corresponding to a certain DNA context found in the pore. The process of computing events from raw samples is referred to as event detection. These are stored in
fast5
files at paths such as/Analyses/EventDetection_000/Reads/Read_29/Events
. - DNA bases. These are the usual, finished product. The process of computing DNA bases from events is referred to as basecalling. These are stored in
fast5
files at paths such as/Analyses/Basecall_2D_000/BaseCalled_2D/Fastq
.
On R7.3, event detection was performed locally by MinKNOW, and events were passed on to, and used by Metrichor. Since Nanocall was developed as an alternative local basecaller for R7.3 data, Nanocall is designed to work with events, not with raw samples.
On (at least some versions of) R9, Metrichor would entirely redo the event detection directly from raw samples, disregarding any event detection done locally by MinKNOW. As such, it is less uncommon with R9 (than with R7.3) to see fast5
files without events. Nanocall cannot be run directly on such files. To use Nanocall on R9 data, you must either configure MinKNOW to perform local event detection, or pass the files through Metrichor to use its event detection.
Nanocall can be built from source in a classical UNIX environment, or directly under Docker. The Docker build might run under Windows, though this is not tested.
Nanocall uses cmake
for configuration and make
for building. The prerequisites needed for building are zlib
and hdf5
. On UNIX systems, hdf5
can be optionally built as a submodule.
Example build:
mkdir /some/source/dir && cd /some/source/dir git clone --recursive https://github.com/mateidavid/nanocall.git cd nanocall mkdir build && cd build cmake ../src [-DCMAKE_INSTALL_PREFIX=/some/install/dir] [-DBUILD_HDF5=1] [-DHDF5_ROOT=/path/to/hdf5] make make install /some/install/dir/bin/nanocall --version
Notes:
- The default install prefix is
/usr/local
. - Setting
BUILD_HDF5
will causehdf5
to be downloaded and built as a submodule. - Setting
HDF5_ROOT
is only necessary if a copy ofhdf5
is installed in a non-standard location. This is not needed whenBUILD_HDF5
is used.
To avoid dealing with prerequisites, Nanocall can be conveniently built under Docker. The installation and configuration of Docker itself is outside of the scope of this document.
The simplest way to run Nanocall under Docker is:
docker build -t nanocall https://github.com/mateidavid/nanocall.git docker run --rm nanocall --version docker run --rm -u $(id -u):$(id -g) -v /path/to/data:/data nanocall -t 4 . >output.fa
Howver, there are several problems with this build:
- The docker image is “fat”, in that it contains all the build time dependencies of Nanocall, which are not needed at run time.
- Without using
-u
, the image will create files with a UID of 0 on the mounted volumes of the host. To remove them, you will have to usesudo rm
orsudo chown
. - The timezone inside the image might be different from the host. This might confuse programs which depend on comparing modification times, most notably
make
.
To alleviate the problems mentioned above, you can build a “slim” Docker image as follows:
git clone --recursive --depth 1 https://github.com/mateidavid/nanocall.git nanocall/script/build-slim-docker-image docker run --rm nanocall --version docker run --rm -v /path/to/data:/data nanocall -t 4 . >output.fa
# Check version nanocall --version # Check command line parameters nanocall --help # Run on single file, save output and log nanocall /path/to/file.fast5 >output.fa 2>log # Run on directory, using 24 threads, discard log nanocall -t 24 /path/to/data >output.fa 2>/dev/null # Run on file-of-file-names nanocall /path/to/files.fofn >output.fa # Run Docker build on directory, using 4 threads # Note: -u is not needed with the "slim" build docker run --rm -u $(id -u):$(id -g) -v /path/to/data:/data nanocall -t 4 . >output.fa
Released under the MIT license.