Nucleotide Archival Format (NAF)

NAF is a binary file format for biological sequence data. It is based on zstd and features strong compression and fast decompression. It can store DNA, RNA, protein or text sequences, with or without qualities. It supports FASTA- and FASTQ-formatted sequences, ambiguous IUPAC codes, and masked sequences, and has no limit on sequence length or number of sequences. It supports Unix pipes, which allows easy integration into pipelines. See the NAF homepage for details.
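
As a rough illustration of pipeline use, the sketch below streams a decompressed archive straight into grep to count FASTA records, without writing an intermediate file. The function name count_naf_records is hypothetical, and the sketch assumes a FASTA (not FASTQ) archive and that unnaf writes to standard output when -o is omitted.

```shell
# Hypothetical helper, not part of NAF: count FASTA records in a .naf file.
# Assumes unnaf prints the decompressed FASTA to stdout when -o is omitted.
count_naf_records() {
    # grep -c counts header lines, which start with '>'.
    # Note: grep exits non-zero when the count is 0.
    unnaf "$1" | grep -c '^>'
}
```

Usage, for example: count_naf_records file.naf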

Example benchmark: SILVA 132 LSURef database (610 MB):

[Figure: benchmark chart, not reproduced here]

From the Sequence Compression Benchmark project; visit it for details and more benchmarks.

Format specification

The NAF specification is in the public domain: NAFv2.pdf

Encoder and decoder

NAF encoder and decoder are called "ennaf" and "unnaf". After compressing your data with ennaf, you suddenly have enough space. However, if you decompress it back with unnaf, your space is again un-enough.

Installing

Installing with bioconda

To install NAF with bioconda:

conda install naf

See package page for details: naf at bioconda.

Building from source

Prerequisites: git, gcc, make, diff, perl (diff and perl are only used for the test suite). E.g., to install on Ubuntu: sudo apt install git gcc make diffutils perl. On Mac OS you may have to install Xcode Command Line Tools.

Building and installing:

git clone --recurse-submodules https://github.com/KirillKryukov/naf.git
cd naf && make && make test && sudo make install

To install in an alternative location, add "prefix=DIR" to the "make install" command. E.g., sudo make prefix=/usr/local/bio install

For a staged install, add "DESTDIR=DIR". E.g., make DESTDIR=/tmp/stage install
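
When the chosen prefix is writable by the current user, sudo should not be needed. The sketch below wraps that case in a small function; install_naf_to_home is a hypothetical name, the default of $HOME/.local is only a common choice, and the assumption that executables end up in the bin subdirectory of the prefix has not been checked against the Makefile.

```shell
# Hypothetical helper, not part of NAF: install under a user-owned prefix.
# Run it from inside the cloned naf directory, after "make" has succeeded.
install_naf_to_home() {
    prefix="${1:-$HOME/.local}"
    # No sudo: the prefix is assumed to be writable by the current user.
    make prefix="$prefix" install || return 1
    # Assumption: executables are installed into the bin subdirectory of the prefix.
    printf 'Installed. Make sure %s/bin is on your PATH.\n' "$prefix"
}
```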

On Windows it can be installed using Cygwin, and should also be possible with WSL. In Cygwin, drop sudo: cd naf && make && make test && make install

Building from latest unreleased source

For testing purposes only:

git clone --recurse-submodules --branch develop https://github.com/KirillKryukov/naf.git
cd naf && make && make test && sudo make install

Compressing

ennaf file.fa -o file.naf

See ennaf -h and Compression Manual for detailed usage.
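
To compress many files one by one (each into its own .naf file, as opposed to a single multi-file archive), a plain shell loop is enough. This is a minimal sketch: compress_fasta_dir is a hypothetical name, and it assumes the input files use the .fa suffix.

```shell
# Hypothetical helper, not part of NAF: compress each *.fa file in a directory.
compress_fasta_dir() {
    dir="${1:-.}"
    for f in "$dir"/*.fa; do
        [ -e "$f" ] || continue               # skip when the glob matched nothing
        # file.fa becomes file.naf next to the original; stop on the first failure
        ennaf "$f" -o "${f%.fa}.naf" || return 1
    done
}
```

The originals are left in place, so they can be removed once the compressed copies have been checked.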

Decompressing

unnaf file.naf -o file.fa

See unnaf -h and Decompression Manual.
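
Before deleting an original file, it can be worth checking that it survives a round trip. The sketch below compares records rather than raw bytes, on the cautious assumption that line wrapping might differ after decompression. Both function names are hypothetical, and the check handles FASTA input only.

```shell
# Hypothetical helpers, not part of NAF.

# Print one "header<TAB>sequence" line per FASTA record,
# joining wrapped sequence lines.
fasta_records() {
    awk '/^>/ { if (seen) printf "\n"; printf "%s\t", $0; seen = 1; next }
         { printf "%s", $0 }
         END { if (seen) printf "\n" }' "$1"
}

# Compress, decompress, and compare records; returns 0 when they match.
naf_roundtrip_ok() {
    tmp="$(mktemp -d)" || return 1
    fasta_records "$1" > "$tmp/orig.tsv"
    ennaf "$1" -o "$tmp/x.naf" &&
        unnaf "$tmp/x.naf" -o "$tmp/x.fa" &&
        fasta_records "$tmp/x.fa" > "$tmp/back.tsv" &&
        cmp -s "$tmp/orig.tsv" "$tmp/back.tsv"
    status=$?
    rm -rf "$tmp"
    return "$status"
}
```

Usage, for example: naf_roundtrip_ok file.fa && echo "round trip OK"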

Compressing multiple files

Working with multiple files is possible using Multi-Multi-FASTA as an intermediate format. Example commands:

Compressing:
mumu.pl --dir 'Helicobacter' 'Helicobacter pylori*' | ennaf -22 --text -o Hp.nafnaf

Decompressing and unpacking:
unnaf Hp.nafnaf | mumu.pl --unpack --dir 'Helicobacter'

The filename of a single NAF-compressed file normally ends with ".naf". To avoid ambiguity, ".nafnaf" is the recommended suffix for multi-file NAF archives.
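
One practical use of the suffix convention is letting a script choose the right decompression path. The following is a sketch only: naf_kind and naf_extract are hypothetical names, and the commands they run are the ones shown in the examples above.

```shell
# Hypothetical helper, not part of NAF: classify a file by its suffix.
naf_kind() {
    case "$1" in
        *.nafnaf) echo multi ;;    # multi-file archive, unpacked with mumu.pl
        *.naf)    echo single ;;   # single compressed file
        *)        echo unknown ;;
    esac
}

# Decompress according to the suffix. The second argument is the output
# file (single) or the output directory (multi).
naf_extract() {
    case "$(naf_kind "$1")" in
        single) unnaf "$1" -o "$2" ;;
        multi)  unnaf "$1" | mumu.pl --unpack --dir "$2" ;;
        *)      echo "unrecognized suffix: $1" >&2; return 1 ;;
    esac
}
```

Usage, for example: naf_extract Hp.nafnaf 'Helicobacter'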

Citation

If you use NAF, please cite:

Kirill Kryukov, Mahoko Takahashi Ueda, So Nakagawa, Tadashi Imanishi (2019) "Nucleotide Archival Format (NAF) enables efficient lossless reference-free compression of DNA sequences" Bioinformatics, 35(19), 3826-3828, doi: 10.1093/bioinformatics/btz144.

For the compressor benchmark, please cite:

Kirill Kryukov, Mahoko Takahashi Ueda, So Nakagawa, Tadashi Imanishi (2020) "Sequence Compression Benchmark (SCB) database — A comprehensive evaluation of reference-free compressors for FASTA-formatted sequences" GigaScience, 9(7), giaa072, doi: 10.1093/gigascience/giaa072.
