KmerStream

Streaming algorithm for computing k-mer statistics for massive genomics datasets.
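For intuition, the kind of statistics involved can be computed exactly on a toy input. The sketch below is not KmerStream's streaming algorithm. It stores every k-mer in memory, which is exactly what a streaming estimator avoids on massive datasets. The choice of statistics (number of distinct k-mers, number of k-mers seen exactly once, total number of k-mers) and the helper names kmer_counts and kmer_stats are illustrative assumptions.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Exactly count every length-k substring of every read (no reverse complements)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_stats(reads, k):
    """Return (distinct k-mers, k-mers seen exactly once, total k-mers)."""
    counts = kmer_counts(reads, k)
    distinct = len(counts)
    singletons = sum(1 for c in counts.values() if c == 1)
    total = sum(counts.values())
    return distinct, singletons, total


reads = ["ACGTACGT", "CGTACGTT"]
print(kmer_stats(reads, 3))  # (5, 1, 12)
```

Memory here grows with the number of distinct k-mers, which is why a streaming estimator is needed once datasets get large.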

Installation

To compile, just type make.

Running

To see the usage, just type KmerStream:

KmerStream 1.1

Estimates occurrences of k-mers in fastq or fasta files and saves results

Usage: KmerStream [options] ... FASTQ files

-k, --kmer-size=INT      Size of k-mers, either a single value or comma separated list
-q, --quality-cutoff=INT Comma separated list, keep k-mers with bases above quality threshold in PHRED (default 0)
-o, --output=STRING      Filename for output
-e, --error-rate=FLOAT   Error rate guaranteed (default value 0.01)
-t, --threads=INT        Number of threads to use (default value 1)
-s, --seed=INT           Seed value for the randomness (default value 0, use time based randomness)
-b, --bam                Input is in BAM format (default false)
    --binary             Output is written in binary format (default false)
    --tsv                Output is written in TSV format (default false)
    --verbose            Print lots of messages during run
    --online             Prints out estimates every 100K reads
    --q64                set if PHRED+64 scores are used (@...h) default used PHRED+33

Options:

-k The k-mer size. This should be an integer or a comma-separated list of integers, e.g. -k 31 or -k 31,47,63. Odd values behave better than even values.
-q Optional quality cutoff values. All k-mers containing a base with quality under the threshold are discarded.
-o Filename where the output is written.
-e Guaranteed error bound of the estimator (default 0.01, i.e. 1%). Lower values increase memory usage.
-t Number of threads to use.
-s KmerStream uses random hash functions to compute the statistics. For reproducible results, set the seed to a fixed value, e.g. '-s 42'. The default value 0 uses time-based randomness.
-b Input is in BAM format
--binary Write output in binary format. This includes the data necessary for running KmerStreamJoin. The output filename is used as a prefix; for example, with -q 0 and -k 31 the output is written to PREFIX + _Q_0_k_31
--tsv Write output in TSV (tab-separated values) format for easier parsing
--online Prints estimates every 100K reads; see https://pmelsted.wordpress.com/2014/07/12/analyzing-data-while-downloading/ for example usage
--q64 Quality values are encoded in PHRED+64 format rather than the default PHRED+33. Use this if your quality values range from @ to h rather than from ! to I
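The README does not say why odd values of k behave better. One likely reason, stated here as an assumption, is that k-mer counters commonly treat a k-mer and its reverse complement as a single canonical k-mer. When k is odd, no k-mer can equal its own reverse complement, because its middle base would have to be its own complement. When k is even, such self-complementary k-mers exist. The helper names below are hypothetical.

```python
from itertools import product

# Complement table for the DNA alphabet.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Complement each base, then reverse the string."""
    return kmer.translate(COMPLEMENT)[::-1]


def canonical(kmer):
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def count_self_complementary(k):
    """Count k-mers over ACGT that equal their own reverse complement."""
    return sum(
        1
        for bases in product("ACGT", repeat=k)
        if "".join(bases) == reverse_complement("".join(bases))
    )


print(canonical("TTT"))             # AAA
print(count_self_complementary(3))  # 0
print(count_self_complementary(4))  # 16
```

For even k there are 4^(k/2) self-complementary k-mers, so the count above is 0 for k = 3 but 16 for k = 4.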
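The -q and --q64 options can be illustrated with a short sketch. Quality characters are decoded by subtracting 33 (PHRED+33) or 64 (PHRED+64), which maps ! to I and @ to h onto the same 0 to 40 range. Whether a base exactly at the cutoff is kept is not specified above. This sketch assumes bases with quality >= cutoff are kept, so the default -q 0 keeps every k-mer. The function names are hypothetical and do not correspond to KmerStream's C++ code.

```python
def phred_scores(qual, offset=33):
    """Decode a FASTQ quality string (offset 33 = PHRED+33, 64 = PHRED+64)."""
    return [ord(c) - offset for c in qual]


def kmers_passing_quality(seq, qual, k, cutoff, offset=33):
    """Yield k-mers whose bases all have quality >= cutoff (assumed semantics)."""
    scores = phred_scores(qual, offset)
    for i in range(len(seq) - k + 1):
        if all(s >= cutoff for s in scores[i:i + k]):
            yield seq[i:i + k]


print(phred_scores("!I"))             # [0, 40]
print(phred_scores("@h", offset=64))  # [0, 40]
print(list(kmers_passing_quality("ACGTA", "II#II", k=2, cutoff=20)))  # ['AC', 'TA']
```

A single low-quality base ('#', quality 2) removes every k-mer that overlaps it, which is why higher cutoffs discard k-mers quickly.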

KmerStreamJoin

KmerStreamJoin 1.1

Creates union of many stream estimates

Usage: KmerStreamJoin -o output files ...
       KmerStreamJoin merged-file

-o, --output=STRING      Filename for output
    --verbose            Print output at the end

When run with the -o option, KmerStreamJoin takes a list of KmerStream binary output files (created with the --binary option of KmerStream) and creates a single binary output file that is equivalent to a single KmerStream run on all of the input files. When the -o option is missing, it outputs the KmerStream result stored in the binary input file.

This utility is useful when the binary files are created in a distributed fashion or computed incrementally.
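The README does not describe KmerStream's sketch or binary format. As an illustration only, the sketch below uses a HyperLogLog-style register array (a swapped-in technique, not necessarily what KmerStream uses) to show why per-file sketches can be joined into a result equivalent to one pass over all files: merging takes the register-wise maximum. The names hash64, sketch, merge, estimate and the register count P are hypothetical.

```python
import hashlib
import math

P = 6        # hypothetical: 2**P = 64 registers
M = 1 << P


def hash64(kmer, seed):
    """Seeded 64-bit hash; sketches are only mergeable if they share the seed."""
    digest = hashlib.blake2b(
        kmer.encode(), digest_size=8, key=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def lowest_set_bit(w, bits):
    """1-based position of the lowest set bit of w, or bits + 1 if w == 0."""
    return (w & -w).bit_length() if w else bits + 1


def sketch(kmers, seed=42):
    """Build a register array from a stream of k-mers."""
    regs = [0] * M
    for kmer in kmers:
        h = hash64(kmer, seed)
        idx = h & (M - 1)   # low P bits choose a register
        w = h >> P          # remaining 64 - P bits
        regs[idx] = max(regs[idx], lowest_set_bit(w, 64 - P))
    return regs


def merge(a, b):
    """Register-wise maximum: the sketch of the union of both inputs."""
    return [max(x, y) for x, y in zip(a, b)]


def estimate(regs):
    """HyperLogLog estimate of the number of distinct k-mers."""
    m = len(regs)
    alpha = 0.709   # HyperLogLog bias constant for m = 64
    raw = alpha * m * m / sum(2.0 ** -r for r in regs)
    zeros = regs.count(0)
    if raw <= 2.5 * m and zeros:
        return m * math.log(m / zeros)   # linear counting for small inputs
    return raw


file_a = ["ACG", "CGT", "GTA"]
file_b = ["GTA", "TAC", "ACG"]
joined = merge(sketch(file_a), sketch(file_b))
print(joined == sketch(file_a + file_b))  # True
print(estimate(joined))
```

Because the register-wise maximum is associative, commutative and idempotent, per-file sketches can be joined in any order, and k-mers shared between files are not double-counted. In this sketch the equivalence only holds when every file is sketched with the same seed and register count.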

KmerStreamEstimate.py

KmerStreamEstimate.py is a Python script that reads a TSV file as input (generated with the --tsv option of KmerStream) and estimates the genome size (G), error rate (e), and coverage (lambda).

pmelsted/KmerStream

Folders and files

Latest commit

History

Repository files navigation

KmerStream

Installation

Running

KmerStreamJoin

KmerStreamEstimate.py

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages