All notable changes to the khmer project will be documented in this file. See keepachangelog for more info.
The khmer project's command line scripts adhere to Semantic Versioning. The Python and C++ APIs are not yet under semantic versioning, but will be in future versions of khmer.
- Cython wrapper for liboxli.
- Cython containers for all liboxli classes.
- Header install for liboxli.
- New storage class using a Counting Quotient Filter with improved cache locality over bloom filters.
- New variants of the sequence bulk loading method with a "banding" mode and a "mask" mode. In "banding" mode, only k-mers whose hashed values fall within a specified range are counted. In "mask" mode, only k-mers not already present in the specified mask are counted.
  - consume_seqfile_banding
  - consume_seqfile_with_mask
  - consume_seqfile_banding_with_mask
- Non-ACTG handling significantly changed so that only bulk-loading functions "clean" sequences of non-DNA characters. See #1590 for details.
- Split CPython wrapper file into per-class files under src/khmer and include/khmer.
- Moved liboxli headers to include/oxli and implementations to src/oxli.
- Removed all CPython wrappers except ReadParser and the standalone functions.
- Dropped support for Python 2.
- Changed to absolute imports.
- Some methods on LabelHash and Hashgraph have been changed to properties, or generators where appropriate.
- All constructors have been removed from khmer/__init__.py.
- GraphLabels does not inherit from Hashgraph.
- trim-low-abund.py no longer errors out when given multiple files with identical basenames.
- Document for submission to the Journal of Open Source Software.
- Several typos and outdated content in the documentation.
- New --no-reformat option for interleave-reads.py script disables default read name correction behavior.
- New HashSet data structure for managing collections of k-mer hashes and tags.
- khmer package version now included in .info files.
- New -o|--outfile option for filter-abund-single.py script.
- New sandbox script extract-compact-dbg.py for computing a compact de Bruijn graph from sequence data.
- New --quiet flag to several scripts, silencing diagnostic messages in terminal output.
- Support for human-friendly memory requests (2G instead of 2000000000 or 2e9).
- Support for variable-coverage trimming in the filter-abund-single.py script.
- Several simple examples of the Python API and the C++ API in examples/python-api and examples/c++-api, respectively.
- New assemble_linear_path function for baiting unambiguous contigs with a seed k-mer from a hashtable.
- Support for assembling directly from k-mer graphs, and a new JunctionCountAssembler class.
- Add --info flag for obtaining citation information.
- Added Counttable and Nodetable to support non-reversible hashing functionality and k > 32.
- Add a new storage class using half a byte per entry. Exposed as SmallCounttable and SmallCountgraph.
- Added cleaned_seq attribute to khmer.Read class which provides a cleaned version of the sequence of each read.
- Added --summary-info to trim-low-abund.py to record run information in a file.
- Nodetable, Counttable and SmallCounttable use MurmurHash3 as their hash function. This means they support k-mers longer than 32 bases, but the hashes are not reversible.
- Suppress display of -x and -N command line options in script help messages.
- Switch from nose to py.test as the testing framework.
- Switch from internally managed Jenkins setup to Travis CI for continuous integration testing.
- Renamed core data structures: CountingHash --> Countgraph, Hashbits --> Nodegraph.
- Replaced the IParser and FastxParser classes with a single ReadParser class. Different input formats are supported by templating ReadParser with a reader class.
- Renamed consume_fasta and related functions to consume_seqfile, with support for reading sequences from additional formats pending.
- Changed Sphinx documentation theme to "guzzle".
- Bug in compressed (gzip) streaming output from scripts.
- The hashbits update_from function to correctly track occupied bins for calculating FPR.
- Bug in the filter-abund.py script when --gzip and -o flags are used simultaneously.
- Bug in the hashtable get_kmers function based on incorrect usage of the substr function.
- Bug in broken_paired_reader related to dropping short reads when require_paired is set.
- Bug related to handling lowercase [acgtn] characters in input data.
- Bug in load-graph.py that calculated required graph space incorrectly.
- Fix loading of empty partition map files.
Prior to the khmer 2.1 release, all changes were documented in a file named ChangeLog. This file is now at legacy/ChangeLog for posterity.