Releases: jltsiren/gbwt
GBWT v1.3.1
GBWT v1.3
- Supports 64-bit ARM.
- File format version 5:
  - Optional serialization using simple-sds structures.
  - `Tags` structure storing arbitrary key-value pairs.
  - Compatible with versions 1-4.
  - Uses `Metadata` version 2 (compatible with versions 0-1).
- `inverseLF()`: Follow the sequence backward in a bidirectional index.
- Serialization and loading use exceptions to handle failures.
- Requires the vgteam fork of SDSL.
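The `inverseLF()` entry is easiest to picture on a toy structure. The sketch below is a plain FM-index over a string, not the GBWT API; `buildBWT`, `ToyBWT`, `extractBackward`, and `extractForward` are hypothetical names. Here LF moves from the row of a suffix to the row of the suffix that starts one character earlier, and its inverse, a select query on the BWT, moves one character later. The GBWT's LF follows a path forward instead, which is presumably why its inverse follows the sequence backward.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Naive BWT construction: sort all rotations of `text`, which must end with
// a unique endmarker '$' that sorts before the other characters.
std::string buildBWT(const std::string& text) {
  std::vector<std::string> rotations;
  for (std::size_t i = 0; i < text.size(); i++) {
    rotations.push_back(text.substr(i) + text.substr(0, i));
  }
  std::sort(rotations.begin(), rotations.end());
  std::string bwt;
  for (const std::string& r : rotations) bwt.push_back(r.back());
  return bwt;
}

// Toy FM-index over a plain string (hypothetical, not the GBWT API).
struct ToyBWT {
  std::string bwt;
  std::array<std::size_t, 257> C{};  // C[c] = number of characters smaller than c

  explicit ToyBWT(const std::string& b) : bwt(b) {
    std::array<std::size_t, 256> counts{};
    for (unsigned char ch : bwt) counts[ch]++;
    for (int c = 0; c < 256; c++) C[c + 1] = C[c] + counts[c];
  }

  // Number of occurrences of c in bwt[0, i), computed by a linear scan.
  std::size_t rank(unsigned char c, std::size_t i) const {
    std::size_t r = 0;
    for (std::size_t k = 0; k < i; k++) r += (static_cast<unsigned char>(bwt[k]) == c);
    return r;
  }

  // First-column character of row j: the c with C[c] <= j < C[c + 1].
  unsigned char firstChar(std::size_t j) const {
    unsigned c = 0;
    while (C[c + 1] <= j) c++;
    return static_cast<unsigned char>(c);
  }

  // LF: from the row of suffix T[p..] to the row of suffix T[p-1..].
  std::size_t LF(std::size_t i) const {
    unsigned char c = bwt[i];
    return C[c] + rank(c, i);
  }

  // inverseLF: the row i with LF(i) == j, i.e. the (j - C[c])-th
  // occurrence (0-based) of c = firstChar(j) in the BWT (a select query).
  std::size_t inverseLF(std::size_t j) const {
    unsigned char c = firstChar(j);
    std::size_t target = j - C[c];
    for (std::size_t i = 0; i < bwt.size(); i++) {
      if (static_cast<unsigned char>(bwt[i]) != c) continue;
      if (target == 0) return i;
      target--;
    }
    return bwt.size();  // not reached for a valid row j
  }
};

// Rebuild the text (without '$') right to left using LF, starting from
// row 0, whose suffix is "$".
std::string extractBackward(const ToyBWT& index) {
  std::string out;
  std::size_t i = 0;
  for (std::size_t step = 0; step + 1 < index.bwt.size(); step++) {
    out.insert(out.begin(), index.bwt[i]);
    i = index.LF(i);
  }
  return out;
}

// Rebuild the text left to right using inverseLF, starting from the row of
// the full text (the row whose BWT character is '$').
std::string extractForward(const ToyBWT& index) {
  std::string out;
  std::size_t j = index.bwt.find('$');
  for (std::size_t step = 0; step + 1 < index.bwt.size(); step++) {
    out.push_back(static_cast<char>(index.firstChar(j)));
    j = index.inverseLF(j);
  }
  return out;
}
```

With `ToyBWT index(buildBWT("banana$"))`, the BWT is `annb$aa` and both extraction functions return `banana`. Rank and select are linear scans here; a real index would use succinct rank/select support.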
GBWT v1.2
- Uses C++14 and the vgteam fork of SDSL.
- Direct `GBWT` to `DynamicGBWT` conversion.
- Temporary files are now thread-safe.
- An option to use persistent phasing files for haplotype generation. These files persist when the associated object is deleted, but they are still deleted when the program exits.
- The fast GBWT merging algorithm now works with overlapping node id ranges as long as the non-empty records do not overlap.
- `metadata_tool` can now print the metadata or remove it completely.
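The merging condition in the list above can be stated as a small check. The `RecordSummary` type and the `recordsDisjoint()` function are hypothetical: each input is reduced to its first node id and a vector of record lengths, and two inputs meet the condition when no node has a non-empty record in both, even if their node id ranges overlap. The merge itself is not shown.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical summary of one input index: record_sizes[k] is the length
// of the record for node id offset + k.
struct RecordSummary {
  std::uint64_t offset;
  std::vector<std::size_t> record_sizes;

  std::uint64_t end() const { return offset + static_cast<std::uint64_t>(record_sizes.size()); }

  bool nonEmpty(std::uint64_t node) const {
    if (node < offset || node >= end()) return false;
    return record_sizes[node - offset] > 0;
  }
};

// True if no node has a non-empty record in both inputs. The node id
// ranges [offset, end) may overlap as long as this holds.
bool recordsDisjoint(const RecordSummary& a, const RecordSummary& b) {
  std::uint64_t lo = std::max(a.offset, b.offset);
  std::uint64_t hi = std::min(a.end(), b.end());
  for (std::uint64_t node = lo; node < hi; node++) {
    if (a.nonEmpty(node) && b.nonEmpty(node)) return false;
  }
  return true;
}
```

For example, inputs with non-empty records on nodes {10, 12} and {11, 13} pass the check although both ranges cover nodes 11-13.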
GBWT v1.1
Major new functionality: `FastLocate`, an add-on structure for the compressed GBWT that implements the r-index `locate()` algorithm. It is larger than the existing `locate()` structure but also much faster, and it must be rebuilt whenever the GBWT is changed.
Other improvements:
- Metadata is ignored when merging empty GBWTs.
- Faster construction when the paths contain many different starting nodes.
GBWT v1.0
Various minor improvements. The GBWT is now stable enough to reach v1.0.
- Option to force the phasing of homozygous variants (default on).
- `CachedGBWT`: A caching layer over `GBWT` for workloads that repeatedly access the same subset of nodes.
- Direct `DynamicGBWT` to `GBWT` conversion.
- Install script.
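A caching layer like `CachedGBWT` can be sketched as memoization keyed by node id. Everything here is hypothetical: `Record`, `RecordCache`, and the `decode` callback stand in for decompressing a record from the compressed index, and the cache is unbounded. The real class may manage its cache differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical decoded record: here just the successor node ids of a node.
using Record = std::vector<std::uint64_t>;

// Unbounded memoization of decoded records by node id. `decode` stands in
// for the expensive step of decompressing a record from the index.
class RecordCache {
 public:
  explicit RecordCache(std::function<Record(std::uint64_t)> decode)
      : decode_(std::move(decode)) {}

  // Returns the cached record, decoding it on the first access.
  const Record& get(std::uint64_t node) {
    auto it = cache_.find(node);
    if (it == cache_.end()) {
      it = cache_.emplace(node, decode_(node)).first;
      misses_++;
    }
    return it->second;
  }

  std::size_t misses() const { return misses_; }

  // Drops all cached records; references returned by get() become invalid.
  void clear() { cache_.clear(); }

 private:
  std::function<Record(std::uint64_t)> decode_;
  std::unordered_map<std::uint64_t, Record> cache_;
  std::size_t misses_ = 0;
};
```

References returned by `get()` stay valid until `clear()`, because `std::unordered_map` does not move its elements when it rehashes.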
GBWT v0.9
Proper metadata: Each path (or a combination of a path and its reverse complement in a bidirectional index) has a name that consists of four integer components: sample, contig, phase, and count. Sample and contig ids may also have string names.
- Extended metadata with path, sample, and contig names.
- Sample names and contig name in VCF parse.
- Create full metadata when building GBWT from a VCF parse using `build_gbwt`.
- Renamed `metadata` to `metadata_tool`.
- Remove sequences by sample / contig name in `remove_seq`.
- New functionality: `GBWT::firstNode()`, `GBWT::empty(node)`.
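The four-component path names described above can be modeled with a plain struct. `PathName`, `ToyMetadata`, and `pathsForSample()` are hypothetical names, not the library's `Metadata` interface; the lookup mirrors the idea of selecting sequences by sample name, as `remove_seq` now allows.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// A path name with four integer components, as described in the release note.
struct PathName {
  std::uint64_t sample;
  std::uint64_t contig;
  std::uint64_t phase;
  std::uint64_t count;
};

// Hypothetical metadata container: one PathName per path, plus optional
// string names indexed by sample id and contig id.
struct ToyMetadata {
  std::vector<PathName> paths;
  std::vector<std::string> sample_names;
  std::vector<std::string> contig_names;

  // Ids of the paths whose sample component has the given string name.
  std::vector<std::size_t> pathsForSample(const std::string& name) const {
    std::vector<std::size_t> result;
    for (std::size_t s = 0; s < sample_names.size(); s++) {
      if (sample_names[s] != name) continue;
      for (std::size_t p = 0; p < paths.size(); p++) {
        if (paths[p].sample == s) result.push_back(p);
      }
    }
    return result;
  }
};
```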
GBWT v0.8
Construction improvements. This version was used for the benchmarks in the full version of the paper.
- An algorithm for removing sequences from `DynamicGBWT`.
- Multiple parallel merge jobs in BWT-merge. If the temporary disk is fast enough, merging is roughly twice as fast as in v0.7.
- `build_gbwt` improvements: Accept file lists, write metadata when building from VCF parse.
GBWT v0.7
Faster construction for datasets larger than 1000GP.
- Parallel merging algorithm for quickly merging multiple GBWTs over the same chromosome. It can reduce the index construction time for large datasets by a factor of 2 to 3.
- Optional metadata in the GBWT index.
- New functionality: `GBWT::extract(position)`, `GBWT::extract(position, max_length)`, `DynamicGBWT::fullLF()`.
GBWT v0.6
Various improvements to support building the GBWT for datasets larger than 1000GP.
- Option to change the path identifier sampling interval.
- Save the temporary structures from haplotype generation and use them as input for `build_gbwt`.
- Decompress the endmarker of compressed GBWT for faster `extract()` queries in indexes with millions of paths.
- Bug fix: Initialize incoming edges correctly when loading `DynamicGBWT` if the alphabet offset is non-zero.
- Support for Clang.
Full decompression of the endmarker made changes to the index file format unnecessary at the moment. The changes will be made before v1.0, though.
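The path identifier sampling interval mentioned above trades space for query time. The sketch shows the general technique on a toy FM-index over a string, with suffix array samples in place of the path identifiers that the GBWT samples; `SampledIndex` and its members are hypothetical. `locate()` applies LF until it reaches a sampled row and then adds the number of steps taken, so a smaller interval stores more samples but needs fewer steps.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <numeric>
#include <string>
#include <unordered_map>
#include <vector>

// Toy FM-index over a string ending with a unique '$', with suffix array
// samples at text positions divisible by `interval` (must be positive).
struct SampledIndex {
  std::string bwt;
  std::array<std::size_t, 257> C{};                      // C[c] = characters smaller than c
  std::unordered_map<std::size_t, std::size_t> samples;  // row -> text position

  SampledIndex(const std::string& text, std::size_t interval) {
    std::size_t n = text.size();
    std::vector<std::size_t> sa(n);
    std::iota(sa.begin(), sa.end(), 0);
    // Naive suffix sorting; fine for a toy example.
    std::sort(sa.begin(), sa.end(), [&](std::size_t a, std::size_t b) {
      return text.compare(a, std::string::npos, text, b, std::string::npos) < 0;
    });
    std::array<std::size_t, 256> counts{};
    for (std::size_t row = 0; row < n; row++) {
      char c = text[(sa[row] + n - 1) % n];  // character preceding the suffix
      bwt.push_back(c);
      counts[static_cast<unsigned char>(c)]++;
      if (sa[row] % interval == 0) samples[row] = sa[row];
    }
    for (int c = 0; c < 256; c++) C[c + 1] = C[c] + counts[c];
  }

  // LF with a linear-scan rank: moves one position backward in the text.
  std::size_t LF(std::size_t i) const {
    unsigned char c = bwt[i];
    std::size_t rank = 0;
    for (std::size_t k = 0; k < i; k++) rank += (static_cast<unsigned char>(bwt[k]) == c);
    return C[c] + rank;
  }

  // Walk LF until a sampled row is found, then add the number of steps.
  std::size_t locate(std::size_t row, std::size_t* steps = nullptr) const {
    std::size_t s = 0;
    auto it = samples.find(row);
    while (it == samples.end()) {
      row = LF(row);
      s++;
      it = samples.find(row);
    }
    if (steps != nullptr) *steps = s;
    return it->second + s;
  }
};
```

Because text position 0 is always sampled, every walk ends after fewer than `interval` steps.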
GBWT v0.5
Major functionality update.
- Support for bidirectional search.
  - Requires that each sequence has been indexed in both orientations.
  - Newly built indexes contain a flag that tells whether the index is known to support bidirectional search.
  - Old indexes still work, and the bidirectional search can also be used with them.
- Bug fixes for empty indexes.
- Save memory by using `vector_type` (32-bit integers) instead of `std::vector<node_type>` (64-bit integers).
- Support structures for generating haplotypes from a phased VCF file.
The next release (v0.6) will introduce major changes to the indexes and the file formats. Indexes built with v0.5 and earlier will not work with v0.6.
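The v0.5 requirement that each sequence be indexed in both orientations can be illustrated with an assumed node encoding, `2 * id + orientation`, which turns flipping an orientation into a single XOR. The helper names below are hypothetical, and the sketch only prepares the input paths; it does not implement bidirectional search.

```cpp
#include <cstdint>
#include <vector>

// Assumed encoding of an oriented node: 2 * id + (1 if reverse, else 0).
inline std::uint64_t encodeNode(std::uint64_t id, bool reverse) { return 2 * id + (reverse ? 1 : 0); }
inline std::uint64_t nodeId(std::uint64_t node) { return node / 2; }
inline bool isReverse(std::uint64_t node) { return (node & 1) != 0; }
inline std::uint64_t flip(std::uint64_t node) { return node ^ 1; }

// Reverse complement of a path: the same nodes in the opposite order,
// each in the opposite orientation.
std::vector<std::uint64_t> reversePath(const std::vector<std::uint64_t>& path) {
  std::vector<std::uint64_t> result;
  result.reserve(path.size());
  for (auto it = path.rbegin(); it != path.rend(); ++it) result.push_back(flip(*it));
  return result;
}

// Input for a bidirectional index: every path followed by its reverse complement.
std::vector<std::vector<std::uint64_t>> bothOrientations(
    const std::vector<std::vector<std::uint64_t>>& paths) {
  std::vector<std::vector<std::uint64_t>> result;
  for (const auto& path : paths) {
    result.push_back(path);
    result.push_back(reversePath(path));
  }
  return result;
}
```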