Releases: jltsiren/gbwt

GBWT v1.3.1

18 Feb 06:15

Minor patch release for the GBZ paper.

  • Empty paths are fully supported (but still discouraged).
  • Text input format for build_gbwt (mostly for testing).
  • The broken CMake support has been removed.

GBWT v1.3

15 Nov 19:29
  • Supports 64-bit ARM.
  • File format version 5:
    • Optional serialization using simple-sds structures.
    • Tags structure storing arbitrary key-value pairs.
    • Compatible with versions 1-4.
    • Uses Metadata version 2 (compatible with versions 0-1).
  • inverseLF(): Follow the sequence backward in a bidirectional index.
  • Serialization and loading use exceptions to handle failures.
  • Requires the vgteam fork of SDSL.

GBWT v1.2

23 Jan 04:06
  • Uses C++14 and the vgteam fork of SDSL.
  • Direct GBWT to DynamicGBWT conversion.
  • Temporary files are now thread-safe.
  • An option to use persistent phasing files for haplotype generation. These files persist when the associated object is deleted, but they are still deleted when the program exits.
  • The fast GBWT merging algorithm now works with overlapping node id ranges as long as the non-empty records do not overlap.
  • metadata_tool now prints metadata or removes it completely.
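
The merging precondition above (node id ranges may overlap, but the sets of nodes with non-empty records must not) can be illustrated with a small check. This is only a sketch under assumptions: `nonEmptyRecordsDisjoint` is a hypothetical helper, not part of the GBWT API, and it assumes each index has already been summarized as a sorted list of the node ids that have non-empty records.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a GBWT function): returns true if no node id
// appears in both lists. Each list holds the node ids with non-empty
// records in one index and must be sorted in ascending order.
bool nonEmptyRecordsDisjoint(const std::vector<std::uint64_t>& a,
                             const std::vector<std::uint64_t>& b) {
    assert(std::is_sorted(a.begin(), a.end()));
    assert(std::is_sorted(b.begin(), b.end()));
    std::size_t i = 0, j = 0;
    // Two-pointer scan over both sorted lists.
    while (i < a.size() && j < b.size()) {
        if (a[i] == b[j]) { return false; }  // shared non-empty record
        if (a[i] < b[j]) { ++i; } else { ++j; }
    }
    return true;
}
```

For example, the lists {2, 4, 10} and {3, 5, 9} span overlapping id ranges but share no node, so they satisfy the condition, while {2, 4} and {4, 6} do not.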

GBWT v1.1

14 Sep 23:25

Major new functionality: FastLocate, an add-on structure for the compressed GBWT that implements the r-index locate() algorithm. It is larger than the existing locate() structure but also much faster. It must be rebuilt whenever the GBWT is changed.

Other improvements:

  • Metadata is ignored when merging empty GBWTs.
  • Faster construction when the paths contain many different starting nodes.

GBWT v1.0

06 Sep 01:10

Various minor improvements. The GBWT is now stable enough to reach v1.0.

  • Option to force the phasing of homozygous variants (default on).
  • CachedGBWT: A caching layer over GBWT for workloads that repeatedly access the same subset of nodes.
  • Direct DynamicGBWT to GBWT conversion.
  • Install script.
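
The idea behind a caching layer such as CachedGBWT can be sketched generically: keep the records of recently used nodes in decompressed form so that repeated accesses skip the decompression step. The code below is a minimal illustration, not the library's implementation. `RecordCacheSketch`, `NodeId`, `Record`, and the `decompress` callback are all hypothetical names, and the cache here grows without bound.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

using NodeId = std::uint64_t;
using Record = std::vector<std::uint64_t>;  // stand-in for a decompressed record

// Hypothetical caching layer: decompresses each node's record at most once.
class RecordCacheSketch {
public:
    explicit RecordCacheSketch(std::function<Record(NodeId)> decompress)
        : decompress_(std::move(decompress)) {
        assert(decompress_ && "the decompress callback must be callable");
    }

    // Returns the cached record, decompressing it on the first access.
    // References stay valid because unordered_map never moves its elements.
    const Record& record(NodeId node) {
        auto it = cache_.find(node);
        if (it == cache_.end()) {
            it = cache_.emplace(node, decompress_(node)).first;
        }
        return it->second;
    }

    std::size_t size() const { return cache_.size(); }

private:
    std::function<Record(NodeId)> decompress_;
    std::unordered_map<NodeId, Record> cache_;
};
```

A workload that keeps returning to the same subset of nodes pays the decompression cost once per node; a workload that touches each node only once gains nothing and pays the extra memory.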

GBWT v0.9

12 Apr 22:14

Proper metadata: Each path (or a combination of a path and its reverse complement in a bidirectional index) has a name that consists of four integer components: sample, contig, phase, and count. Sample and contig ids may further have strings as names.

  • Extended metadata with path, sample, and contig names.
  • Sample names and contig name in VCF parse.
  • Create full metadata when building GBWT from a VCF parse using build_gbwt.
  • Renamed metadata to metadata_tool.
  • Remove sequences by sample / contig name in remove_seq.
  • New functionality: GBWT::firstNode(), GBWT::empty(node).
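
The path naming scheme described above can be pictured as a small record of four integers, with optional string names for sample and contig ids. The following is a sketch for illustration only: `PathNameSketch` and `MetadataSketch` are hypothetical types, not the library's metadata classes, and the "/"-separated output format is an arbitrary choice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a path name: four integer components.
struct PathNameSketch {
    std::uint32_t sample;
    std::uint32_t contig;
    std::uint32_t phase;
    std::uint32_t count;
};

// Hypothetical metadata container with optional sample and contig names.
struct MetadataSketch {
    std::vector<std::string> sample_names;  // may be empty
    std::vector<std::string> contig_names;  // may be empty
    std::vector<PathNameSketch> path_names;

    // Human-readable form: sample/contig/phase/count, using string names
    // where they exist and falling back to the integer ids otherwise.
    std::string describe(std::size_t path_id) const {
        assert(path_id < path_names.size());
        const PathNameSketch& p = path_names[path_id];
        return label(sample_names, p.sample) + "/" +
               label(contig_names, p.contig) + "/" +
               std::to_string(p.phase) + "/" + std::to_string(p.count);
    }

    static std::string label(const std::vector<std::string>& names,
                             std::uint32_t id) {
        return id < names.size() ? names[id] : std::to_string(id);
    }
};
```

In a bidirectional index, one such name would cover both a path and its reverse complement, as the note above states.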

GBWT v0.8

11 Jan 20:36

Construction improvements. This version was used for the benchmarks in the full version of the paper.

  • An algorithm for removing sequences from DynamicGBWT.
  • Multiple parallel merge jobs in BWT-merge. If the temporary disk is fast enough, merging is roughly twice as fast as in v0.7.
  • build_gbwt improvements: Accept file lists, write metadata when building from VCF parse.

GBWT v0.7

22 Nov 05:06

Faster construction for datasets larger than 1000GP.

  • Parallel merging algorithm for quickly merging multiple GBWTs over the same chromosome. It can reduce the index construction time for large datasets by a factor of 2 to 3.
  • Optional metadata in the GBWT index.
  • New functionality: GBWT::extract(position), GBWT::extract(position, max_length), DynamicGBWT::fullLF().

GBWT v0.6

24 Sep 19:03

Various improvements to support building GBWT for datasets larger than 1000GP.

  • Option to change the path identifier sampling interval.
  • Save the temporary structures from haplotype generation and use them as input for build_gbwt.
  • Decompress the endmarker of compressed GBWT for faster extract() queries in indexes with millions of paths.
  • Bug fix: Initialize incoming edges correctly when loading DynamicGBWT if alphabet offset is non-zero.
  • Support for Clang.

Full decompression of the endmarker made changes to the index file format unnecessary at the moment. The changes will be made before v1.0, though.

GBWT v0.5

20 Jul 09:41

Major functionality update.

  • Support for bidirectional search.
    • Requires that each sequence has been indexed in both orientations.
    • Newly built indexes contain a flag that tells whether the index is known to support bidirectional search.
    • Old indexes still work, and the bidirectional search can also be used with them.
  • Bug fixes for empty indexes.
  • Save memory by using vector_type (32-bit integers) instead of std::vector<node_type> (64-bit integers).
  • Support structures for generating haplotypes from a phased VCF file.
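
The memory saving from storing paths as 32-bit rather than 64-bit integers can be checked with a short sketch. The aliases `wide_path` and `narrow_path` are hypothetical stand-ins for the two representations mentioned above, not the library's own typedefs, and the sketch assumes every node id fits in 32 bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

using wide_path = std::vector<std::uint64_t>;    // 64-bit node ids
using narrow_path = std::vector<std::uint32_t>;  // 32-bit node ids

// Converts a path to the 32-bit representation. Assumes (and asserts)
// that every node id fits in 32 bits.
narrow_path narrow(const wide_path& path) {
    narrow_path result;
    result.reserve(path.size());
    for (std::uint64_t node : path) {
        assert(node <= std::numeric_limits<std::uint32_t>::max());
        result.push_back(static_cast<std::uint32_t>(node));
    }
    return result;
}

// Bytes used by the stored elements, ignoring container overhead.
template <class Vector>
std::size_t payload_bytes(const Vector& v) {
    return v.size() * sizeof(typename Vector::value_type);
}
```

The narrow form halves the payload for the same path; the cost is that node ids are limited to the 32-bit range.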

The next release (v0.6) will introduce major changes to the indexes and the file formats. Indexes built with v0.5 and earlier will not work with v0.6.