Skip to content

Releases: jltsiren/gbwtgraph

GBWTGraph v1.0

05 Dec 04:02
Compare
Choose a tag to compare

Things have stabilized enough to call this version 1.0. It contains a number of changes from the last year and a half.

  • Faster copying / moving / swapping of GBZ objects.
  • GFA compression/decompression changes:
    • L-line overlap is now written as 0M instead of *.
    • Option to extract GFA without using the node-to-segment translation.
  • Minimizer index version 9:
    • Payload is now 128 bits per hit (already in version 8).
    • Parameterized over both key type and value type, allowing the use minimizer indexes without a payload.
    • Option to provide a set of frequent kmers that should be avoided as minimizers, as in Winnowmap.
    • Plain kmer indexes can also be used.
    • Iterator over the kmers in the index.
    • Compatible with version 8 but not with earlier versions.
  • New graph operations:
    • subgraph(): Return a subgraph based on the given GBWT index (faster than using a constructor).
  • New algorithms:
    • gbwt_construction_jobs(): Partition a graph into sets of components for parallel GBWT construction.
    • partition_chains(): Partition top-level chains in a snarl decomposition between GBWT construction jobs.

GBWTGraph v0.9

13 May 21:47
Compare
Choose a tag to compare
  • Terminology change consistent with libhandlegraph senses:
    • Old reference paths are now generic (named) paths.
    • Some samples may be interpreted as reference paths.
    • All other paths are haplotype paths.
  • GFA compression/decompression changes:
    • The output is GFA 1.1 instead of GFA 1.0.
    • Path names are now interpreted as contig names instead of sample names by default.
    • Preset --pan-sn for parsing path names and extracting paths in the PanSN format.
    • --ref-only mode that extracts only graph topology and named paths as P-lines.
    • An optional list of reference samples can be stored using GFA header tag RS and GBWT tag reference_samples.
    • The API supports multiple regexes for parsing path names and assigning path senses by the first regex that matches the path name.
  • GBWTGraph implements the PathMetadata interface from libhandlegraph.
    • Metadata that depends on the sense (type) of the path.
    • Haplotype paths are not exposed through the PathHandleGraph interface.
  • Construction from HandleGraph takes an optional NamedNodeBackTranslation for building the node-to-segment translation.
  • Path cover / subsampled GBWTs now include (a selected subset of) paths from the original graph as generic paths.

GBWTGraph v0.8.1

18 Feb 06:32
Compare
Choose a tag to compare

This is a minor patch release for the GBZ paper.

  • Faster GFA decompression in some degenerate cases by caching large GBWT records.

GBWTGraph v0.8

21 Jan 16:34
Compare
Choose a tag to compare
  • GBWTGraph implements more libhandlegraph interfaces:
    • PathHandleGraph: Reference paths corresponding to sample name _gbwt_ref are reported as named paths.
    • NamedNodeBackTranslation: Node chopping + node id / segment name translation for GFA.
    • CachedGBWTGraph is only a HandleGraph.
  • Multithreaded GFA (de)compression in gfa2gbwt and with gfa_to_gbwt() / gbwt_to_gfa().
    • Faster and more memory-efficient compression, faster decompression.
    • Path order in the GBWT and path/edge order in the decompressed GFA file is different from the old algorithms.

GBWTGraph v0.7

20 Nov 02:21
Compare
Choose a tag to compare
  • File format version 3:
  • GBZ format for space-efficient storage of GFA with many paths.
    • Wrapper for GBWT and GBWTGraph serialized using the simple-sds format.
    • Better defined semantics.
  • Serialization and loading use exceptions to handle failures.
  • Requires version 2.3 of the vgteam fork of SDSL and GBWT version 1.3.
  • gfa2gbwt outputs the compressed simple-sds .gbz format by default.

GBWTGraph v0.6

17 Mar 20:20
Compare
Choose a tag to compare

Some work on using GBWTGraph for GFA storage.

  • Uses the vgteam fork of SDSL.
    • In CMake builds, uses the same SDSL as GBWT.
  • When a GFA file contains both P-lines and W-lines, use the P-lines as reference paths.
    • Reference paths have sample name _gbwt_ref and the path name as contig name.
  • GBWTGraph file format version 2.
    • Optional node-to-segment translation for GFA files.
    • Compatible with version 1.
  • Preliminary SegmentHandleGraph interface for GBWTGraph.
    • If node-to-segment translation is included in the graph, each handle maps to a (segment name, starting offset) pair.
    • Each node also maps to a (segment name, node id range) pair.
    • Iteration over segments and links.
  • Compressed GBWTGraph file format for GFA storage.
    • Both GBWT and GBWTGraph in the same file. GBWTGraph typically compresses to ~20% of the original space.
    • gfa2gbwt can output plain and compressed formats and convert between the two.
  • GFA extraction from GBWTGraph.

GBWTGraph v0.5

16 Jan 02:21
Compare
Choose a tag to compare

Major improvements to GFA parsing:

  • Multi-pass algorithm that validates the input before starting GBWT construction.
  • Segment names are automatically translated into integer ids if necessary.
  • Long segments are broken down into smaller nodes (at most 1024 bp by default) if necessary.
  • Segment-to-node translation table is written to basename.trans.
  • GBWT metadata can be generated by parsing path names.
  • Support for proposed W-lines.

GBWTGraph v0.4

06 Nov 05:38
Compare
Choose a tag to compare
  • CachedGBWTGraph: A GBWTGraph overlay that uses the cached interface automatically. Intended for algorithms that repeatedly access the edges in a small subgraph.
  • Minimizer index v7 (compatible with v6):
    • An option to use bounded syncmers instead of minimizers.
  • Graph algorithms in algorithms.h:
    • topological_order(): Find a topological order for all handles in the subgraph induced by a subset of nodes.
  • New queries:
    • GBWTGraph::cached_follow_edges(): A version of follow_edges() using CachedGBWT.
    • hits_in_subgraph(): Report minimizer hits in a subgraph induced by a set of nodes.
  • Changes to path cover:
    • local_haplotypes(): Revert to the path cover algorithm if there are no haplotypes in the component.
    • augment_gbwt(): Augment an existing GBWT with a path cover of missing components.

GBWTGraph v0.3

14 Apr 08:04
Compare
Choose a tag to compare
  • Implemented the new SerializableHandleGraph interface.
  • Query MinimizerIndex::count_and_find() that returns the occurrence count and a pointer to the internal representation of the occurrences.
  • Better path cover for acyclic component by always starting from a head node in such components.
  • Avoid querying for nonexistent nodes during construction when the source graph has gaps between node ids.
  • Store 64 bits of payload for each position in the minimizer index.
  • Minimizer index file format v6 (not compatible with the earlier versions).

GBWTGraph v0.2

08 Nov 21:24
Compare
Choose a tag to compare
  • The minimizer index is now parametrized with a key type. In addition to the old 64-bit keys, there is an option to use 128-bit keys that support up to 63 bp minimizers.
  • GBWT construction from a greedy (maximum) path cover of an arbitrary graph. This can be used to build a GBWTGraph when true haplotypes are not available.
  • A variant of the path cover algorithm that samples the local haplotypes. This can be used to build a GBWTGraph without rare haplotypes.