Releases: jltsiren/gbwtgraph
Releases · jltsiren/gbwtgraph
GBWTGraph v1.0
Things have stabilized enough to call this version 1.0. It contains a number of changes from the last year and a half.
- Faster copying / moving / swapping of GBZ objects.
- GFA compression/decompression changes:
- L-line overlap is now written as
0M
instead of*
. - Option to extract GFA without using the node-to-segment translation.
- L-line overlap is now written as
- Minimizer index version 9:
- Payload is now 128 bits per hit (already in version 8).
- Parameterized over both key type and value type, allowing the use minimizer indexes without a payload.
- Option to provide a set of frequent kmers that should be avoided as minimizers, as in Winnowmap.
- Plain kmer indexes can also be used.
- Iterator over the kmers in the index.
- Compatible with version 8 but not with earlier versions.
- New graph operations:
subgraph()
: Return a subgraph based on the given GBWT index (faster than using a constructor).
- New algorithms:
gbwt_construction_jobs()
: Partition a graph into sets of components for parallel GBWT construction.partition_chains()
: Partition top-level chains in a snarl decomposition between GBWT construction jobs.
GBWTGraph v0.9
- Terminology change consistent with libhandlegraph senses:
- Old reference paths are now generic (named) paths.
- Some samples may be interpreted as reference paths.
- All other paths are haplotype paths.
- GFA compression/decompression changes:
- The output is GFA 1.1 instead of GFA 1.0.
- Path names are now interpreted as contig names instead of sample names by default.
- Preset
--pan-sn
for parsing path names and extracting paths in the PanSN format. --ref-only
mode that extracts only graph topology and named paths as P-lines.- An optional list of reference samples can be stored using GFA header tag
RS
and GBWT tagreference_samples
. - The API supports multiple regexes for parsing path names and assigning path senses by the first regex that matches the path name.
GBWTGraph
implements thePathMetadata
interface from libhandlegraph.- Metadata that depends on the sense (type) of the path.
- Haplotype paths are not exposed through the
PathHandleGraph
interface.
- Construction from
HandleGraph
takes an optionalNamedNodeBackTranslation
for building the node-to-segment translation. - Path cover / subsampled GBWTs now include (a selected subset of) paths from the original graph as generic paths.
GBWTGraph v0.8.1
This is a minor patch release for the GBZ paper.
- Faster GFA decompression in some degenerate cases by caching large GBWT records.
GBWTGraph v0.8
GBWTGraph
implements more libhandlegraph interfaces:PathHandleGraph
: Reference paths corresponding to sample name_gbwt_ref
are reported as named paths.NamedNodeBackTranslation
: Node chopping + node id / segment name translation for GFA.CachedGBWTGraph
is only aHandleGraph
.
- Multithreaded GFA (de)compression in
gfa2gbwt
and withgfa_to_gbwt()
/gbwt_to_gfa()
.- Faster and more memory-efficient compression, faster decompression.
- Path order in the GBWT and path/edge order in the decompressed GFA file is different from the old algorithms.
GBWTGraph v0.7
- File format version 3:
- Optional serialization using simple-sds structures.
- SDSL file format is compatible with versions 1-2.
- GBZ format for space-efficient storage of GFA with many paths.
- Wrapper for GBWT and GBWTGraph serialized using the simple-sds format.
- Better defined semantics.
- Serialization and loading use exceptions to handle failures.
- Requires version 2.3 of the vgteam fork of SDSL and GBWT version 1.3.
gfa2gbwt
outputs the compressed simple-sds.gbz
format by default.
GBWTGraph v0.6
Some work on using GBWTGraph for GFA storage.
- Uses the vgteam fork of SDSL.
- In CMake builds, uses the same SDSL as GBWT.
- When a GFA file contains both P-lines and W-lines, use the P-lines as reference paths.
- Reference paths have sample name
_gbwt_ref
and the path name as contig name.
- Reference paths have sample name
- GBWTGraph file format version 2.
- Optional node-to-segment translation for GFA files.
- Compatible with version 1.
- Preliminary
SegmentHandleGraph
interface forGBWTGraph
.- If node-to-segment translation is included in the graph, each handle maps to a (segment name, starting offset) pair.
- Each node also maps to a (segment name, node id range) pair.
- Iteration over segments and links.
- Compressed GBWTGraph file format for GFA storage.
- Both GBWT and GBWTGraph in the same file. GBWTGraph typically compresses to ~20% of the original space.
gfa2gbwt
can output plain and compressed formats and convert between the two.
- GFA extraction from GBWTGraph.
GBWTGraph v0.5
Major improvements to GFA parsing:
- Multi-pass algorithm that validates the input before starting GBWT construction.
- Segment names are automatically translated into integer ids if necessary.
- Long segments are broken down into smaller nodes (at most 1024 bp by default) if necessary.
- Segment-to-node translation table is written to
basename.trans
. - GBWT metadata can be generated by parsing path names.
- Support for proposed W-lines.
GBWTGraph v0.4
CachedGBWTGraph
: AGBWTGraph
overlay that uses the cached interface automatically. Intended for algorithms that repeatedly access the edges in a small subgraph.- Minimizer index v7 (compatible with v6):
- An option to use bounded syncmers instead of minimizers.
- Graph algorithms in
algorithms.h
:topological_order()
: Find a topological order for all handles in the subgraph induced by a subset of nodes.
- New queries:
GBWTGraph::cached_follow_edges()
: A version offollow_edges()
usingCachedGBWT
.hits_in_subgraph()
: Report minimizer hits in a subgraph induced by a set of nodes.
- Changes to path cover:
local_haplotypes()
: Revert to the path cover algorithm if there are no haplotypes in the component.augment_gbwt()
: Augment an existing GBWT with a path cover of missing components.
GBWTGraph v0.3
- Implemented the new
SerializableHandleGraph
interface. - Query
MinimizerIndex::count_and_find()
that returns the occurrence count and a pointer to the internal representation of the occurrences. - Better path cover for acyclic component by always starting from a head node in such components.
- Avoid querying for nonexistent nodes during construction when the source graph has gaps between node ids.
- Store 64 bits of payload for each position in the minimizer index.
- Minimizer index file format v6 (not compatible with the earlier versions).
GBWTGraph v0.2
- The minimizer index is now parametrized with a key type. In addition to the old 64-bit keys, there is an option to use 128-bit keys that support up to 63 bp minimizers.
- GBWT construction from a greedy (maximum) path cover of an arbitrary graph. This can be used to build a GBWTGraph when true haplotypes are not available.
- A variant of the path cover algorithm that samples the local haplotypes. This can be used to build a GBWTGraph without rare haplotypes.