A*PA is a global pairwise sequence aligner for edit distance using A*, co-authored by @pesho-ivanov and @RagnarGrootKoerkamp.
A*PA2 is an improvement of A*PA that uses a DP-based approach instead of plain A*. It achieves up to 20x speedup over other exact aligners and is competitive with approximate aligners.
An alignment of two sequences of length 500 with 30% error rate using A*PA:
An alignment of two sequences of length 10’000 with 15% error rate using A*PA2:
- Papers
-
For A*PA, we recommend reading the bioRxiv version which directly includes the
supplement and has better formatting. But please cite the Bioinformatics version!
- A*PA BioRxiv preprint:
Ragnar Groot Koerkamp, Pesho Ivanov. “Exact global alignment using A* with chaining seed heuristic and match pruning”. bioRxiv (2024). 10.1101/2022.09.19.508631
- A*PA OUP Bioinformatics paper:
Ragnar Groot Koerkamp, Pesho Ivanov. “Exact global alignment using A* with chaining seed heuristic and match pruning”. Bioinformatics (2024). 10.1093/bioinformatics/btae032
- A*PA2 BioRxiv preprint:
Ragnar Groot Koerkamp. “A*PA2: up to 20 times faster exact global alignment”. bioRxiv (2024). 10.1101/2024.03.24.586481
- A*PA BioRxiv preprint:
- Links
-
- Twitter: @curious_coding, @peshotrie,
- Matrix:
@curious_coding:matrix.org
, - Blog: https://curiouscoding.nl
If you run into any kind of problem or unclarity, please (please 🥺) make an issue or reach out on twitter or matrix.
To call A*PA2 from another Rust crate, simply add the astarpa[2]
crate in this
repo as a git dependency.
For A*PA2, use astarpa2_simple(a, b)
or astarpa2_full(a, b)
in the
~astarpa2~ crate, or customize parameters with e.g.
let mut params = astarpa2::AstarPa2Params::full();
params.front.incremental_doubling = false;
let mut aligner = params.make_aligner(true);
let (cost, cigar) = aligner.align(a, b);
The astarpa
crate is the main entrypoint for A*PA. See the docs there.
Use astarpa::astarpa(a, b)
for alignment with default settings or
astarpa::astarpa_gcsh(a,b,r,k,end_pruning)
to use GCSH+DT with custom parameters.
More complex usage examples can be found in pa-bin/examples.
The astarpa-c
crate contains simple C-bindings for the
astarpa::{astarpa,astarpa_gcsh}
and astarpa2::astarpa2_{simple,full}
functions and an example with makefile. More should not be needed for
simple usage. To run the resulting binary, make sure to export LD_LIBRARY_PATH=/path/to/astarpa/target/release
.
pa-bin
is a small command line application that takes as input consecutive pairs of
sequences from a .fasta
, .seq
, or .txt
file (or can generate random input)
and outputs costs and alignments to a .csv
.
This requires cargo
and Rust nightly
. To get both, first install rustup. Then enable nightly
: rustup install nightly; rustup default nightly
.
Install pa-bin
to ~/.local/share/cargo/bin/pa-bin
using the following (cloning this repo is not needed):
cargo install --git https://github.com/RagnarGrootKoerkamp/astar-pairwise-aligner pa-bin
To run from the repository: clone and ~cargo run --release -- <pa-bin flags>~.
cargo run --release -- -h
The Rust API supports generating visualizations using the sdl2
library and
ttf
fonts. If this gives errors, install sdl2
: e.g. using apt-get install libsdl2-ttf-dev
.
Here are some sample videos. The first five correspond to figure 1 of the A*PA paper. Timings are not comparable due to differences in visualization strategies (cell vs layer updates).
Dijkstra | Ukkonen’s exponential search (Edlib) |
Diagonal transition (WFA) | DT + Divide & Conquer (BiWFA) |
A*PA (GCSH+DT) | A*PA2-full (8-bit words; block size 32) |
- Figures
- Paper figures are generated using the example binaries at pa-bin/examples/astarpa-figures and pa-bin/examples/astarpa2-figures.
- Evals
- Benchmarking code, evals, and datasets can be found in the pa-bench repo. For A*PA, results can be found in this notebook and reproduced using this makefile. For A*PA2, results can be found in this notebook and reproduced using this justfile. Dataset downloads are in this release.
- Tests
-
Code is tested for correctness in various tests (astarpa/src/tests.rs) against
triple-accel
. The benchmark tool pa-bench also checks correctness automatically.
Code is spread out over multiple crates. From low to high:
pa-types
: Basic types such asSeq
,Pos
,Cigar
, andCost
, hosted in thepairwise-alignment
org.pa-affine-types
: Types for affine edit graphs such asState = (Pos, Layer)
,AffineCigar
, andCostModel
. Not used by A*PA, but other algorithms and the visualizer support it.pa-heuristic
: Code for- finding matches
- computing contours (fast and bruteforce)
- heuristics themselves
- wrapper/bruteforce heuristics for debugging
pa-vis-types
: Trait definition of the visualizer callbacks, and the emptyNoVis
visualizer.astarpa
: Main A*PA API entrypoint containing theastar
andastar_dt
functions, thebucket_queue
data structure, and theastarpa(a,b)
entrypoint.astarpa-c
: C-bindings forastarpa
pa-vis
: The visualizer. Contains aCanvas
trait implemented for theSDL2Canvas
. Thesdl2
feature is optional.pa-generate
: Library and binary to generate different types of random sequences.pa-bin
: Main command line interface to A*PA. Allows for input from file, generated input, visualizing, and customization of the A*PA parameters.pa-bitpacking
: Implementation of Myers’ bitpacking algorithms and SIMD extensions.astarpa2
: A*PA2 entrypoint containingastarpa2_simple
andastarpa2_full
functions.pa-base-algos
: Re-implementations of Needleman-Wunsch/Edlib and Diagonal-transition/WFA/BiWFA for visualizations.astarpa-next
: Some code for other new ideas such as path-pruning.pa-web
: web-interface to A*PA by compiling to webassembly. Implements theCanvas
trait forHTMLCanvas
. (Not maintained.)
MPL-2.0