This repository contains an implementation of the dynamic suffix array described in the paper A four-stage algorithm for updating a Burrows-Wheeler transform by Salson et al. It was implemented as a project assignment for our Bioinformatics course.
We are using Gerlach's implentation of Mäkinen and Navarro’s dynamic structures. This allows us to perform our rankc, insert and delete operations in O(log n log σ), where n is the length of the text and σ is the size of the alphabet Σ. These structures also include a sampler, which we are using for (more) efficient computation of ISA values at a given index.
The majority of our implementation is contained within the header file DynamicSuffixArray.h and its implementation DynamicSuffixArray.cpp. The class, variable and method comments are all contained within the header file. The comments inside the C++ file only explain the implementation details or reference the paper.
The rest of the C++ files are unchanged or mostly unchanged, having required at most a few (re)definitions to satisfy our compilers.
The building process is managed by the Makefile. It manages the building process, as well as the compiler options and flags. The building can simply be started by running
make
or
make all
For testing purposes we implemented unit tests using Catch automated test framework. Tests are grouped into categories that can be run independently. To start all tests use:
./test
To start all tests with time monitoring options use:
./test -d yes
To print all possible options use:
./test -?
Action | Length | Time[s] | Memory[MB] |
---|---|---|---|
Calculate BWT | 1031 | 0.003357 | 1 |
Calculate BWT | 5385 | 0.028451 | 1 |
Calculate BWT | 35970 | 0.176832 | 1 |
Calculate BWT | 189561 | 1.214295 | 2 |
Calculate BWT | 2494862 | 21.097746 | 9 |
Adding string | 512 into 1032 | 0.001728 | 1 |
Deleting string | 512 from 1032 | 0.001916 | 1 |
Deleting string | 512 from 35970 | 0.002748 | 2 |
Adding string | 1413 into 189561 | 0.010186 | 2 |
Deleting string | 42000 from 189561 | 0.0203301 | 2 |
Deleting string | 126183 from 700918 | 0.738056 | 4 |
Adding string | 126183 into 700918 | 0.902695 | 4 |