The exam took place on 2019-04-17.
Uses aggiedown and GitHub Actions for CI. Tagged versions are available on the Releases page.
Comparison of containment approaches using MinHash:
- CMash (containment minhash)
- mash screen
- smol (scaled minhash)
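As a rough illustration of the scaled MinHash idea behind the last item, here is a minimal Python sketch of sketching and containment estimation. It is not the smol implementation: the hash function (BLAKE2b from `hashlib`), the function names, and the default `k` and `scaled` values are assumptions chosen for illustration, and canonical (reverse-complement-aware) k-mers are left out.

```python
import hashlib

MAX_HASH = 2**64  # size of the 64-bit hash space (assumption for this sketch)


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer (BLAKE2b is a stand-in hash)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def scaled_minhash(seq, k=21, scaled=1000):
    """Keep only hashes below MAX_HASH / scaled (about 1/scaled of them)."""
    threshold = MAX_HASH // scaled
    return {h for h in map(hash_kmer, kmers(seq, k)) if h < threshold}


def containment(query, reference):
    """Estimate the fraction of the query sketch found in the reference."""
    if not query:
        return 0.0
    return len(query & reference) / len(query)
```

Because the retained hashes depend only on the threshold, two sketches built with the same `k` and `scaled` can be compared directly with set operations, whatever the sizes of the original datasets.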
Regenerating results (after running the setup steps):
conda activate thesis
cd experiments/smol_gather && snakemake --use-conda
Analysis of Scaled MinHash sizes (number of hashes) across domains in GenBank.
Analyzing unique and shared hashes in an inverted index.
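One way to picture the unique/shared hash analysis is the following minimal Python sketch. It assumes each dataset is represented as a set of integer hashes; the function names and the toy data are hypothetical and are not taken from the analysis scripts in this repository.

```python
from collections import defaultdict


def build_inverted_index(sketches):
    """Map each hash to the set of dataset names that contain it."""
    index = defaultdict(set)
    for name, hashes in sketches.items():
        for h in hashes:
            index[h].add(name)
    return index


def split_unique_shared(index):
    """Separate hashes seen in exactly one dataset from those seen in several."""
    unique = {h for h, names in index.items() if len(names) == 1}
    shared = set(index) - unique
    return unique, shared


# Toy example with made-up hash values.
sketches = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5}}
unique, shared = split_unique_shared(build_inverted_index(sketches))
print(sorted(unique))  # [1, 2, 5]
print(sorted(shared))  # [3, 4]
```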
All processing and analysis scripts were run using the conda environment specified in environment.yml.
To build and activate this environment, run:
conda env create --force --file environment.yml
conda activate thesis