
Constructing a BIGSI

Zhicheng Liu edited this page Nov 5, 2021 · 4 revisions

1. Extract k-mers from your data

You can use any tool you like to extract unique k-mers from your raw data. We recommend mccortex because its error-cleaning methods let you extract error-cleaned k-mers. Alternatively, you can use a k-mer counter such as Jellyfish, or a custom script.

mccortex/bin/mccortex31 build -k 31 -s test1 -1 example-data/kmers.txt example-data/test1.ctx
mccortex/bin/mccortex31 build -k 31 -s test2 -1 example-data/kmers.txt example-data/test2.ctx
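If you go the custom-script route, the core of it is small. The following is a minimal sketch, not part of BIGSI or mccortex: the function name `unique_kmers` is hypothetical, it does no error cleaning, and it simply skips any window containing a character other than A, C, G or T.

```python
def unique_kmers(sequence, k=31):
    """Return the set of distinct k-mers found in one sequence.

    Windows containing anything other than A, C, G or T (for example N)
    are skipped. No error cleaning is performed.
    """
    seq = sequence.upper()
    valid = set("ACGT")
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= valid:
            kmers.add(kmer)
    return kmers


if __name__ == "__main__":
    # Toy input with a small k so the result is easy to inspect.
    for kmer in sorted(unique_kmers("ACGTACGTNACGT", k=4)):
        print(kmer)
```

For real read sets, a dedicated tool will be much faster and can discard k-mers that come from sequencing errors, which this sketch keeps.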

2. Create the BIGSI config files

Below are three example configs to get you started with your preferred key-value store: Berkeley DB, RocksDB, or Redis. See https://github.com/iqbal-lab-org/BIGSI/wiki/Choosing-BIGSI-Parameters to decide on the parameters k (k-mer length), m (Bloom filter size in bits), and h (number of hash functions).

berkeleydb.yaml

## Example config using berkeleyDB
h: 1
k: 31
m: 28000000
storage-engine: berkeleydb
storage-config:
  filename: test-berkeleydb
  flag: "c" ## Change to 'r' for read-only access

redis.yaml

## Example config using redis
h: 1
k: 31
m: 28000000
storage-engine: redis
storage-config:
  host: localhost
  port: 6379

rocksdb.yaml

## Example config using rocksdb
h: 1
k: 31
m: 28000000
nproc: 4
storage-engine: rocksdb
storage-config:
  filename: test-rocksdb
  options:
    create_if_missing: true
    max_open_files: 5000
  read_only: false ## Change to true for read only access
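To get a feel for how m and h in the configs above interact with the size of your samples, you can evaluate the standard Bloom filter approximation p ≈ (1 - e^(-hn/m))^h, where n is the number of distinct k-mers inserted. This is a rough sketch using that textbook formula, not a substitute for the parameter guide linked above; the function name and the values of n are hypothetical.

```python
import math


def bloom_false_positive_rate(m, h, n):
    """Approximate false-positive rate of a Bloom filter with m bits and
    h hash functions after inserting n distinct items."""
    return (1.0 - math.exp(-h * n / m)) ** h


if __name__ == "__main__":
    # m and h come from the example configs; the n values are made up.
    for n in (1_000_000, 5_000_000, 10_000_000):
        rate = bloom_false_positive_rate(m=28_000_000, h=1, n=n)
        print(f"n={n:>12,d}  per-k-mer false-positive rate ~ {rate:.3f}")
```

Note that this is the rate for a single k-mer lookup. A query sequence made of many k-mers must hit on all of them, so the chance of a whole query matching by accident is far lower.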

3. Construct the bloom filters

export BIGSI_CONFIG=example-data/configs/berkeleydb.yaml ## set the config path, or use --config

bigsi bloom example-data/test1.ctx example-data/test1.bloom
bigsi bloom example-data/test2.ctx example-data/test2.bloom
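Conceptually, this step turns each sample's k-mer set into a bit array of length m, setting h positions per k-mer. The sketch below illustrates the idea only. The seeded SHA-256 hashing is an assumption made for this example and is not claimed to match the hash functions BIGSI uses, and all function names are hypothetical.

```python
import hashlib


def bit_positions(kmer, m, h):
    """Map a k-mer to h positions in [0, m) using seeded SHA-256.
    The hash choice is illustrative only."""
    positions = []
    for seed in range(h):
        digest = hashlib.sha256(f"{seed}:{kmer}".encode()).digest()
        positions.append(int.from_bytes(digest[:8], "big") % m)
    return positions


def build_bloom(kmers, m, h):
    """Build a Bloom filter for one sample (one byte per bit, for readability)."""
    bits = bytearray(m)
    for kmer in kmers:
        for pos in bit_positions(kmer, m, h):
            bits[pos] = 1
    return bits


def maybe_contains(bits, kmer, h):
    """True if the k-mer may be present; False means it is definitely absent."""
    return all(bits[pos] for pos in bit_positions(kmer, len(bits), h))


if __name__ == "__main__":
    bloom = build_bloom({"ACGT", "TTTT"}, m=1000, h=2)
    print(maybe_contains(bloom, "ACGT", h=2))
```

A Bloom filter never reports an inserted k-mer as absent, but it can report an absent k-mer as present, which is why the choice of m and h matters.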

4. Insert the bloom filters into the index

bigsi build -b example-data/test1.bloom -b example-data/test2.bloom -s s1 -s s2
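With more than a handful of samples, typing out every `-b` and `-s` pair by hand gets error-prone. The helper below is a small sketch that assembles the same command line as a list of arguments. It uses only the `-b` and `-s` flags shown above, assumes that the order of the `-s` names corresponds to the order of the `-b` files (as the example suggests), and takes each sample name from the file stem, which is a naming convention chosen for this sketch. The function name `build_command` is hypothetical.

```python
from pathlib import Path


def build_command(bloom_paths):
    """Return the argument list for `bigsi build`, one -b per bloom file
    followed by one -s per sample name, in the same order.

    Sample names are taken from the file stems (an assumption of this sketch).
    """
    argv = ["bigsi", "build"]
    for path in bloom_paths:
        argv += ["-b", str(path)]
    for path in bloom_paths:
        argv += ["-s", Path(path).stem]
    return argv


if __name__ == "__main__":
    paths = ["example-data/test1.bloom", "example-data/test2.bloom"]
    print(" ".join(build_command(paths)))
```

The resulting list can be passed to `subprocess.run` once you have checked that the printed command looks right.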

5. Query the index

bigsi search CGGCGAGGAAGCGTTAAATCTCTTTCTGACG
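To see what the index and the search are doing conceptually: the per-sample Bloom filters are stored side by side, so each of the m rows records which samples have that bit set. A query is split into k-mers, each k-mer is hashed to h rows, and the rows are ANDed together; any sample whose bit survives may contain every k-mer of the query. The following toy model is a sketch of that idea under stated assumptions: the seeded SHA-256 hashing and all names are hypothetical and not BIGSI's implementation.

```python
import hashlib


def rows_for(kmer, m, h):
    """Hash a k-mer to h row numbers in [0, m). Illustrative hash choice."""
    return [
        int.from_bytes(hashlib.sha256(f"{seed}:{kmer}".encode()).digest()[:8], "big") % m
        for seed in range(h)
    ]


def build_index(samples, m, h):
    """samples maps a sample name to its set of k-mers.

    Returns (names, rows), where rows[r] is an integer whose bit j is set
    if sample j has row r set in its Bloom filter.
    """
    names = list(samples)
    rows = [0] * m
    for j, name in enumerate(names):
        for kmer in samples[name]:
            for r in rows_for(kmer, m, h):
                rows[r] |= 1 << j
    return names, rows


def search(names, rows, query, k, h):
    """Return the samples that may contain every k-mer of the query."""
    if len(query) < k:
        raise ValueError("query is shorter than k")
    hits = (1 << len(names)) - 1  # start with every sample as a candidate
    for i in range(len(query) - k + 1):
        for r in rows_for(query[i:i + k], len(rows), h):
            hits &= rows[r]
    return [name for j, name in enumerate(names) if (hits >> j) & 1]


if __name__ == "__main__":
    toy = {"s1": {"ACG", "CGT"}, "s2": {"TTT"}}
    names, rows = build_index(toy, m=10007, h=3)
    print(search(names, rows, "ACGT", k=3, h=3))
```

Because each row lookup covers all samples at once, the cost of a query depends mainly on the number of k-mers in the query rather than on how many samples are indexed.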

If you installed BIGSI with Docker, run the same steps through the container:

docker run -v "$PWD":/data iqballab/bigsi:latest bigsi bloom --config example-data/configs/berkeleydb.yaml example-data/test1.ctx /data/test1.bloom
docker run -v "$PWD":/data iqballab/bigsi:latest bigsi bloom --config example-data/configs/berkeleydb.yaml example-data/test2.ctx /data/test2.bloom
docker run -v "$PWD":/data iqballab/bigsi:latest bigsi build --config example-data/configs/berkeleydb.yaml -b /data/test1.bloom -b /data/test2.bloom -s s1 -s s2
docker run -v "$PWD":/data iqballab/bigsi:latest bigsi search --config example-data/configs/berkeleydb.yaml CGGCGAGGAAGCGTTAAATCTCTTTCTGACG