A basic workflow for running Nick Hathaway's SeekDeep on Illumina data. This version splits the jobs into individual snakemake submissions.
- Install singularity. My preferred installation method on Ubuntu is to download a .deb file from https://github.com/sylabs/singularity/releases and install it with:
sudo apt install /full/path/to/deb/file
- Install mamba: https://github.com/conda-forge/miniforge#install (don't forget to run conda init and, at the end, follow the instructions to log out and back in)
- Create a mamba environment and install snakemake there:
mamba create -c conda-forge -c bioconda -n snakemake snakemake
mamba activate snakemake
- Change directory to a folder where you want to run the analysis
- Clone this repository with:
git clone https://github.com/bailey-lab/seekdeep_illumina_snakemake.git
- Change directory to the cloned repo (seekdeep_illumina_snakemake)
- Download the elucidator.sif file and the tutorial dataset with:
bash download_example_dataset.sh
- Edit the seekdeep_illumina_general.yaml file, following the instructions in its comments. Use a text editor that saves Unix line endings (e.g. vscode, notepad++, gedit, micro, emacs, vim, vi, etc.)
- If snakemake is not your active conda environment, activate snakemake with:
mamba activate snakemake
- Run each step in order, setting --cores to the number of cores available on your machine (8 in this example):
snakemake -s setup_run.smk --cores 8
snakemake -s run_extractor.smk --cores 8
snakemake -s finish_process.smk --cores 8
- Alternatively, run all steps with a single script:
bash run_all_steps.sh
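The line-ending check and the three snakemake steps above can be sketched as a pair of shell functions. This is a hypothetical wrapper, not the contents of run_all_steps.sh: the function names check_line_endings and run_pipeline and the SNAKEMAKE override variable are assumptions made for illustration. Only the three snakemake commands and the yaml file name come from this README.

```shell
# Hypothetical helpers; the names below are illustrative, not part of the repo.

# Print "crlf" if the file has any Windows line endings, otherwise "unix".
check_line_endings() {
    cr=$(printf '\r')
    if grep -q "${cr}\$" "$1"; then
        echo "crlf"
    else
        echo "unix"
    fi
}

# Run the three workflow steps in order with the given core count,
# stopping at the first step that fails.
# Setting SNAKEMAKE="echo snakemake" prints the commands instead of running them.
run_pipeline() {
    cores="$1"
    for step in setup_run.smk run_extractor.smk finish_process.smk; do
        ${SNAKEMAKE:-snakemake} -s "$step" --cores "$cores" || return 1
    done
}

# Example (uncomment to use from inside the cloned repo):
# [ "$(check_line_endings seekdeep_illumina_general.yaml)" = "unix" ] && run_pipeline 8
```

Stopping at the first failure matters here because each step presumably depends on the output of the one before it, so continuing after an error would only produce confusing downstream messages.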
You can read Nick Hathaway's manual here: https://seekdeep.brown.edu/
If you're in the folder where you downloaded the elucidator.sif file, you can get help on any seekdeep command with:
singularity exec elucidator.sif SeekDeep [cmd] -h
The workflow runs three commands in order:
- The first command gets info about the genome (genTargetInfoFromGenomes).
- The second command sets up an analysis run (setupTarAmpAnalysis).
- The third command (runAnalysis.sh, which has no help text) runs three SeekDeep programs.
Here are some example help commands to learn more about these commands:
- singularity exec elucidator.sif SeekDeep -h
- singularity exec elucidator.sif SeekDeep genTargetInfoFromGenomes -h
- singularity exec elucidator.sif SeekDeep setupTarAmpAnalysis -h
Each of the three programs run by runAnalysis.sh can be tuned for sensitivity and specificity (via extra_[step]_cmds at the bottom of the yaml file):
- The first program extracts amplicon reads (extractor).
- The second program clusters together similar reads (qluster).
- The third program processes clusters into haplotypes (processClusters).
Here are some example help commands to learn more about these programs:
- singularity exec elucidator.sif SeekDeep extractor -h
- singularity exec elucidator.sif SeekDeep qluster -h
- singularity exec elucidator.sif SeekDeep processClusters -h
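To read the help text offline, the help commands listed in this README can be looped over and saved to files. The following is a minimal sketch: the function name save_help, the output file naming, and the SEEKDEEP override variable are assumptions made for illustration. The only commands taken from this README are the singularity exec elucidator.sif SeekDeep [cmd] -h invocations.

```shell
# Hypothetical helper: save the help text of each named SeekDeep program
# to <outdir>/<program>_help.txt.
# SEEKDEEP defaults to the container invocation used in this README and can
# be overridden (e.g. SEEKDEEP="echo SeekDeep") to preview without singularity.
save_help() {
    outdir="$1"
    shift
    mkdir -p "$outdir"
    for cmd in "$@"; do
        # Capture stderr too, in case help is printed there; a non-zero exit
        # status from a help call is ignored.
        ${SEEKDEEP:-singularity exec elucidator.sif SeekDeep} "$cmd" -h \
            > "$outdir/${cmd}_help.txt" 2>&1 || true
    done
}

# Example (uncomment to use, from the folder containing elucidator.sif):
# save_help seekdeep_help genTargetInfoFromGenomes setupTarAmpAnalysis extractor qluster processClusters
```

Keeping the saved files next to the yaml file makes it easier to look up an option while editing the extra command settings.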