CAMP Binning

Documentation Status · Version 0.4.1

Overview

This module is designed to function both as a standalone MAG binning pipeline and as a component of the larger CAP2/CAMP metagenome analysis pipeline. As such, it is both self-contained (e.g., it includes instructions for setting up a versioned environment) and seamlessly compatible with other CAMP modules (e.g., it ingests and spawns standardized input/output config files).

As for the binning procedure itself, the design philosophy is to replicate the functionality of MetaWRAP (one of the original ensemble binning methods) with i) better dependency-conflict management and ii) improved integration with new binning algorithms.

Currently, the module wraps the binning algorithms MetaBAT2, CONCOCT, VAMB, and MaxBin2, along with the bin refinement tool DAS Tool.

Installation

  1. Clone the repo from GitHub.
  2. Set up the conda environment (which contains Snakemake) using configs/conda/binning.yaml.
    • There are some compatibility issues that I haven't ironed out (bowtie2, due to RedHat's geriatric dependencies), so you may have to substitute in your own version.
  3. The conda version of MaxBin2 doesn't seem to work, so the best way to add it to the module is to install it separately:

cd bin/
# Download and unpack the latest MaxBin2 release
wget https://sourceforge.net/projects/maxbin2/files/latest/download
tar -xf download
spack load gcc@6.3.0 # Only necessary for HPCs with extremely old gcc versions
# Compile MaxBin2 and its bundled auxiliary tools
cd MaxBin-2.2.7/src
make
./autobuild_auxiliary
# IDBA-UD is not included in the auxiliary build, so build it manually
wget https://github.com/loneknightpy/idba/releases/download/1.1.3/idba-1.1.3.tar.gz
tar -xf idba-1.1.3.tar.gz
cd idba-1.1.3/
./configure --prefix=/path/to/bin/MaxBin-2.2.7/auxiliary/idba-1.1.3
make
# Optional: Run the following export directly, or add it to ~/.bashrc
export PATH=$PATH:/path/to/bin/MaxBin-2.2.7:/path/to/bin/MaxBin-2.2.7/auxiliary/FragGeneScan_1.30:/path/to/bin/MaxBin-2.2.7/auxiliary/hmmer-3.1b1/src:/path/to/bin/MaxBin-2.2.7/auxiliary/bowtie2-2.2.3:/path/to/bin/MaxBin-2.2.7/auxiliary/idba-1.1.3/bin
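To confirm that the manual install worked, check that MaxBin2's entry-point script is now on your PATH (run_MaxBin.pl is the executable shipped with MaxBin-2.2.7; this is just a sanity check, not part of the pipeline):

run_MaxBin.pl -v  # should print the MaxBin2 version if the PATH export above took effect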
  4. Update the locations of the test datasets in samples.csv, and the relevant parameters in configs/parameters.yaml.
  5. Make sure the installed pipeline works correctly.
    • Note: VAMB generates >600 MB of bin FastAs, which can be deleted immediately after running the test if storage space is an issue. For the same reason, they are not included in the sample output directory test_data/test_out/.
::
# Create and activate the conda environment
cd camp_binning
conda env create -f configs/conda/binning.yaml
conda activate binning
# Run tests on the included sample dataset
python /path/to/camp_binning/workflow/binning.py test

Using the Module

Input: /path/to/samples.csv provided by the user.

Output: 1) an output config file summarizing the locations of 2) the MAGs generated by MetaBAT2, CONCOCT, and VAMB. See test_data/test_out.tar.gz for a sample output work directory.

  • /path/to/work/dir/binning/final_reports/samples.csv, for ingestion by the next module (e.g., quality-checking)
  • /path/to/work/dir/binning/*/sample_name/, where * is either 1_metabat2 or 2_concoct: the directories containing FastAs (*.fa) of the MAGs inferred by MetaBAT2 and CONCOCT, respectively
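To browse the sample output layout without running the pipeline, unpack the archive (the extracted directory is test_data/test_out/, as noted above; the tar flags are standard GNU tar):

tar -xzf test_data/test_out.tar.gz -C test_data/
ls test_data/test_out/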

Structure:

└── workflow
    ├── Snakefile
    ├── binning.py
    ├── utils.py
    └── __init__.py
  • workflow/binning.py: Click-based CLI that wraps the snakemake command (and others) for clean management of parameters, resources, and environment variables.
  • workflow/Snakefile: The snakemake pipeline.
  • workflow/utils.py: Sample ingestion and work directory setup functions, and other utility functions used in the pipeline and the CLI.
  1. Make your own samples.csv based on the template in configs/samples.csv (a hypothetical example is sketched below, after the run command).
    • ingest_samples in workflow/utils.py expects Illumina reads in FastQ form (optionally gzipped) and de novo assembled contigs in FastA form
    • samples.csv requires either absolute paths or symlinks relative to the directory that the module is being run in
  2. Update the relevant metabat2, concoct, vamb, and maxbin2 parameters in configs/parameters.yaml.
  3. Update the computational resources available to the pipeline in resources.yaml.
  4. To run CAMP on the command line, use the following, where /path/to/work/dir is replaced with the absolute path of your chosen working directory, and /path/to/samples.csv is replaced with your copy of samples.csv.
    • The default number of cores available to Snakemake is 1, which is enough for the test data but should be raised to 10+ for a real dataset.
    • Relative or absolute paths to the Snakefile and/or the working directory (if you're running elsewhere) are accepted!
::
python /path/to/camp_binning/workflow/binning.py \
    (-c max_number_of_local_cpu_cores) \
    -d /path/to/work/dir \
    -s /path/to/samples.csv
  • Note: This setup allows the main Snakefile to live outside of the work directory.
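A hypothetical samples.csv is sketched here with a heredoc. The column names and paths are illustrative assumptions only, so copy the real header from the template in configs/samples.csv:

# Hypothetical layout -- take the real column names from configs/samples.csv
cat > samples.csv << 'EOF'
sample_name,illumina_fwd,illumina_rev,ctg
s1,/abs/path/s1_R1.fastq.gz,/abs/path/s1_R2.fastq.gz,/abs/path/s1.contigs.fa
EOF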
  5. To run CAMP on a job submission cluster (for now, only Slurm is supported), use the following.
    • --slurm is an optional flag that submits all rules in the Snakemake pipeline as sbatch jobs.
    • In Slurm mode, the -c flag refers to the maximum number of sbatch jobs submitted in parallel, not the pool of cores available to run the jobs. Each job will request the number of cores specified by threads in configs/resources/slurm.yaml.
sbatch -J jobname -o jobname.log << "EOF"
#!/bin/bash
python /path/to/camp_binning/workflow/binning.py --slurm \
    (-c max_number_of_parallel_jobs_submitted) \
    -d /path/to/work/dir \
    -s /path/to/samples.csv
EOF
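Once submitted, the per-rule jobs can be tracked with standard Slurm tooling; nothing here is CAMP-specific:

squeue -u $USER      # sbatch jobs still queued or running
tail -f jobname.log  # log of the top-level wrapper job above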

6. After checking over final_reports/ and making sure you have everything you need, you can delete all intermediate files to save space.

python /path/to/camp_binning/workflow/binning.py cleanup \
    -d /path/to/work/dir \
    -s /path/to/samples.csv
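If you want to confirm how much space the cleanup reclaimed, plain coreutils will do (nothing module-specific is assumed):

du -sh /path/to/work/dir  # run before and after cleanup and compare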

7. If for some reason the module keeps failing, CAMP can print a script containing all of the remaining commands that can be run manually.

python /path/to/camp_binning/workflow/binning.py --dry_run \
    -d /path/to/work/dir \
    -s /path/to/samples.csv > cmds.txt
python /path/to/camp_binning/workflow/binning.py commands cmds.txt

Updating the Module

What if you've customized some components of the module, but you still want to update the rest of the module with the latest version of standard CAMP? Just do the following from within the module's home directory:
  • The -X ours setting forces conflicting hunks to be auto-resolved cleanly by favoring the local (i.e., your) version.
::
cd /path/to/camp_binning
git pull -X ours
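If you'd rather preview what the update will bring in before merging, standard git works (this assumes your local branch tracks the default remote):

git fetch
git log --oneline ..@{u}   # commits about to be pulled in
git diff --stat ..@{u}     # files those commits touch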

Extending the Module

We love to see it! This module was partially envisioned as a dependable, prepackaged sandbox for developers to test their shiny new tools in.

These instructions are meant for developers who have made a tool and want to integrate or demo its functionality as part of the standard binning workflow, or developers who want to integrate an existing tool.

  1. Write a module rule that wraps your tool and integrates its input and output into the pipeline.
    • This is a great Snakemake tutorial for writing basic Snakemake rules.
    • If you're adding new tools from an existing YAML, use conda env update --file configs/conda/existing.yaml --prune.
    • If you're using external scripts and resource files that i) cannot easily be integrated into either utils.py or parameters.yaml, and ii) are not as large as databases that would justify an externally stored download, add them to workflow/ext/ or workflow/ext/scripts/ and use rule external_rule as a template to wrap them.
  2. Update the make_config rule in workflow/Snakefile to check for your tool's output files. Update samples.csv to document its output if downstream modules/tools are meant to ingest it.
    • If you plan to integrate multiple tools into the module that serve the same purpose but have different input or output requirements (e.g., for alignment: Minimap2 for Nanopore reads vs. Bowtie2 for Illumina reads), you can toggle between these different 'streams' by setting the final files expected by make_config using the example function workflow_mode.
    • Update the description of the samples.csv input fields in the CLI script workflow/binning.py.
  3. If applicable, update the default conda config with your tool and its dependencies using conda env export > configs/conda/binning.yaml.
    • If there are dependency conflicts, make a new conda YAML under configs/conda and specify its usage in specific rules using the conda option (see first_rule for an example).
  4. Add your tool's installation and running instructions to the module documentation and (if applicable) add the repo to your Read the Docs account + turn on the Read the Docs service hook.
  5. Run the pipeline once through to make sure everything works using the test data in test_data/ if appropriate, or your own appropriately-sized test data.
    • Note: Python functions imported from utils.py into Snakefile should be debugged on the command-line first before being added to a rule because Snakemake doesn't port standard output/error well when using run:.

6. Increment the version number of the modular pipeline, which follows the A.C.E format: patch for bug fixes (increments E), minor for substantial changes to the rules and/or workflow (increments C), and major (A) only for major releases of the entire CAMP.

bump2version --current-version A.C.E patch
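For example, with the module at version 0.4.1 (per the badge at the top of this README), a bug-fix release would be:

bump2version --current-version 0.4.1 patch  # 0.4.1 -> 0.4.2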
  7. If you want your tool integrated into the main CAP2/CAMP pipeline, send a pull request and we'll have a look at it ASAP!
    • Please make it clear what your tool intends to do by including a summary in the commit/pull request (e.g., "Release X.Y.Z: Integration of tool A, which does B to C and outputs D").

Bugs

There is a dependency error that hasn't been addressed yet: bowtie2 in the main camp_binning conda environment has conflicting C++ and Perl dependencies with some other packages.

Credits
