GitHub - metagenlab/diag_pipelines: Pipelines dealing with high throughput sequencing data for microbiology diagnostic procedures. Documentation available at http://metagenlabdiag-pipelines.readthedocs.io/en/latest/ and Docker images at https://hub.docker.com/r/metagenlab/diag_pipelines/.

Routine procedures for diagnostic purposes using microbial genomics and metagenomics.

Workflows for epidemiology, anti-microbial resistance genotyping and virulence factors identification have been implemented using the Snakemake workflow management system with bioconda integration for software dependency. Docker images of main releases are available.

Dependencies

Docker

Install the CE version following these instructions for ubuntu. Also make sure you have created the docker group and that you can run docker without sudo following these instruction. If you can't have access to the internet when inside a Docker container, apply those changes.

docker run hello-world
docker pull metagenlab/diag_pipelines:latest
docker run -t --rm metagenlab/diag_pipelines:latest sh -c "ping www.google.com"

Our Docker image is fit for a user called pipeline_user whose UID is 1080. It is advised to create this user on your computer before using the Docker image to run your analysis.

sudo useradd -G docker,sudo -u 1080 pipeline_user
sudo mkdir /home/pipeline_user/
sudo chown pipeline_user -R /home/pipeline_user/
sudo passwd pipeline_user

Alternatively, you can run the Docker as root (--user root) but the created folders will belong to the root user of your computer.

General use

Once you have pulled the docker image on your computer:

docker run -t --rm \
--mount source="$(pwd)",target=/home/pipeline_user/data/analysis/,type=bind \
metagenlab/diag_pipelines:latest \
sh -c 'snakemake --snakefile $pipeline_folder/workflows/full_pipeline.rules \
--use-conda --conda-prefix $conda_folder --configfile config.yaml'

Update the config file for your needs. If you have read files you want to analyse, they should be stored in the links folder from your current working directory.

Run one of the 4 workflows

The pipeline implement four main workflows.

Epidemiological analysis
Annotation of virulence factors
Annotation of resistance markers
Characterization of one or multiple strains

An html report summarizing results is generated upon completion of the workflow.

Epidemiological analysis

docker run -t --rm \
--mount source="$(pwd)",target=/home/pipeline_user/data/analysis/,type=bind \
metagenlab/diag_pipelines:latest \
sh -c 'snakemake --snakefile $pipeline_folder/workflows/full_pipeline.rules\
--use-conda --conda-prefix $conda_folder --configfile config.yaml\
epidemiology'

This will perform quality checks, map reads against one or multiple reference genome, calculate pairwise number of SNPs and generate a minimum spanning tree. The reference genome can be one of the assembled genome or an assembly available on the NCBI website. The analysis can also be restricted to the core genome as defined by existing cgMLST schemes or by computing a custom core genome with help of parsnp (see documentation https://metagenlabdiag-pipelines.readthedocs.io/en/latest/core_genomes.html).

Annotation of virulence factors

docker run -t --rm \
--mount source="$(pwd)",target=/home/pipeline_user/data/analysis/,type=bind \
metagenlab/diag_pipelines:latest \
sh -c 'snakemake --snakefile $pipeline_folder/workflows/full_pipeline.rules\
--use-conda --conda-prefix $conda_folder --configfile config.yaml\
virulence'

This will perform quality checks, assemble the genome and search for known virulence factors from the VFDB database.

Annotation of resistance markers

docker run -t --rm \
--mount source="$(pwd)",target=/home/pipeline_user/data/analysis/,type=bind \
metagenlab/diag_pipelines:latest \
sh -c 'snakemake --snakefile $pipeline_folder/workflows/full_pipeline.rules\
--use-conda --conda-prefix $conda_folder --configfile config.yaml\
resistance'

This will perform quality checks, assemble the genome and search for known antibiotic resistance determinants with help of the rgi software and CARD database.

Characterization of one or multiple strains

docker run -t --rm \
--mount source="$(pwd)",target=/home/pipeline_user/data/analysis/,type=bind \
metagenlab/diag_pipelines:latest \
sh -c 'snakemake --snakefile $pipeline_folder/workflows/full_pipeline.rules\
--use-conda --conda-prefix $conda_folder --configfile config.yaml\
strain_characterization'

This will perform quality checks, assemble the genome and search for known antibiotic resistance determinants with help of the rgi software and CARD database and search for known virulence factors from the VFDB database.

Generating specific files of interest

If you want to execute a specific analysis, you can request files of interest for a particular analysis. Consult the full documentation to know what files can be generated (http://metagenlabdiag-pipelines.readthedocs.io/en/latest/ ). Main examples are provided below:

docker run -t --rm \
--mount source="$(pwd)",target=/home/pipeline_user/data/analysis/,type=bind \
metagenlab/diag_pipelines:latest \
sh -c 'snakemake --snakefile $pipeline_folder/workflows/full_pipeline.rules\
--use-conda --conda-prefix $conda_folder --configfile config.yaml\
report/multiqc_assembly/multiqc_report.html'

This will assemble and annotate every samples, and generate a multiqc report for all samples.

docker run -t --rm \
--mount source="$(pwd)",target=/home/pipeline_user/data/analysis/,type=bind \
metagenlab/diag_pipelines:latest \
sh -c 'snakemake --snakefile $pipeline_folder/workflows/virulence.rules\
--use-conda --conda-prefix $conda_folder --configfile config.yaml\
virulence_summary.xlsx'

This will generate a summary excel file for the virulence factors of the samples, based on the virulence factors annotated in the file defined on the config file.

docker run -t --rm \
--mount source="$(pwd)",target=/home/pipeline_user/data/analysis/,type=bind \
metagenlab/diag_pipelines:latest \
sh -c 'snakemake --snakefile $pipeline_folder/workflows/typing.rules\
--use-conda --conda-prefix $conda_folder --configfile config.yaml\
typing/freebayes_joint_genotyping/cgMLST/bwa/distances_in_snp.xlsx'

This will generate a snp-distance matrix of all samples, only on the core genome defined by ridom of the species defined in the species variable of the config file, mapped with bwa on the reference genome used by ridom (which is Staphylococcus aureus COL substrain, id 33148 from the NCBI Assembly database).

docker run -t --rm \
--mount source="$(pwd)",target=/home/pipeline_user/data/analysis/,type=bind \
metagenlab/diag_pipelines:latest \
sh -c 'snakemake --snakefile $pipeline_folder/workflows/resistance.rules\
--use-conda --conda-prefix $conda_folder --configfile config.yaml\
report/typing/mlst/summary.xlsx'

This will generate an Excel summary file of the MLST of all samples, based on the software mlst.

All Deliverables

Here is a list of all deliverables currently available:

Name		Name	Last commit message	Last commit date
Latest commit History 1,704 Commits
.circleci		.circleci
benchmark_validation		benchmark_validation
data		data
docs		docs
envs		envs
old_rules		old_rules
patches		patches
rules		rules
scripts		scripts
workflows		workflows
.dockerignore		.dockerignore
Dockerfile		Dockerfile
README.rst		README.rst
cluster.json		cluster.json
config.yaml		config.yaml
example_local_samples.tsv		example_local_samples.tsv
example_sra_samples.tsv		example_sra_samples.tsv
validation.Dockerfile		validation.Dockerfile

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Dependencies

Docker

General use

Run one of the 4 workflows

Epidemiological analysis

Annotation of virulence factors

Annotation of resistance markers

Characterization of one or multiple strains

Generating specific files of interest

All Deliverables

About

Releases 19

Packages

Contributors 4

Languages

metagenlab/diag_pipelines

Folders and files

Latest commit

History

Repository files navigation

Dependencies

Docker

General use

Run one of the 4 workflows

Epidemiological analysis

Annotation of virulence factors

Annotation of resistance markers

Characterization of one or multiple strains

Generating specific files of interest

All Deliverables

About

Topics

Resources

Stars

Watchers

Forks

Releases 19

Packages 0

Contributors 4

Languages

Packages