plentyofbugs

Find a compatible reference genome for your sequenced isolate based on location, interests, and mash distance.

This repository contains the pipeline to identify a suitable reference isolate by comparing a mini assembly to a set of complete genomes via mash. For more information on using this with riboSeed, see this page on choosing reference genomes.

Installation

Manual Installation

Dependencies

plentyofbugs requires

skesa (or spades)
seqtk
mash

conda create -n pob skesa seqtk mash
conda activate pob
pip install plentyofbugs
plentyofbugs  -h

Running

Get example data

The data in the test_data directory was gathered using the test_data/get_data.sh script. It consists of 19 plasmids from E. coli and a test plasmid from which reads were generated using ART (https://www.niehs.nih.gov/research/resources/software/biostatistics/art/). The contigs in this directory were made with SKESA.

Running on example data

# running with raw reads
plentyofbugs -g ./test_data/plasmids/  -f ./test_data/test_reads1.fq -o tmp
# running with an assembly instead of raw reads
plentyofbugs -g ./test_data/plasmids/  --assembly ./test_data/contigs.fasta -o tmp
# running against a new set of E. coli genomes, downloading the required genomes along the way
plentyofbugs -g ./new_comparison_e_coli/ -n 5  --assembly ./test_data/contigs.fasta -o tmp --genus_species "Escherichia coli"

Just want to download the genomes?

Plentyofbugs includes the genomes-getter as a standalone script: get_n_genomes.

get_n_genomes -o "Escherichia coli" -g tmpnewgenomes -n 4

Running via container

NOTE: To run the legacy version that used pyani, use a version older than 0.87 via Docker or Singularity; it will save you a lot of trouble!

Docker

docker run --rm -t -v  ${PWD}:/data/ nickp60/plentyofbugs:0.97 -f /data/test_reads1.fq --genus_species "Escherichia coli" -n 5 -o /data/results/

In general form, that is:

docker run --rm -t -v  <current directory>:/data/ nickp60/plentyofbugs:0.97  -f /data/<name of F reads file> --genus_species "<bug of interest>" -n <max number of strains to compare with>  -o /data/<name for output folder>/
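
If you launch the container from a script, the template above can be filled in programmatically. The following is a minimal sketch, not part of plentyofbugs: the helper name `docker_command` and its parameter names are hypothetical, and it uses only the image name and flags shown in the commands above.

```python
import os
import shlex


def docker_command(reads_file, genus_species, max_strains, out_dir,
                   workdir=None, tag="0.97"):
    """Build the docker argument list from the template in this README.

    Hypothetical helper: reads_file and out_dir are names relative to
    workdir, which is mounted at /data/ inside the container.
    """
    workdir = workdir or os.getcwd()
    return [
        "docker", "run", "--rm", "-t",
        "-v", f"{workdir}:/data/",
        f"nickp60/plentyofbugs:{tag}",
        "-f", f"/data/{reads_file}",
        "--genus_species", genus_species,
        "-n", str(max_strains),
        "-o", f"/data/{out_dir}/",
    ]


cmd = docker_command("test_reads1.fq", "Escherichia coli", 5, "results",
                     workdir="/home/me/project")
# shlex.join quotes arguments containing spaces, so the line is safe to paste
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` avoids shell quoting problems with species names that contain spaces.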

Singularity

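# name the image file explicitly so the next command does not depend on the default file name (Singularity 3+ syntax)
singularity pull plentyofbugs.sif docker://nickp60/plentyofbugs:0.97
./plentyofbugs.sif -f ./test_reads1.fq --genus_species "Escherichia coli" -n 5  -o ./results/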

Under the hood

What it does:

Runs a mini assembly with skesa or SPAdes
Identifies which complete genomes are available for the genus and species
(Optional) Subsets the number of complete genomes
Downloads the reference genomes
Indexes/sketches the genomes
Calculates the Mash distance/ANI of the mini assembly to the database of reference strains
Reports the closest reference genome and the distance/ANI
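
The last two steps can be illustrated with a short sketch. It is not the code plentyofbugs actually runs. It assumes the tab-separated output of `mash dist` (reference, query, distance, p-value, shared hashes) and approximates ANI as 1 minus the Mash distance. The file names and numbers in `EXAMPLE_MASH_OUTPUT` are made up for illustration, and `closest_reference` is a hypothetical helper.

```python
import csv
import io

# Made-up `mash dist` output, for illustration only.
# Columns: reference, query, distance, p-value, shared hashes
EXAMPLE_MASH_OUTPUT = (
    "plasmid_A.fasta\tcontigs.fasta\t0.0210\t0\t450/1000\n"
    "plasmid_B.fasta\tcontigs.fasta\t0.0025\t0\t900/1000\n"
    "plasmid_C.fasta\tcontigs.fasta\t0.1500\t1e-20\t20/1000\n"
)


def closest_reference(mash_dist_text):
    """Return (reference, distance, approximate ANI) for the smallest distance."""
    hits = []
    for row in csv.reader(io.StringIO(mash_dist_text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        hits.append((row[0], float(row[2])))
    ref, dist = min(hits, key=lambda hit: hit[1])
    # Assumption: ANI is approximated as 1 - Mash distance
    return ref, dist, 1.0 - dist


best_ref, best_dist, approx_ani = closest_reference(EXAMPLE_MASH_OUTPUT)
print(f"{best_ref}\t{best_dist:.4f}\t{approx_ani:.4f}")
```

With real data, the text would come from running `mash dist` on the sketched references and the mini assembly rather than from a hard-coded string.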
