diff --git a/README.md b/README.md index 7d200a2..7aa9702 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,13 @@ +![](https://img.shields.io/conda/dn/bioconda/mob_suite) +![](https://img.shields.io/docker/pulls/kbessonov/mob_suite) +![](https://img.shields.io/pypi/dm/mob-suite) +![](https://img.shields.io/github/v/release/phac-nml/mob-suite?include_prereleases) +![](https://img.shields.io/github/last-commit/phac-nml/mob-suite) +![](https://img.shields.io/github/issues/phac-nml/mob-suite) + # MOB-suite: Software tools for clustering, reconstruction and typing of plasmids from draft assemblies -## Introduction ## +## Introduction Plasmids are mobile genetic elements (MGEs), which allow for rapid evolution and adaption of bacteria to new niches through horizontal transmission of novel traits to different genetic backgrounds. The MOB-suite is designed to be a modular set of tools for the typing and @@ -8,15 +15,12 @@ reconstruction of plasmid sequences from WGS assemblies. The MOB-suite depends on a series of databases which are too large to be hosted in git-hub. They can be downloaded or updated by running mob_init or if running any of the tools for the first time, the databases will download and initialize automatically if you do not specify an alternate database location. However, they are quite large so the first run will take a long time depending on your connection and speed of your computer. -Databases can be manually downloaded from https://share.corefacility.ca/index.php/s/rYaAH7oxrSVtilN/download or https://zenodo.org/record/3786915/files/data.tar.gz?download=1.
-Our new automatic chromosome depletion feature in MOB-recon can be based on any collection of closed chromosome sequences but we have a prebuilt database available here: https://share.corefacility.ca/index.php/s/GJOgxxtbhWoX8fV/download +Databases can be manually downloaded from [corefacility.ca](https://share.corefacility.ca/index.php/s/rYaAH7oxrSVtilN/download) or [Zenodo](https://zenodo.org/record/3786915/files/data.tar.gz?download=1).
+Our new automatic chromosome depletion feature in MOB-recon can be based on any collection of closed chromosome sequences, but we have a prebuilt database available [here](https://share.corefacility.ca/index.php/s/GJOgxxtbhWoX8fV/download). ### MOB-init -On first run of MOB-typer or MOB-recon, MOB-init should run to download the databases from figshare, sketch the databases and setup the blast databases. However, it can be run manually if the databases need to be re-initialized OR if you want to initialize the databases in an alternative directory. +On first run of MOB-typer or MOB-recon, MOB-init (invoked by the `mob_init` command) should run to download the databases from figshare, sketch the databases, and set up the BLAST databases. However, it can be run manually if the databases need to be re-initialized or if you want to initialize the databases in an alternative directory. -``` -% mob_init -``` ### MOB-cluster This tool creates plasmid similarity groups using fast genomic distance estimation using Mash. Plasmids are grouped into clusters using complete-linkage clustering and the cluster code accessions provided by the tool provide an approximation of operational taxonomic units OTU’s. The plasmid nomenclature is designed to group highly similar plasmids together which are unlikely to have multiple representatives within a single cell and have a strong concordance with replicon and relaxase typing but is universally applicable since it uses the complete sequence of the plasmid itself rather than specific biomarkers. @@ -30,8 +34,9 @@ Provides in silico predictions of the replicon family, relaxase type, mate-pair ## Installation ## ## Requires -+ Python v. 
3.7 + -+ ete3 >= 3 ++ Python >= 3.7 ++ ete3 >= 3.1.2 ++ pandas >=0.22.0,<=1.0.5 + biopython >= 1.70 + pytables >= 3.3 + pycurl >= 7.43 @@ -62,21 +67,20 @@ We recommend installing MOB-Suite via bioconda but you can install it via pip us ``` % pip3 install mob_suite - ``` ### Docker image A docker image is also available at [https://hub.docker.com/r/kbessonov/mob_suite](https://hub.docker.com/r/kbessonov/mob_suite) -``` -% docker pull kbessonov/mob_suite:2.0.0 -% docker run --rm -v $(pwd):/mnt/ "kbessonov/mob_suite:2.0.0 " mob_recon -i /mnt/assembly.fasta -t -o /mnt/mob_recon_output +``` +% docker pull kbessonov/mob_suite:3.0.1 +% docker run --rm -v $(pwd):/mnt/ "kbessonov/mob_suite:3.0.1" mob_recon -i /mnt/assembly.fasta -t -o /mnt/mob_recon_output ``` ### Singularity image A singularity image could be built via singularity recipe donated by Eric Deveaud. The recipe (`recipe.singularity`) is located in the singularity folder of this repository. -The docker image section also has instructions on how to create singularity image from a docker image. +The docker image [README section](https://hub.docker.com/repository/docker/kbessonov/mob_suite) also has instructions on how to create a Singularity image from a Docker image. ```bash % singularity build mobsuite.simg recipe.singularity @@ -104,7 +108,6 @@ You can perform plasmid typing using a fasta formated file containing a single p # Multiple independant plasmids % mob_typer --multi --infile assembly.fasta --out_file sample_mobtyper_results.txt - ``` ## Using MOB-recon to reconstruct plasmids from draft assemblies @@ -120,12 +123,13 @@ As of v. 3.0.0, we have added the ability of users to provide their own specific ``` ### User sequence mask -% mob_recon --infile assembly.fasta --outdir my_out_dir -- +% mob_recon --infile assembly.fasta --outdir my_out_dir --filter_db filter.fasta ``` As of v. 
3.0.0, we have provided the ability to use a collection of closed genomes which will be quickly checked using Mash for genomes which are genetically close and limit blast searches to those chromosomes. This more nuanced and automatic approach is recommended for users where there are sequences which should be filtered in one genomic context but not another. We provide as an optional download as set of closed Enterobacteriacea genomes from NCBI which can be used to provide added accuracy for some organisms such as E. coli and Klebsiella where there are sequences which switch between chromosome and plasmids.

If reconstructed plasmids exceed the Mash distance for primary cluster assignment, then they will get assigned a name in the format novel_{md5} where the md5 hash is calculated based on all of the sequences belonging to that reconstructed plasmid. This will provide a unique name for them but any change will result in a changed in the md5 hash. It is inadvised to use these groups for further analyses. Rather they should be highlighted as cases where targeted long read sequencing is required to obtain a closer database representitive of that plasmid. + ``` ### Autodetected close genome filter % mob_recon --infile assembly.fasta --outdir my_out_dir -g 2019-11-NCBI-Enterobacteriacea-Chromosomes.fasta diff --git a/mob_suite/blast/__init__.py b/mob_suite/blast/__init__.py index 3d5c6c6..f4c3d61 100644 --- a/mob_suite/blast/__init__.py +++ b/mob_suite/blast/__init__.py @@ -6,7 +6,7 @@ import os import pandas as pd -from pandas.io.common import EmptyDataError +from pandas.errors import EmptyDataError diff --git a/mob_suite/conda/meta.yaml b/mob_suite/conda/meta.yaml index 2296b86..b92cfd2 100644 --- a/mob_suite/conda/meta.yaml +++ b/mob_suite/conda/meta.yaml @@ -1,4 +1,4 @@ -{% set version = "3.0.0" %} +{% set version = "3.0.1" %} package: name: mob_suite @@ -10,27 +10,27 @@ build: script: python -m pip install --no-deps --ignore-installed . 
source: - #path: /root/mob_suite/mob-suite + path: /root/mob_suite/mob-suite #url: https://github.com/phac-nml/mob-suite/archive/{{ version }}.tar.gz #sha256: 221dc24eb6d98b119c25cabff5110709cd345790d9836cf5865bec9262fddc3f - git_url: https://github.com/phac-nml/mob-suite.git - git_rev: master + #git_url: https://github.com/phac-nml/mob-suite.git + #git_rev: master requirements: host: - python >=3.7 - pip run: - - python >=3.7 - - numpy >=1.11.1 - - pytables >=3.3 - - pandas >=0.22.0 - - biopython >=1.70 - - pycurl >=7.43 - - scipy >=1.1 - - ete3 >=3.0 - - blast >= 2.9.0 - - mash >= 2.2.2 + - python >=3.7,<4 + - numpy >=1.11.1,<2 + - pytables >=3.3,<4 + - pandas >=0.22.0,<=1.0.5 + - biopython >=1.70,<2 + - pycurl >=7.43,<8 + - scipy >=1.1,<2 + - ete3 >=3.0,<4 + - blast >=2.9.0,<3 + - mash >=2.2.2,<3 test: diff --git a/mob_suite/wrappers/mob_recon.xml b/mob_suite/wrappers/mob_recon.xml index 6cbd3a7..c3351a4 100644 --- a/mob_suite/wrappers/mob_recon.xml +++ b/mob_suite/wrappers/mob_recon.xml @@ -1,27 +1,62 @@ - + Type contigs and extract plasmid sequences - mob_suite - + mob_suite + + mob_recon --version /dev/null || true) + + --min_rep_cov '${adv_param.min_rep_cov}' + --min_mob_cov '${adv_param.min_mob_cov}' + --min_con_cov '${adv_param.min_con_cov}' + --min_rpp_cov '${adv_param.min_rpp_cov}' + --outdir 'outdir' && + mkdir ./outdir/plasmids && (mv outdir/plasmid*.fasta ./outdir/plasmids 2> /dev/null || true) ]]>
- - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + +
- - - - - + + + + + - -
- - -
- - - - - + +
+ + +
+ + + + + + + + + + + + + + +
@@ -96,7 +147,7 @@ For more information please visit https://github.com/phac-nml/mob-suite/. **Workflow** -This preliminary \"Mobilome and Resistome Analysis Workflow\" linking mob_recon with staramr provides reports on mobilome and resistome for a given isolate given a draft genome assembly. The workflow is located in Shared Data --> Workflows --> Mobilome and Resistome Analysis Workflow (MOB-Recon and STARAMR). The workflow file can also be mamanually downloaded from https://raw.githubusercontent.com/phac-nml/galaxy_tools/master/tools/mob_suite/workflows/AMRworkflow_STARAMR.ga. +This preliminary \"Mobilome and Resistome Analysis Workflow\" linking mob_recon with staramr provides mobilome and resistome reports for a given isolate from its draft genome assembly. The workflow is located in Shared Data --> Workflows --> Mobilome and Resistome Analysis Workflow (MOB-Recon and STARAMR). The workflow file can also be manually downloaded from https://raw.githubusercontent.com/phac-nml/galaxy_tools/master/tools/mob_suite/workflows/AMRworkflow_STARAMR.ga. ----- diff --git a/mob_suite/wrappers/mob_typer.xml b/mob_suite/wrappers/mob_typer.xml index 909c08b..c52c0ad 100644 --- a/mob_suite/wrappers/mob_typer.xml +++ b/mob_suite/wrappers/mob_typer.xml @@ -1,49 +1,112 @@ - + Get the plasmid type and mobility given its sequence - mob_suite - + mob_suite + + mob_typer --version
+ - - + - - -
+ + + + + + + + + + + + + + +
- - - + - - - - + + + + + + + + + + + + diff --git a/setup.py b/setup.py index a655b7b..1357d36 100644 --- a/setup.py +++ b/setup.py @@ -29,8 +29,8 @@ def read(fname): setup( name='mob_suite', include_package_data=True, - version='3.0.0', - python_requires='>=3.7.0', + version='3.0.1', + python_requires='>=3.7.0,<4', setup_requires=['pytest-runner'], tests_require=['pytest'], packages=find_packages(exclude=['tests', 'databases']), @@ -46,15 +46,15 @@ def read(fname): package_data={'mob_suite': ['config.json']}, install_requires=[ - 'numpy>=1.11.1', - 'tables>=3.3.0', - 'pandas>=0.22.0', - 'biopython>=1.70', - 'pycurl>=7.43.0', - 'scipy>=1.1.0', - 'ete3>=3.0', - 'six>=1.10', - 'pyqt5>=5.0' + 'numpy>=1.11.1,<2', + 'tables>=3.3.0,<4', + 'pandas>=0.22.0,<=1.0.5', + 'biopython>=1.70,<2', + 'pycurl>=7.43.0,<8', + 'scipy>=1.1.0,<2', + 'ete3>=3.0,<4', + 'six>=1.10,<2', + 'pyqt5>=5.0,<6' ], entry_points={