v3.1
Change the assembly filtering step: first try filtering contigs by SPAdes k-mer coverage, contig length and GC content; if that fails, filter only by contig length and GC content (to try to rescue the sample from failing).
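The filter-then-fallback logic can be sketched in Python as follows. This is a minimal illustration, not INNUca's code. It assumes SPAdes-style contig names (`NODE_<n>_length_<len>_cov_<cov>`, where `cov` is the SPAdes k-mer coverage). The thresholds and the criterion for the strict filter "failing" (filtered assembly shorter than a hypothetical `min_total_bp`) are assumptions.

```python
import re

# SPAdes contig names look like "NODE_1_length_1500_cov_45.2"; cov is k-mer coverage.
HEADER_RE = re.compile(r"NODE_\d+_length_(\d+)_cov_([\d.]+)")


def parse_header(name):
    """Return (length, kmer_coverage) parsed from a SPAdes contig name."""
    match = HEADER_RE.search(name)
    if match is None:
        raise ValueError("not a SPAdes contig name: %r" % name)
    return int(match.group(1)), float(match.group(2))


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / float(len(seq)) if seq else 0.0


def keep(name, seq, min_len, gc_range, min_kmer_cov=None):
    """Apply length and GC filters, plus the k-mer coverage filter if a threshold is given."""
    length, kmer_cov = parse_header(name)
    if length < min_len:
        return False
    low, high = gc_range
    if not low <= gc_fraction(seq) <= high:
        return False
    return min_kmer_cov is None or kmer_cov >= min_kmer_cov


def filter_contigs(contigs, min_len, gc_range, min_kmer_cov, min_total_bp):
    """contigs: dict of name -> sequence. Returns (kept_contigs, kmer_filter_used).

    Hypothetical failure criterion: the strictly filtered assembly is shorter
    than min_total_bp, in which case the k-mer coverage filter is dropped.
    """
    strict = dict((n, s) for n, s in contigs.items()
                  if keep(n, s, min_len, gc_range, min_kmer_cov))
    if sum(len(s) for s in strict.values()) >= min_total_bp:
        return strict, True
    # Fallback: length and GC content only, to try to rescue the sample.
    relaxed = dict((n, s) for n, s in contigs.items()
                   if keep(n, s, min_len, gc_range))
    return relaxed, False
```

Returning a flag alongside the contigs lets the caller record which filter was actually applied, which matters when reporting why a sample was rescued.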
Change PASS, FAIL, WARNING rules (FAIL overrides WARNING):
 - FAIL
   - Estimated coverage, calculated in the EstimatedCoverage module as number of sequenced nucleotides / expected genome size, is below the minimum required (default 15x). Stops the sample run.
   - Sample coverage is below, or the number of absent genes or of genes with multiple alleles is above, the limits specified in the TrueCoverage module config file. Stops the sample run.
   - Fails FastQC “Per base sequence quality”, “Overrepresented sequences”, “Per sequence GC content” or “Sequence length distribution”, or does not pass FastQC “Per base N content” or “Adapter Content”. Stops the sample run if the sample fails FastQC after Trimmomatic read cleaning.
   - No read pairs survive Trimmomatic cleaning.
   - AssemblyMapping module does not run successfully.
   - Assembly coverage of the filtered contigs (calculated in the AssemblyMapping module) does not reach the minimum required (30x).
   - The MLST scheme found does not match the provided species (mlst module), except for the Yersinia genus, which might only raise a warning.
 - WARNING
   - Fails FastQC “Per base sequence content”, or does not pass FastQC “Per base sequence quality” or “Overrepresented sequences”.
   - More contigs than allowed, or an unusual number of assembled nucleotides.
   - Less than 95% of the reads map back to the assembly (in the AssemblyMapping module).
   - The mlst module did not run.
   - An MLST scheme was found for a species with no known scheme. For the Yersinia genus, a warning is raised only if the specific scheme found does not match the scheme for the provided species name (but matches the genus).
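The rule that FAIL overrides WARNING can be sketched in Python as below. This is a minimal illustration, not INNUca's implementation: it covers only three of the listed checks (estimated coverage, assembly coverage, reads mapped back), uses the default thresholds quoted above, and evaluates all checks at once. It ignores the fact that some FAIL conditions stop the sample run early. The function and parameter names are hypothetical.

```python
PASS, WARNING, FAIL = "PASS", "WARNING", "FAIL"


def estimated_coverage(sequenced_nucleotides, genome_size_mb):
    """Number of sequenced nucleotides divided by expected genome size (in bp)."""
    return sequenced_nucleotides / (genome_size_mb * 1e6)


def sample_status(checks):
    """checks: list of (status, reason). FAIL overrides WARNING, which overrides PASS."""
    failures = [reason for status, reason in checks if status == FAIL]
    warnings = [reason for status, reason in checks if status == WARNING]
    if failures:
        return FAIL, failures
    if warnings:
        return WARNING, warnings
    return PASS, []


def assess(sequenced_nt, genome_size_mb, assembly_cov, pct_reads_mapped,
           min_est_cov=15, min_assembly_cov=30, min_pct_mapped=95):
    """Collect FAIL/WARNING reasons for a few of the rules and combine them."""
    checks = []
    cov = estimated_coverage(sequenced_nt, genome_size_mb)
    if cov < min_est_cov:
        checks.append((FAIL, "estimated coverage %.1fx < %dx" % (cov, min_est_cov)))
    if assembly_cov < min_assembly_cov:
        checks.append((FAIL, "assembly coverage %.1fx < %dx" % (assembly_cov, min_assembly_cov)))
    if pct_reads_mapped < min_pct_mapped:
        checks.append((WARNING, "only %.1f%% of reads mapped back" % pct_reads_mapped))
    return sample_status(checks)
```

For example, 42 Mb of sequenced nucleotides for a 2.1 Mb genome gives an estimated coverage of 20x, which clears the 15x default. A sample with a failing check is reported as FAIL even if it also has warnings.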

Make the SPAdes QC assessment depend only on SPAdes run information (no longer relying on AssemblyMapping).
Remove Trimmomatic and Pear information from QC assessment.
Update the E. coli maximum number of genes with multiple alleles in the TrueCoverage module.
Include ReMatCh (https://github.com/B-UMMI/ReMatCh) as dependency for TrueCoverage module running.
Make INNUca compatible with new MLST version.
Add option --fastQCproceed to force INNUca to continue even if a sample fails FastQC.
Add option --maxNumberContigs to set the maximum number of contigs per 1.5 Mb of expected genome size (useful for species that intrinsically produce a more fragmented genome assembly).
Change --spadesUse_3_9 to --spadesVersion to specify a SPAdes version (default: 3.11.0).
Add option --noLog to tell INNUca not to create a log file (useful in Slurm environments, where stdout and stderr are usually saved to a file).
Add option --noGitInfo to tell INNUca not to retrieve GitHub repository information (useful when running INNUca as many parallel independent jobs, one per sample, since it may save some time).
Change final general report (PASS, FAIL, WARNING samples).
Write failing and warning reports for each sample.
Change dependencies checking.
Fix minor errors.
Update README.
Add Dockerfile with Docker image recipe creation.
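The --maxNumberContigs option sets a limit per 1.5 Mb of expected genome size, so the effective limit scales with genome size. The following is a minimal Python sketch of that scaling. Rounding to the nearest integer is an assumption, and the function names are hypothetical rather than taken from INNUca.

```python
def max_contigs_allowed(max_number_contigs, genome_size_mb):
    """Scale a per-1.5-Mb contig limit to the expected genome size.

    Rounding to the nearest integer is an assumption of this sketch.
    """
    return int(round(max_number_contigs * genome_size_mb / 1.5))


def too_many_contigs(n_contigs, max_number_contigs, genome_size_mb):
    """True when the assembly exceeds the scaled limit (a WARNING condition)."""
    return n_contigs > max_contigs_allowed(max_number_contigs, genome_size_mb)
```

With --maxNumberContigs 100 and an expected genome size of 2.1 Mb (as in the Docker run example), the scaled limit is 100 × 2.1 / 1.5 = 140 contigs.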
miguelpmachado authored Sep 12, 2017
1 parent 7d0a8de commit fd97326
Showing 157 changed files with 32,061 additions and 3,649 deletions.
38 changes: 38 additions & 0 deletions Docker/Dockerfile
@@ -0,0 +1,38 @@
FROM ubuntu:16.04

WORKDIR /NGStools/

RUN apt-get update

# -- General Dependencies ---
RUN apt-get install -y git wget

# -- INNUca General Dependencies ---
RUN apt-get install -y python-dev default-jre

# -- mlst Dependencies --
# Blast
RUN wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.6.0/ncbi-blast-2.6.0+-x64-linux.tar.gz && tar -xf ncbi-blast-2.6.0+-x64-linux.tar.gz && rm ncbi-blast-2.6.0+-x64-linux.tar.gz
ENV PATH="/NGStools/ncbi-blast-2.6.0+/bin:${PATH}"
# Perl libs
RUN apt-get install -y libmoo-perl liblist-moreutils-perl

# --- mlst ----
RUN git clone https://github.com/tseemann/mlst.git
ENV PATH="/NGStools/mlst/bin:${PATH}"

# --- ReMatCh ---
RUN git clone https://github.com/B-UMMI/ReMatCh.git
ENV PATH="/NGStools/ReMatCh:${PATH}"

# --- INNUca ---
RUN git clone https://github.com/B-UMMI/INNUca.git
ENV PATH="/NGStools/INNUca:${PATH}"

# fixing permissions for MLST update
RUN chmod +x /NGStools/INNUca/Docker/update_mlst_db.sh && chmod o+wr /NGStools/mlst/scripts/ && chmod -R o+wr /NGStools/mlst/db/ && sed -i "s#OUTDIR=pubmlst#OUTDIR=/NGStools/mlst/scripts/pubmlst#1" /NGStools/mlst/scripts/mlst-download_pub_mlst

# Clean
RUN apt-get remove -y wget && apt-get autoclean -y

WORKDIR /data/
63 changes: 63 additions & 0 deletions Docker/README.md
@@ -0,0 +1,63 @@
INNUca.py - Docker
===============
INNUca - Reads Control and Assembly

*INNUENDO quality control of reads, de novo assembly and contigs quality assessment, and possible contamination search*

<https://github.com/B-UMMI/INNUca>


This is a Dockerfile for running INNUca with all dependencies already installed.

Within this container you can find:
- ubuntu:16.04
- git
- Python v2.7
- Java-JRE
- [Blast+](https://blast.ncbi.nlm.nih.gov/Blast.cgi) v2.6.0
- [mlst](https://github.com/tseemann/mlst)
- [ReMatCh](https://github.com/B-UMMI/ReMatCh) v3.2
- [INNUca](https://github.com/B-UMMI/INNUca) v3.1



### Using play-with-docker
[![Try in PWD](https://cdn.rawgit.com/play-with-docker/stacks/cff22438/assets/images/button.png)](http://labs.play-with-docker.com/)

On the [play-with-docker](http://labs.play-with-docker.com/) webpage, click **create session**. Another page
will open with a large counter in the upper left corner. Click **+ add new instance** and a terminal-like instance should appear on the right. In
this terminal, you can pull this Docker image as follows:

`docker pull ummidock/innuca:3.1`

#### Get this Docker image on your local machine

For this, Docker needs to be installed on your machine. Installation instructions can be found [here](https://docs.docker.com/engine/installation/).

##### Using DockerHub (automated build image)

`docker pull ummidock/innuca:3.1`

##### Using GitHub (build docker image)

1) `git clone https://github.com/B-UMMI/INNUca.git`
2) `docker build -t innuca:3.1 ./INNUca/Docker/`

### Run (using automated build image)
`docker run --rm -u $(id -u):$(id -g) -it -v /local/folder/fastq_data:/data/ ummidock/innuca:3.1 INNUca.py --speciesExpected "Streptococcus agalactiae" --genomeSizeExpectedMb 2.1 --inputDirectory /data/ --outdir /data/innuca_output/ --threads 8 --maxNumberContigs 100`



### Updating the mlst database in docker instance

After you've built or pulled the Docker image, you can still update the mlst database using the provided `update_mlst_db.sh` script. Simply run it after starting the container:

`/NGStools/INNUca/Docker/update_mlst_db.sh`

For more information on this please consult the [provided information](https://github.com/tseemann/mlst#updating-the-database) in the [mlst page](https://github.com/tseemann/mlst).

Contact
-------
Miguel Machado <mpmachado@medicina.ulisboa.pt>
Catarina Mendes <cimendes@medicina.ulisboa.pt>
9 changes: 9 additions & 0 deletions Docker/update_mlst_db.sh
@@ -0,0 +1,9 @@
#!/usr/bin/env bash

#script to update the mlst database

/NGStools/mlst/scripts/mlst-download_pub_mlst | bash
rm -r /NGStools/mlst/db/pubmlst/
mv /NGStools/mlst/scripts/pubmlst /NGStools/mlst/db/
/NGStools/mlst/scripts/mlst-make_blast_db
