Skip to content

Nextflow pipeline and Docker integration into Shaman

Notifications You must be signed in to change notification settings

Etjean/Long_project

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

43 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Long Project

Nextflow pipeline and Docker integration into SHAMAN.

Background

Quantitative metagenomics is broadly employed to identify genera or species associated with several diseases. These data are obtained by mapping the reads of each sample against operational taxonomic units (OTU) or a gene catalog. SHAMAN was one the first web application that allowed to clinician and biologist to perform an interactive analysis of quantitative metagenomics data with a dynamic-interface dedicated to the diagnostic and to the differential analysis. The interface integrates the experimental design (association of sample to one or several conditions), the statistical process for differential analysis and a real-time visualisation system.

SHAMAN is based on R, Shiny and DESeq2. The analytical process is divided into four steps : count matrix/annotation submission, normalisation, modelisation and visualisation. The count matrix is normalised at the OTU/gene level using the DESeq2 normalisation method and then, based on the experimental design, a generalised linear model is applied to detect differences in abundance at the considered taxonomic level.

Two years after first release, we can see a great interest from the metagenomics community with 5 publications using SHAMAN (in Nature microbiology, PNAS and Science Advances), 3 publications not involving SHAMAN authors, 74 active users per month (1430 unique visitors since first publication - 70 % are regular users) and 514 downloads of the Docker application. Several trainings were also performed to train biologist to use SHAMAN in the Pasteur Network and at ENS.

Project

We want to integrate a full automatized bioinformatic workflow based on Nextflow for targeted metagenomics data. This implementation will follow the current approach already implemented in bash workflow (MASQUE pipeline). The workflow should also be included in the SHAMAN Docker application for local installation on windows/mac/linux.

Usage

  • Download databases
    Download databases here and unzip them.

  • Run the pipeline
    To perform a test run of the script directly :

    docker run --rm \
      -v /path/to/masque.nf:/masque.nf \
      -v /path/to/databases:/db \
      -v /path/to/data:/data \
      etjean/shaman_nextflow \
      nextflow masque.nf --i /data --o /data/result
    

    Or open an interactive container first, and then run the script :

    docker run -ti \
      -v /path/to/masque.nf:/masque.nf \
      -v /path/to/databases:/db \
      -v /path/to/data:/data \
      etjean/shaman_nextflow
      
    nextflow masque.nf --i /data --o /data/result
    
  • Arguments

    Usage:
    16S/18S:   nextflow masque.nf      --i </path/to/input/> --o </path/to/result/>
    23S/28S:   nextflow masque.nf  --l --i </path/to/input/> --o </path/to/result/>
    ITS:       nextflow masque.nf  --f --i </path/to/input/> --o </path/to/result/>
    Amplicon:  nextflow masque.nf      --a <amplicon file>   --o </path/to/result/>
    
    All parameters:
    --i                       Provide </path/to/input/directory/>
    --a                       Provide <amplicon file>
    --o                       Provide </path/to/result/directory/>
    --n                       Indicate <project-name>
                              (default: use the name of the input directory or meta)
    --t                       Number of <thread>
                              (default: Nextflow automatic parallelization)
    --c                       Contaminant filtering [danio,human,mouse,mosquito,phi]
                              (default: human,phi)
    --s                       Perform OTU clustering with swarm
                              (default: vsearch)
    --b                       Perform taxonomical annotation with blast
                              (default: vsearch)
    --l                       Perform taxonomical annotation
                              against LSU databases: Silva/RDP
    --f                       Perform taxonomical annotation
                              against ITS databases: Unite/Findley/Underhill/RDP
    --minreadlength           Minimum read length take in accound in the study
                              (default: 35nt)
    --minphred                Qvalue must lie between [0-40]
                              (default: minimum qvalue 20)
    --minphredperc            Minimum allowed percentage of correctly called
                              nucleotides [0-100] (default: 80)
    --nbMismatchMapping       Maximum number of mismatch when mapping end-to-end
                              against Human genome and Phi174 genome
                              (default: 1 mismatch is accepted)
    --paired                  Paired-ends reads mode
    --minoverlap              Minimum overlap when paired reads are considered
                              (default: 10)
    --maxoverlap              Maximum overlap when paired reads are considered
                              (default: 200)
    --minampliconlength       Minimum amplicon length (default: 64)
    --minotusize              Indicate minimum OTU size (default: 4)
    --prefixdrep              Perform prefix dereplication
                              (default: full length dereplication)
    --chimeraslayerfiltering  Use ChimeraSlayer database for chimera filtering
                              (default: Perform a de novo chimera filtering)
    --otudiffswarm            Number of difference accepted in an OTU with swarm
                              (default: 1)
    --evalueTaxAnnot          Evalue threshold for taxonomical annotation with blast
                              (default: evalue=1E-5)
    --maxTargetSeqs           Number of hit per OTU with blast (default: 1)
    --identityThreshold       Identity threshold for taxonomical annotation with
                              vsearch (default: 0.75)
    --conservedPosition       Percentage of conserved position in the multiple
                              alignment considered for phylogenetic tree
                              (default: 0.8)
    --accurateTree            Accurate tree calculation with IQ-TREE instead of
                              FastTree (default: FastTree)
    --help                    Print this help
    

Problems

  • --t argument is obsolete.
  • --c argument is currently non fonctionnal.
  • More arguments control is needed.
  • --o does not create the output directory if it does not exists.

Resources

About

Nextflow pipeline and Docker integration into Shaman

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published