Developed by the Bork Group.
The development of this workflow was supported by NFDI4Microbiota.
The nHUMAnN workflow is a Nextflow workflow for running HUMAnN3 based on MetaPhlAn4 profiles via joint index generation. The workflow includes optional read preprocessing and host/human decontamination steps provided by the nevermore workflow library.
Due to compatibility issues between current CHOCOPhlAn databases and recent versions of HUMAnN3, nHUMAnN makes use of a patched HUMAnN3 version obtainable as a Docker container.
Also cite:
Beghini F, McIver LJ, Blanco-Míguez A, et al. Integrating taxonomic, functional, and strain-level profiling of diverse microbial communities with bioBakery 3. eLife. 2021;10:e65088. Published 2021 May 4. doi:10.7554/eLife.65088
The easiest way to handle dependencies is via Singularity/Docker containers. Alternatively, conda environments, software module systems or native installations can be used.
Preprocessing and QA are done with bbmap, fastqc, and multiqc.
Decontamination is done with kraken2 and additionally requires seqtk.
Host removal requires a kraken2 host database.
The default supported MetaPhlAn version is 4.
Get the mpa_vOct22_CHOCOPhlAnSGB_202212 database from here, unpack the tarball, and point the --mp4_db parameter to the database's root directory.
In params.yml:
mp4_db: "/path/to/mpa_vOct22_CHOCOPhlAnSGB_202212/"
On the command line:
--mp4_db "/path/to/mpa_vOct22_CHOCOPhlAnSGB_202212/"
The default supported HUMAnN version is 3.
Get the annotated CHOCOPhlAn database from here and the annotated UniRef database from here, unpack the tarballs, and set the respective parameters.
In params.yml:
humann_nuc_db: "/path/to/full_chocophlan_db/"
humann_prot_db: "/path/to/uniref90_annotated_v201901b_full/"
On the command line:
--humann_nuc_db "/path/to/full_chocophlan_db/"
--humann_prot_db "/path/to/uniref90_annotated_v201901b_full/"
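Since all three database parameters must point to unpacked directories, it can help to check them before launching a run. The following is a minimal sketch, not part of nHUMAnN: the parameter names come from the text above, while the function `missing_databases` and the idea of passing the parameters as a plain dictionary are assumptions made for illustration.

```python
from pathlib import Path

# Parameter names as used in params.yml above; the check itself is illustrative.
DB_PARAMS = ("mp4_db", "humann_nuc_db", "humann_prot_db")


def missing_databases(params):
    """Return the database parameters that are unset or do not point to a directory."""
    missing = []
    for name in DB_PARAMS:
        value = params.get(name)
        if not value or not Path(value).is_dir():
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Placeholder paths copied from the examples above; replace with real locations.
    params = {
        "mp4_db": "/path/to/mpa_vOct22_CHOCOPhlAnSGB_202212/",
        "humann_nuc_db": "/path/to/full_chocophlan_db/",
        "humann_prot_db": "/path/to/uniref90_annotated_v201901b_full/",
    }
    for name in missing_databases(params):
        print(f"{name}: not set or not a directory")
```

The check only confirms that each path is an existing directory; it does not verify that the directory actually contains the expected database files.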
This workflow will be available on the CloWM platform (coming soon).
The workflow run is controlled by environment-specific parameters (see run.config) and study-specific parameters (see params.yml). The parameters in params.yml can also be specified on the command line.
You can either clone this repository from GitHub and run it as follows:
git clone https://github.com/grp-bork/nHUMAnN.git
nextflow run /path/to/nHUMAnN [-resume] -c /path/to/run.config -params-file /path/to/params.yml
Alternatively, you can have Nextflow pull it from GitHub and run it from the $HOME/.nextflow directory:
nextflow run cschu/nHUMAnN [-resume] -c /path/to/run.config -params-file /path/to/params.yml
FASTQ files are supported and can be either uncompressed (but shouldn't be!) or compressed with gzip or bzip2. Sample data must be arranged in one directory per sample.
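To illustrate the one-directory-per-sample layout, here is a small sketch that lists the FASTQ files found under each sample directory. It is not the workflow's own code: the function `collect_samples` and the list of accepted file extensions are assumptions, and the workflow's actual matching rules may differ.

```python
from pathlib import Path

# Assumed file extensions; the workflow's actual matching rules may differ.
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz", ".fastq.bz2", ".fq.bz2")


def collect_samples(input_dir):
    """Map each sample directory name to the sorted names of its FASTQ files."""
    samples = {}
    for sample_dir in sorted(Path(input_dir).iterdir()):
        # Only directories count as samples; stray top-level files are ignored.
        if not sample_dir.is_dir():
            continue
        files = sorted(
            f.name
            for f in sample_dir.iterdir()
            if f.is_file() and f.name.endswith(FASTQ_SUFFIXES)
        )
        samples[sample_dir.name] = files
    return samples
```

Running `collect_samples("/path/to/input_dir")` on a directory with subfolders `sampleA` and `sampleB` would return one entry per subfolder, keyed by the folder name that the workflow uses as the sample name.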
All files in a sample directory will be associated with the name of the sample folder. Paired-end mate files need to have matching prefixes. Mates 1 and 2 can be specified with the suffixes _[12], _R[12], .[12], or .R[12]. Lane IDs or other read ID modifiers have to precede the mate identifier. Files whose names contain none of these patterns will be treated as single-end. Samples consisting of both single-end and paired-end files are assumed to be paired-end, with all single-end files being orphans (quality control survivors).
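The naming rules above can be sketched as a pair of regular expressions. This is an illustration only and not the workflow's actual implementation: the functions `classify` and `group_sample`, the extension list, and the handling of a mate file without its partner are all assumptions.

```python
import re

# Assumed FASTQ extensions, stripped before looking for the mate suffix.
EXTENSION = re.compile(r"\.(fastq|fq)(\.(gz|bz2))?$")
# Mate suffixes from the text: _1/_2, _R1/_R2, .1/.2, .R1/.R2 at the end of the name.
MATE = re.compile(r"^(?P<prefix>.+)[._]R?(?P<mate>[12])$")


def classify(filename):
    """Return (prefix, mate), where mate is '1', '2', or None for single-end files."""
    stem = EXTENSION.sub("", filename)
    match = MATE.match(stem)
    if match:
        return match.group("prefix"), match.group("mate")
    return stem, None


def group_sample(filenames):
    """Split one sample's files into mate pairs (by prefix) and single-end/orphan files."""
    mates = {}
    singles = []
    for name in sorted(filenames):
        prefix, mate = classify(name)
        if mate is None:
            singles.append(name)
        else:
            mates.setdefault(prefix, {})[mate] = name
    pairs = {}
    for prefix, found in mates.items():
        if "1" in found and "2" in found:
            pairs[prefix] = (found["1"], found["2"])
        else:
            # Assumption: a mate file without its partner is handled as single-end.
            singles.extend(found.values())
    return pairs, sorted(singles)
```

For example, `classify("sampleA_L001_R1.fastq.gz")` yields the prefix `sampleA_L001` and mate `1`, because the lane ID precedes the mate identifier, while `classify("reads.fastq.bz2")` has no mate suffix and is reported as single-end.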