
Functional profiling

Gavin Douglas edited this page Oct 4, 2017 · 8 revisions

HUMAnN2 Commands

Running HUMAnN2 is much more straightforward than running HUMAnN1, and the authors have provided a great standard workflow with usage examples. One appealing aspect of this tool is that you can specify which steps of the workflow should be skipped, so you can input data at whatever step you want. Also, it first runs MetaPhlAn2 to identify the taxa that make up most of the community, and you can use this output file rather than running MetaPhlAn2 again yourself.

HUMAnN2 functional predictions are based on UniRef gene families (usually UniRef90) and MetaCyc pathways, which are higher-level functions that are easier to interpret.

A basic HUMAnN2 command is shown below; you can see more options by typing humann2 -h.

humann2 --threads 1 --input cat_reads/SAMPLE.fastq --output humann2_out/

Typically you will want to run HUMAnN2 on a number of samples, which you can easily do using the parallel command (see this tutorial if you are new to parallel). The command below runs HUMAnN2 on the FASTQs in cat_reads/, 4 at a time, with each job using 1 thread. A subdirectory for each sample will be created within humann2_out/.

parallel -j 4 'humann2 --threads 1 --input {} --output humann2_out/{/.}' ::: cat_reads/*fastq
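Before launching a long batch, it can help to preview which output directory each FASTQ will map to. The sketch below is a dry run only: it prints the commands rather than running them, and it mimics parallel's {/.} replacement (basename without extension) with plain shell. The sample names and the temporary directory are placeholders invented for this illustration.

```shell
# Dry run: print one humann2 command per FASTQ instead of executing it.
# The mock input files are placeholders created only for this illustration.
workdir=$(mktemp -d)
mkdir -p "$workdir/cat_reads"
touch "$workdir/cat_reads/sampleA.fastq" "$workdir/cat_reads/sampleB.fastq"

dry_run=$(
    for fq in "$workdir"/cat_reads/*fastq; do
        # Equivalent of parallel's {/.}: strip the directory, then the extension.
        sample=$(basename "$fq")
        sample="${sample%.*}"
        echo "humann2 --threads 1 --input cat_reads/${sample}.fastq --output humann2_out/${sample}"
    done
)

echo "$dry_run"
rm -r "$workdir"
```

With GNU parallel itself, the --dry-run option gives a similar preview of the real command.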

You can read more about the output files in the HUMAnN2 authors' standard workflow, but briefly, there are three main output files:

  • "genefamilies" - the abundance of UniRef gene families in each sample (either UniRef50 or UniRef90). These gene families are collapsed internally to MetaCyc reactions ("RXNs") that make up the MetaCyc pathways. Note that many gene families have unknown functions.
  • "pathabundance" - the abundance of MetaCyc pathways in each sample.
  • "pathcoverage" - the coverage of each pathway in the "pathabundance" file by genes in the "genefamilies" file. This file is helpful for checking whether the abundance of certain pathways is based on only a small fraction of the genes that make up the pathway.
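As a rough illustration of how the coverage file can be used to flag poorly supported pathways, the sketch below lists community-level pathways whose coverage falls under a cutoff. The table contents, the pathway names, the two-column layout and the 0.5 cutoff are all made up for this example; it also assumes that per-taxon rows can be recognised by a "|" in the first column, which you should confirm against your own files.

```shell
# Build a tiny mock coverage table (hypothetical pathways and values).
cov_tmp=$(mktemp)
printf '# Pathway\tSAMPLE_Coverage\n' > "$cov_tmp"
printf 'PWY-A: hypothetical pathway A\t1.0\n' >> "$cov_tmp"
printf 'PWY-B: hypothetical pathway B\t0.2\n' >> "$cov_tmp"
printf 'PWY-B: hypothetical pathway B|g__Hypothetical.s__Hypothetical_sp\t0.1\n' >> "$cov_tmp"

# Skip the header, keep rows without "|" (community-level rows),
# and print the pathways whose coverage is below the chosen cutoff.
low_cov=$(awk -F'\t' -v cutoff=0.5 \
    'NR > 1 && index($1, "|") == 0 && $2 + 0 < cutoff { print $1 }' "$cov_tmp")

echo "$low_cov"
rm "$cov_tmp"
```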

These output files can be combined across samples into a single table per output type with humann2_join_tables. This program searches a specified folder (--input) for all files whose names contain the specified string (--file_name) and combines them into one output file. The -s option indicates that subdirectories of the input folder should also be searched.

humann2_join_tables -s --input humann2_out/ --file_name pathabundance --output humann2_pathabundance.tsv
humann2_join_tables -s --input humann2_out/ --file_name pathcoverage --output humann2_pathcoverage.tsv
humann2_join_tables -s --input humann2_out/ --file_name genefamilies --output humann2_genefamilies.tsv
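If a joined table ends up with fewer samples than expected, one thing to check is which per-sample files actually contain the search string in their names. The sketch below approximates that search with find on a mock directory tree; it is not the tool's own implementation, and the sample names and file layout are assumptions made for the illustration.

```shell
# Mock per-sample output folders (hypothetical sample names and file names).
join_dir=$(mktemp -d)
for s in sampleA sampleB; do
    mkdir -p "$join_dir/humann2_out/$s"
    for t in genefamilies pathabundance pathcoverage; do
        touch "$join_dir/humann2_out/$s/${s}_${t}.tsv"
    done
done

# List the files, including those in subdirectories, whose names contain the string.
matched=$(find "$join_dir/humann2_out" -type f -name '*pathabundance*' | sort)
n_matched=$(printf '%s\n' "$matched" | wc -l | tr -d ' ')

echo "$n_matched files match 'pathabundance'"
rm -r "$join_dir"
```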

We can then re-normalize the gene family and pathway abundances within each sample so that samples are comparable. With --units relab, each value is expressed as a relative abundance, i.e. as a proportion of the sample's total. Note that since the pathway coverage files do not contain abundances, you should not run this command on them.

humann2_renorm_table --input humann2_pathabundance.tsv --units relab --output humann2_pathabundance_relab.tsv
humann2_renorm_table --input humann2_genefamilies.tsv --units relab --output humann2_genefamilies_relab.tsv
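A quick way to sanity-check a re-normalized table is to total each sample column over the community-level rows and confirm that every sample gives the same total. The sketch below does this with awk on a tiny made-up table; the sample names, pathway IDs and values are hypothetical, and the assumption that per-taxon rows can be recognised by a "|" in the first column should be checked against your own files.

```shell
# Mock normalized table (made-up values; rows with "|" are per-taxon rows).
norm_tmp=$(mktemp)
printf '# Pathway\tsampleA\tsampleB\n' > "$norm_tmp"
printf 'PWY-A\t0.75\t0.5\n' >> "$norm_tmp"
printf 'PWY-A|g__Hypothetical\t0.75\t0.5\n' >> "$norm_tmp"
printf 'PWY-B\t0.25\t0.5\n' >> "$norm_tmp"

# Sum every sample column over the rows that have no "|" in the first field.
totals=$(awk -F'\t' '
    NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; ncol = NF; next }
    index($1, "|") == 0 { for (i = 2; i <= NF; i++) sum[i] += $i }
    END { for (i = 2; i <= ncol; i++) printf "%s\t%g\n", name[i], sum[i] }
' "$norm_tmp")

echo "$totals"
rm "$norm_tmp"
```

To run the same check on real output, point awk at humann2_pathabundance_relab.tsv instead of the mock file.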

HUMAnN2 stratifies its raw output by taxon, i.e. by the genome in which each profiled function was found. This is extremely helpful, but to get the data into a table without this stratification you can use the commands below.

humann2_split_stratified_table --input humann2_pathabundance_relab.tsv --output ./
humann2_split_stratified_table --input humann2_genefamilies_relab.tsv --output ./

These commands will generate "stratified" and "unstratified" versions of each output table.
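To make the difference between the two versions concrete, the sketch below separates a small mock table by hand. It assumes that stratified rows carry a "|" followed by the taxon in the first column; the pathway names, taxon labels and values are invented, and this is only an approximation of what humann2_split_stratified_table produces.

```shell
# Mock table mixing community-level rows and per-taxon ("|") rows.
strat_tmp=$(mktemp)
printf '# Pathway\tsampleA\n' > "$strat_tmp"
printf 'PWY-A: hypothetical pathway A\t0.6\n' >> "$strat_tmp"
printf 'PWY-A: hypothetical pathway A|g__Hypothetical.s__Hypothetical_sp\t0.4\n' >> "$strat_tmp"
printf 'PWY-A: hypothetical pathway A|unclassified\t0.2\n' >> "$strat_tmp"
printf 'PWY-B: hypothetical pathway B\t0.4\n' >> "$strat_tmp"

# Unstratified: the header plus every row without a "|".
unstratified=$(grep -v -F '|' "$strat_tmp")
# Stratified: the header plus every row with a "|".
stratified=$( { head -n 1 "$strat_tmp"; tail -n +2 "$strat_tmp" | grep -F '|'; } )

echo "$unstratified"
echo "$stratified"
rm "$strat_tmp"
```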

HUMAnN1 Commands

HUMAnN1 functional profiles were based on the KEGG database. You can read more about this tool in the publication.

Run pre-HUMAnN (DIAMOND search).

run_pre_humann.pl -p 4 -o pre_humann/ screened_reads/*

Run HUMAnN (link the pre-HUMAnN output files into the HUMAnN "input" directory and then run HUMAnN with the scons command). Note that you can run this in parallel with the -j option (e.g. scons -j 4), but I have found that this often causes HUMAnN to fail unexpectedly.

ln -s $PWD/pre_humann/* ~/programs/humann-0.99/input/
cd ~/programs/humann-0.99/
scons

Convert HUMAnN output to STAMP format.

humann_to_stamp.pl 04b-hit-keg-mpm-cop-nul-nve-nve.txt > humann_modules.spf
humann_to_stamp.pl 04b-hit-keg-mpt-cop-nul-nve-nve.txt > humann_pathways.spf
humann_to_stamp.pl 01b-hit-keg-cat.txt > humann_kos.spf