OUTPOST: a comprehensive analysis software for whole-metagenome shotgun sequencing incorporating group stratification.
# 1. Download `OUTPOST.yml` in `install` folder.
git clone https://github.com/Y-H-Joe/OUTPOST.git
cd OUTPOST
# 2. Create the conda environment `conda env create --name OUTPOST --file OUTPOST.yml `, the bioconductor has unknown errors.
export PIP_DEFAULT_TIMEOUT=100
mkdir -p ~/.pip
echo -e "[global]\ntimeout = 100\n" > ~/.pip/pip.conf
conda env create --name OUTPOST --file install/OUTPOST.yml
# 3. Activate the environment `conda activate OUTPOST`.
conda activate OUTPOST
# 3.1 Check whether your CLI has 'git' command. If not, please run `conda install anaconda::git`.
# 4. install. Databases will also be donwnloaded, require ~500GB. The downloading time is based on your bandwith.
# make sure your working directory is OUTPOST folder
bash install/OUTPOST_install.sh
# 1. prepare the `OUTPOST/OUTPOST_config.tsv` (for experiment, data, group information)
# 2. modify the `OUTPOST/Snakemake_config.yml` (for OUTPOST parameters)
# 3. run `python path/to/OUTPOST/check_snakefile_config.py path/to/Snakefile_config.yml` . If no errors, then you are good to go.
# 4. run OUTPOST
`nohup snakemake --cores 32 --verbose -s ./OUTPOST_run.py --rerun-incomplete &`
We prepared the example dataset and corresponding results in figshare, which can be used for testing your installation.
We also uploaded our independant analysis of a cat icrobiome dataset (described in our article) in figshare.
To Be Remind:
-
The working directory (where you run OUTPOST) must be the parent directory of
OUTPOST
, becasue some scripts in Snakemake.py use relative path. To be more specific, you have a foldertest
, you have all scripts of OUTPOST intest
folder, you havetest\Snakefile.py
andtest\OUTPOST
. Youcd test
, then runnohup snakemake --cores 32 --verbose -s ./Snakefile.py --rerun-incomplete &
, you will get all outputs intest\name_of_your_assembly
folder. -
Snakemake (the framework OUTPOST relied on) will lock working directory during running. So you should prepare two working directories if you're running two OUTPOST pipelines (or any other Snakemake based softwares). To be more specific,
mkdir folder1/
andmkdir folder2/
, copy the entire OUTPOST folder tofolder1/
andfolder2/
, respectively. Thencd folder1
, run OUTPOST. Thencd folder2
, run OUTPOST. -
The Snakemake based OUTPOST can also be deployed on cluster .
-
Do not change the column names of OUTPOST_config.tsv or the parameter names of Snakefile_config.yml.
-
Refer to OUTPOST/OUTPOST_config.explanation.txt for more details.
We prepared the example dataset and corresponding results in figshare, which can be used for testing your installation.
We also uploaded our independant analysis of a cat icrobiome dataset (described in our article) in figshare using OUTPOST version 1.0 .
We strongly suggest to read the OUTPOST_report.html in OUTPOST_report_example.zip
.
Each OUTPOST run accepts one assembly, all outputs are categorized in the folder named by the assembly.
antibiotic_genes_analysis
assembly_analysis
batch_effect
benchmark
biomarkers_analysis
data
diversity_analysis
function_analysis
LDA_analysis
log
metaphlan_analysis
plasmids_analysis
qc
report
taxonomy_analysis
virulence_factors_analysis
[ 4] report/
├── [ 14] materials
└── [ 38K] OUTPOST_report.html
The OUTPOST report is a HTML file with materials, which contains the general results description and annoatation.
For more annotation, please refer to OUTPOST report html.
log/
├── [ 0] alpha_beta_diversity.done
...
├── [ 0] virulence_factors_analysis.done
├── [ 0] virulence_factors_analysis.log
├── [ 0] visualize_batch_effect.done
└── [ 0] visualize_batch_effect.log
The log
folder contains the log for all processing modules of OUTPOST. Notably, OUTPOST use these log files with suffix of .done
to judge the status of each rule. Users can check the Snakemake.py
, if the done
files for certain rule exists, OUTPOST will regart it as successfully finished. If users want to re-run certain rules, they should remove the corressponding done
files in the first place.
benchmark/
├── [ 141] alpha_beta_diversity.benchmark
...
├── [ 143] virulence_analysis.benchmark
├── [ 145] virulence_factors_analysis.benchmark
└── [ 140] visualize_batch_effect.benchmark
$cat benchmark/alpha_beta_diversity.benchmark
s h:m:s max_rss max_vms max_uss max_pss io_in io_out mean_load cpu_time
14.7261 0:00:14 487.06 19615.18 446.61 461.81 5.42 3.06 186.02 29.58
The benchmark
folder contains the log information of each computation module in OUTPOST. Users can check the benchmark files for diagnosis. For example,
For more annotation, please refer to OUTPOST report html.
[ 56] qc/
├── [234K] buffalo_1.fastp.html
...
├── [ 51K] pig_3.fastp.json
├── [ 0] TrimSummmry_buffalo_1.txt
...
├── [ 0] TrimSummmry_pig_1.txt
├── [ 0] TrimSummmry_pig_2.txt
└── [ 0] TrimSummmry_pig_3.txt
The qc
folder contains quality control results for reads. For example,
[ 10] batch_effect/
├── [ 32K] all_samples.taxa_counts.rel_abun.class.rmU.batch_effect_PCA.pdf
...
├── [ 25K] all_samples.taxa_counts.rel_abun.species.rmU.batch_effect_PCA.pdf
├── [ 32K] all_samples.taxa_counts.rel_abun.superkingdom.rmU.batch_effect_PCA.pdf
└── [ 25K] all_samples.taxa_counts.rel_abun.taxaID.rmU.batch_effect_PCA.pdf
If rm_batch_effect
is True
, OUTPOST will visualize the principal components (PCA) as well as the variance distribution. To check the batch effect, users can compare the plots before and after batch effect removal. For example,
(If image failed to show, click it to view.)
For more annotation, please refer to OUTPOST report html.
taxonomy_analysis/
├── [628K] boxplot_normal_vs_obese
│ ├── [4.0K] class
│ ├── [ 28K] family
...
├── [4.0K] counts_tables
│ ├── [ 31K] cat.taxa_counts.rel_abun.class.csv
│ ├── [ 31K] cat.taxa_counts.rel_abun.class.rmU.csv
│ ├── [7.0K] cat.taxa_counts.rel_abun.class.rmU.top20.addOthers.csv
...
│ ├── [6.2K] cat.taxa_counts.rel_abun.taxaID.rmU.top20.csv
│ ├── [6.4K] cat.taxa_counts.rel_abun.taxaID.rmU.top20.fillmin.scaled.csv
│ └── [ 193] cat.taxa_counts.rel_abun.taxaID.rmU.top20.fillmin.scaled.index
├── [4.0K] figs
│ ├── [7.9K] cat.rel_abun.normal_vs_obese.at_class.rel_abun.equal.top20.fillmin.scaled.heatmap.pdf
│ ├── [7.9K] cat.rel_abun.normal_vs_obese.at_class.rel_abun.unequal.top20.fillmin.scaled.heatmap.pdf
...
│ ├── [8.1K] cat.taxa_counts.rel_abun.taxaID.rmU.top20.barplot.pdf
│ ├── [7.7K] cat.taxa_counts.rel_abun.taxaID.rmU.top20.fillmin.scaled.heatmap.pdf
│ └── [8.1K] Rplots.pdf
├── [4.0K] kaiju
│ ├── [ 66M] cat_kaiju.ref
│ ├── [ 96M] cat_kaiju.ref.nm
│ └── [ 36M] cat_kaiju.ref.nm.tsv
├── [4.0K] temp
│ ├── [ 19M] cat.counts.pure
│ ├── [9.1M] D001_obese_contigs_sorted.cat.bam.idxstats
│ ├── [808K] D001_obese_contigs_sorted.cat.bam.idxstats.pure
...
│ ├── [9.1M] Z116_normal_contigs_sorted.cat.bam.idxstats
│ └── [783K] Z116_normal_contigs_sorted.cat.bam.idxstats.pure
├── [ 12K] top_taxa_normal_vs_obese
│ ├── [6.8K] cat.rel_abun.normal_vs_obese.at_class.rel_abun.equal.top20.csv
│ ├── [6.5K] cat.rel_abun.normal_vs_obese.at_class.rel_abun.equal.top20.fillmin.scaled.csv
...
│ └── [ 297] cat.rel_abun.normal_vs_obese.at_taxaID.rel_abun.unequal.top20.fillmin.scaled.index
└── [4.0K] utest_normal_vs_obese
├── [2.3K] cat.rel_abun.normal_vs_obese.at_class.ave_change.equal.csv
...
├── [833K] cat.rel_abun.normal_vs_obese.at_taxaID.rel_abun.unequal.csv
└── [630K] cat.rel_abun.normal_vs_obese.at_taxaID.u-test.two_sided.csv
The boxplot
folder contains the boxplots for all significant taxa crossing all taxonomy levels for all group-pair comparisons. Also, all plots generated by OUTPOST are vector PDF format for the convenience of publication.
(If image failed to show, click it to view.)
The figs
folder contains the heatmap and barplots: OUTPOST not only draws heatmap for unequal taxa, but also for equal taxa.
(If image failed to show, click it to view.) OUTPOST produces barplot for all taxonomy levels. Here we have top 20 taxa, the 20 here is an adjustable parameter.
(If image failed to show, click it to view.)
The csv
files here are intermediate tables. OUTPOST include these tables for user's convenience. The kaiju
folder contains the taxonomy annotated contigs table: The top_taxa
folders contain the intermediate tables for top abundant taxa crossing all taxonomy levels for every group pair. The utes
t folders contain the statistical results.
For more annotation, please refer to OUTPOST report html.
diversity_analysis/
└── [4.0K] alpha_beta_normal_vs_obese
├── [1.8K] alpha_diversity.at_genus.csv
├── [1.8K] alpha_diversity.at_species.csv
├── [ 32] beta_pcoa_P-value.bray.at_genus.csv
├── [ 31] beta_pcoa_P-value.bray.at_species.csv
...
├── [4.9K] bray_ln.at_species.csv
├── [497K] cat.taxa_counts.rel_abun.genus.rmU.normal_vs_obese.csv
├── [2.0M] cat.taxa_counts.rel_abun.species.rmU.normal_vs_obese.csv
├── [ 453] group.csv
├── [ 297] index.txt
├── [5.7K] Inv_Simpson.alpha_diveristy.at_genus.pdf
├── [5.7K] Inv_Simpson.alpha_diveristy.at_species.pdf
├── [4.9K] jaccard.at_genus.csv
...
├── [6.9K] PCoA12.jaccard.at_genus.pdf
├── [6.9K] PCoA12.jaccard.at_species.pdf
...
├── [5.7K] Shannon.alpha_diveristy.at_genus.pdf
├── [5.7K] Shannon.alpha_diveristy.at_species.pdf
├── [5.7K] Simpson.alpha_diveristy.at_genus.pdf
├── [5.7K] Simpson.alpha_diveristy.at_species.pdf
├── [5.8K] Simpson_evenness.alpha_diveristy.at_genus.pdf
└── [5.8K] Simpson_evenness.alpha_diveristy.at_species.pdf
This folder contains all the alpha and beta diversity analysis crossing every group-pair.
(base) yh@superServer:human62_batch_effect2$ l diversity_analysis/
alpha_beta_asian_vs_euro/ alpha_beta_asian_vs_ill/ alpha_beta_female_vs_euro/ alpha_beta_healthy_vs_female/ alpha_beta_male_vs_euro/
alpha_beta_asian_vs_female/ alpha_beta_asian_vs_male/ alpha_beta_female_vs_ill/ alpha_beta_healthy_vs_ill/ alpha_beta_male_vs_female/
alpha_beta_asian_vs_healthy/ alpha_beta_euro_vs_ill/ alpha_beta_healthy_vs_euro/ alpha_beta_healthy_vs_male/ alpha_beta_male_vs_ill/
Each sub-folder contains alpha and beta diversity analysis intermediate tables and graphs.
For example,
(If image failed to show, click it to view.)
For more annotation, please refer to OUTPOST report html.
[ 7] metaphlan_analysis/
├── [ 20] bowtie2
├── [ 11] diversity
├── [ 5] figs
├── [ 20] krona
└── [ 24] taxonomy
This folder contains all the analysis results using MetaPhlan4, including some results specially generated by OUTPOST. This folder include diversity analysis, taxonomy analysis, and more.
OUTPOST generated GraPhlAn plot, heatmap and Krona Plot for each Sample. The tables for taxonomy, Krona and Bowtie2 intermediates are also stored.
For example,
(If image failed to show, click it to view.)
(If image failed to show, click it to view.)
For more annotation, please refer to OUTPOST report html.
The folder contains the metabolism analysis results, mainly based on the 5 independent databases (which are selectable in Snakemake_config.yml
file)
function_analysis/
├── [4.0K] figs
│ ├── [7.9K] allSamples_genefamilies_uniref90names_relab_eggnog_unstratified.named.rel_abun_format.normal_vs_obese.rel_abun.equal.top20.fillmin.scaled.heatmap.pdf
...
│ ├── [8.0K] allSamples_genefamilies_uniref90names_relab_rxn_unstratified.named.rel_abun_format.normal_vs_obese.rel_abun.equal.top20.fillmin.scaled.heatmap.pdf
│ ├── [8.3K] allSamples_genefamilies_uniref90names_relab_rxn_unstratified.named.rel_abun_format.normal_vs_obese.rel_abun.unequal.top20.fillmin.scaled.heatmap.pdf
│ ├── [8.1K] allSamples_genefamilies_uniref90names_relab_rxn_unstratified.named.rel_abun_format.top20.fillmin.scaled.heatmap.pdf
│ └── [8.2K] Rplots.pdf
└── [4.0K] humann3
├── [ 32K] allSamples
│ ├── [3.1M] 111030_normal_genefamilies_cpm_eggnog.tsv
│ ├── [451K] 111030_normal_genefamilies_cpm_ko.tsv
...
│ ├── [163K] Z116_normal_pathcoverage_cpm.tsv
│ ├── [166K] Z116_normal_pathcoverage_relab.tsv
│ └── [175K] Z116_normal_pathcoverage.tsv
├── [ 16K] normal
│ ├── [3.1M] 111030_normal_genefamilies_cpm_eggnog.tsv
│ ├── [451K] 111030_normal_genefamilies_cpm_ko.tsv
...
│ ├── [166K] Z116_normal_pathcoverage_relab.tsv
│ └── [175K] Z116_normal_pathcoverage.tsv
├── [ 16K] obese
│ ├── [2.3M] D001_obese_genefamilies_cpm_eggnog.tsv
│ ├── [379K] D001_obese_genefamilies_cpm_ko.tsv
│ ├── [808K] D001_obese_genefamilies_cpm_level4ec.tsv
...
│ ├── [207K] L001_obese_pathcoverage_cpm.tsv
│ ├── [212K] L001_obese_pathcoverage_relab.tsv
│ └── [221K] L001_obese_pathcoverage.tsv
├── [ 28K] ori_results
│ ├── [3.1M] 111030_normal_genefamilies_cpm_eggnog.tsv
│ ├── [451K] 111030_normal_genefamilies_cpm_ko.tsv
...
│ ├── [179K] Z116_normal_pathabundance.tsv
│ ├── [163K] Z116_normal_pathcoverage_cpm.tsv
│ ├── [166K] Z116_normal_pathcoverage_relab.tsv
│ └── [175K] Z116_normal_pathcoverage.tsv
├── [ 20K] output
│ ├── [ 16M] allSamples_genefamilies_uniref90names_cpm_eggnog_stratified.tsv
...
├── [ 12K] top_humann_normal_vs_obese
│ ├── [5.8K] allSamples_genefamilies_uniref90names_relab_eggnog_unstratified.named.rel_abun_format.normal_vs_obese.rel_abun.equal.top20.csv
...
│ ├── [7.1K] allSamples_genefamilies_uniref90names_relab_rxn_unstratified.named.rel_abun_format.normal_vs_obese.rel_abun.unequal.top20.csv
│ ├── [7.6K] allSamples_genefamilies_uniref90names_relab_rxn_unstratified.named.rel_abun_format.normal_vs_obese.rel_abun.unequal.top20.fillmin.scaled.csv
│ └── [ 297] allSamples_genefamilies_uniref90names_relab_rxn_unstratified.named.rel_abun_format.normal_vs_obese.rel_abun.unequal.top20.fillmin.scaled.index
└── [4.0K] utest_normal_vs_obese
├── [957K] allSamples_genefamilies_uniref90names_relab_eggnog_unstratified.named.rel_abun_format.normal_vs_obese.ave_change.equal.csv
...
├── [253K] allSamples_genefamilies_uniref90names_relab_rxn_unstratified.named.rel_abun_format.normal_vs_obese.rel_abun.unequal.csv
└── [420K] allSamples_genefamilies_uniref90names_relab_rxn_unstratified.named.rel_abun_format.normal_vs_obese.u-test.two_sided.csv
The humann3
folder contains all intermediate tables for our humann3 process. If users want to skip the humann_init (as former suggested), they should mkdir -p path_to_metabolism_analysis/metabolism_analysis/humann3/ori_results
and prepare all the humann3 generated tables with suffix as genefamilies.tsv/pathabundance.tsv/pathcoverage.tsv under the ori_results
folder, then set skip_humann_init to be True.
The figs folder contains all the heatmap plots for every independent database crossing every group-pair.
For example,
(If image failed to show, click it to view.)
For more annotation, please refer to OUTPOST report html.
LDA_analysis/
├── [4.0K] figs_humann
│ ├── [ 70K] allSamples_genefamilies_uniref90names_relab_eggnog_unstratified.named.rel_abun_format.normal_vs_obese.rel_abun.unequal.lefse.pdf
│ ├── [ 27K] allSamples_genefamilies_uniref90names_relab_ko_unstratified.named.rel_abun_format.normal_vs_obese.rel_abun.equal.lefse.pdf
...
│ └── [ 39K] allSamples_genefamilies_uniref90names_relab_rxn_unstratified.named.rel_abun_format.normal_vs_obese.rel_abun.unequal.lefse.pdf
├── [4.0K] figs_taxa
│ ├── [ 11K] cat.rel_abun.normal_vs_obese.at_class.rel_abun.unequal.lefse.pdf
│ ├── [ 14K] cat.rel_abun.normal_vs_obese.at_family.rel_abun.unequal.lefse.pdf
...
│ ├── [ 13K] cat.rel_abun.normal_vs_obese.at_species.rel_abun.equal.lefse.pdf
│ ├── [ 36K] cat.rel_abun.normal_vs_obese.at_species.rel_abun.unequal.lefse.pdf
│ └── [ 26K] cat.rel_abun.normal_vs_obese.at_taxaID.rel_abun.unequal.lefse.pdf
├── [ 12K] humann_normal_vs_obese
│ ├── [3.9M] allSamples_genefamilies_uniref90names_relab_eggnog_unstratified.named.rel_abun_format.normal_vs_obese.rel_abun.equal.lefse.format
│ ├── [959K] allSamples_genefamilies_uniref90names_relab_eggnog_unstratified.named.rel_abun_format.normal_vs_obese.rel_abun.equal.lefse.res
│ ├── [3.8M] allSamples_genefamilies_uniref90names_relab_eggnog_unstratified.named.rel_abun_format.normal_vs_obese.rel_abun.equal.lefse.tsv
...
│ ├── [181K] allSamples_genefamilies_uniref90names_relab_rxn_unstratified.named.rel_abun_format.normal_vs_obese.rel_abun.unequal.lefse.format
│ ├── [ 93K] allSamples_genefamilies_uniref90names_relab_rxn_unstratified.named.rel_abun_format.normal_vs_obese.rel_abun.unequal.lefse.res
│ └── [253K] allSamples_genefamilies_uniref90names_relab_rxn_unstratified.named.rel_abun_format.normal_vs_obese.rel_abun.unequal.lefse.tsv
└── [ 12K] taxa_normal_vs_obese
├── [ 13K] cat.rel_abun.normal_vs_obese.at_class.rel_abun.equal.lefse.format
├── [2.3K] cat.rel_abun.normal_vs_obese.at_class.rel_abun.equal.lefse.res
...
├── [312K] cat.rel_abun.normal_vs_obese.at_taxaID.rel_abun.lefse.res
├── [548K] cat.rel_abun.normal_vs_obese.at_taxaID.rel_abun.unequal.lefse.format
├── [156K] cat.rel_abun.normal_vs_obese.at_taxaID.rel_abun.unequal.lefse.res
└── [833K] cat.rel_abun.normal_vs_obese.at_taxaID.rel_abun.unequal.lefse.tsv
This folder contains the metabolism and taxonomy LDA results for every group-pair.
The figs_humann
and figs_taxa
folders contain the lefSE results.
For example,
(If image failed to show, click it to view.)
Other sub-folders contain the intermediate tables for user's convenience.
For more annotation, please refer to OUTPOST report html.
antibiotic_genes_analysis/
├── [250K] cat.antibiotic.tsv
├── [ 32K] cat.argannot.tsv
├── [ 93K] cat.card.tsv
├── [ 614] cat.ecoh.tsv
├── [ 63K] cat.megares.tsv
...
├── [105K] genes_taxa_counts_normal_vs_obese.tsv
└── [ 49K] genes_taxaID_normal_vs_obese_heatmap.pdf
These folders contain the dist plot and heatmap plot for all features with responding taxonomy, as well as intermediate tables.
For example,
(If image failed to show, click it to view.)
OUTPOST also provide all the annotation information in utils/abricate.annoatations.txt
, which can assist users to determine the antibiotic genes/plasmids/virulence factors. Also, the intermediate tables are helpful. The *.antibiotic.tsv
*.virulence.tsv
*.plasmidfinder.tsv
are summary tables.
$ head antibiotic_genes_analysis/cat.antibiotic.tsv
#FILE SEQUENCE START END GENE COVERAGE COVERAGE_MAP GAPS %COVERAGE %IDENTITY DATABASE ACCESSION
/home/yihang/refgenomes/cat_microbiome/GCA_022675345.1_ASM2267534v1_genomic.fna JAIZPE010000403.1 1526 2052 nimA_1 1-525/531 ========/====== 2/2 98.87 98.10 resfinder X71444
/home/yihang/refgenomes/cat_microbiome/GCA_022675345.1_ASM2267534v1_genomic.fna JAIZPE010006356.1 26667 26812 lsa(B)_1 1-146/1479 ==............. 0/0 9.87 76.03 resfinder AJ579365
/home/yihang/refgenomes/cat_microbiome/GCA_022675345.1_ASM2267534v1_genomic.fna JAIZPE010006955.1 1336 1476 VanR-A_1 556-696/696 ...........==== 0/0 20.26 78.72 resfinder FJ866609
/home/yihang/refgenomes/cat_microbiome/GCA_022675345.1_ASM2267534v1_genomic.fna JAIZPE010007078.1 1635 1688 lnu(C)_1 442-495/495 .............== 0/0 10.91 100.00 resfinder AY928180
...0
/home/yihang/refgenomes/cat_microbiome/GCA_022675345.1_ASM2267534v1_genomic.fna JAIZPE010022089.1 8462 8940 VanY-G_1 340-818/840 ......========= 0/0 57.02 75.99 resfinder DQ212986
For more annotation, please refer to OUTPOST report html.
[ 8] assembly_analysis/
├── [ 7] annotation
│ ├── [ 4] eggmapper
│ ├── [ 10] gtdbtk
│ ├── [ 7] metagenemark
│ ├── [ 9] prodigal
│ └── [ 6] rgi
├── [ 8] contigs
│ ├── [ 262] checkpoints.txt
│ ├── [ 0] done
│ ├── [ 15M] final.contigs.fa
│ ├── [ 80] intermediate_contigs
│ ├── [184K] log
│ └── [2.9K] options.json
├── [ 11] outpost_contigs
│ ├── [ 23K] outpost_assembly_split
│ ├── [ 15M] outpost_nonrd_contigs.fasta
│ ├── [ 28M] outpost_nonrd_contigs.fasta.0123
│ ├── [ 17] outpost_nonrd_contigs.fasta.amb
│ ├── [1.2M] outpost_nonrd_contigs.fasta.ann
│ ├── [ 45M] outpost_nonrd_contigs.fasta.bwt.2bit.64
│ ├── [940K] outpost_nonrd_contigs.fasta.clstr
│ ├── [3.5M] outpost_nonrd_contigs.fasta.pac
│ └── [ 2] stdin.split
├── [ 4] quantify
│ ├── [ 4] index
│ └── [ 4] quantify
├── [ 15] quast
│ ├── [ 6] basic_stats
│ ├── [ 52K] icarus.html
│ ├── [ 3] icarus_viewers
│ ├── [ 6] predicted_genes
│ ├── [4.4K] quast.log
│ ├── [418K] report.html
│ ├── [ 31K] report.pdf
│ ├── [1.5K] report.tex
│ ├── [ 778] report.tsv
│ ├── [1.7K] report.txt
│ ├── [1.3K] transposed_report.tex
│ ├── [ 778] transposed_report.tsv
│ └── [1.3K] transposed_report.txt
└── [ 4] taxa_counts
├── [1.0K] all_samples.taxa_counts.summarized.tsv
└── [4.4M] all_samples.taxa_counts.tsv
This folder contains assemble contigs, final assembly, assembly quality control, assembly genes prediction, assembly genes annotation, assembly genes quantification, and assembly genes taxonomy counts.
For more annotation, please refer to OUTPOST report html.
biomarkers_analysis/
├── [4.0K] ANCOM_identification
│ ├── [ 11K] ancom_biomarker.normal_vs_obese.at_class.tsv
│ ├── [ 49K] ancom_biomarker.normal_vs_obese.at_family.tsv
...
│ ├── [ 33K] ancom_biomarkers.volcano.normal_vs_obese.at_genus.pdf
│ ├── [ 12K] ancom_biomarkers.volcano.normal_vs_obese.at_order.pdf
│ ├── [9.0K] ancom_biomarkers.volcano.normal_vs_obese.at_phylum.pdf
│ ├── [323K] ancom_biomarkers.volcano.normal_vs_obese.at_species.pdf
│ ├── [4.8K] ancom_biomarkers.volcano.normal_vs_obese.at_superkingdom.pdf
│ └── [193K] ancom_biomarkers.volcano.normal_vs_obese.at_taxaID.pdf
├── [3.5K] OUTPOST_biomarker_scores.normal_vs_obese.class.tsv
├── [ 16K] OUTPOST_biomarker_scores.normal_vs_obese.family.tsv
...
├── [ 241] OUTPOST_biomarker_scores.normal_vs_obese.superkingdom.tsv
└── [432K] OUTPOST_biomarker_scores.normal_vs_obese.taxaID.tsv
The higher OUTPOST_biomarker_score, the more likely the taxonomy is the biomarker. Full score is 7. The following example shows that Escherichia coli
is most likely to be biomarker species.
$ head biomarkers_analysis/OUTPOST_biomarker_scores.normal_vs_obese.species.tsv
taxa_abun_top_100 taxa_abun_signicant LDA_larger_than_2 virulence_top_100 plasmid_top_100 antibiotic_top_100 ANCOM_identified OUTPOST_biomarker_score
Escherichia coli 1.0 1.0 1.0 1.0 1.0 1.0 1.0 7.0
Holdemanella biformis 1.0 1.0 1.0 1.0 0.0 1.0 1.0 6.0
...
uncultured bacterium 1.0 1.0 1.0 1.0 0.0 1.0 1.0 6.0
Bifidobacterium pseudolongum 1.0 1.0 1.0 1.0 0.0 1.0 0.0 5.0
Helicobacter canis 1.0 1.0 1.0 1.0 0.0 0.0 1.0 5.0
The biomarkers_analysis
folder contains the biomarkers inference across all taxonomy levels for all group-pairs. The ANCOM (Analysis of compositions of microbiomes) figures are in ANCOM_identification
.
For example,
(If image failed to show, click it to view.)
(If image failed to show, click it to view.)
For more annotation, please refer to OUTPOST report html.
Basically, the cause is the wrong installation of humann3.
-
google it. Try
metaphlan —install
. Then re-run snakemake. -
if 1. not work. check the error log, try to find this sentence
Expecting location /the/expection/location
;cd
to the location, examine any file truncation/loss. If there be, remove all files in the/the/expection/location
, re-runmetaphlan —install
. Then re-run snakemake.
Refer to this answer. I tried and succeeded.
3. Can't locate File/Slurp.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /home/yzz0191/anaconda3/envs/OUTPOST/bin/abricate line 9.
I occured this when submitting OUTPOST to Slurm system. The reason is ~/anaconda3/envs/OUTPOST/bin/abricate
, the abricate
was written in Perl, and the first line of abricate
is #!/usr/bin/env perl
. So, comment this line out, add a new head line #! /path/to/your/anaconda3/bin/perl
(modify /path/to/your !), will solve it.
conda env remove -n OUTPOST
to clean the failed OUTPOST environment. Then use OUTPOST_without_bioconductor.yml
to create a new OUTPOST environment.
run conda install -c biobakery humann=3.1.1
. This is because humann internal error. Need to update to newer version of humann. After the installation, the original chocophlan database will be out of use (you will get Please install the latest version of the database: v201901_v31
error). So run humann_databases --download chocophlan full /path/to/your/databases/ --update-config yes
to update your chocophlan database for humann. Then, since you just updated the humann, you need to update the nucleotide and protein database record in humann_config too so that huamann can find the databases. Run humann_config --update database_folders protein /path/to/your/uniref/database/
and humann_config --update database_folders protein /path/to/your/utility_mapping/database/
.
Reformat your assembly file. from >human63_000000000001 TTTCCTTCGATGAGTTCTATGCCGTATATAATAAAAAGCATTCCGCTATTGAACAGCGTCTCGCAGAAAAAGGATTGCCGGAACATCTGCTTCATCGTAAGGAACGCAGACAGGAAAAACTGAATCATCCTGCTGTAAAAACGACAAAGCCCCACAGAAAGAAGAAAAAGAAACAGGTGTTCGAGCCGCTCTTGGAACAGAATGATGATTTCTTCTTTATTGCTGGTTATACTTCTGGCGGTGCCCCTTATGGTGTCACATGGGAAGAAATGGGACTAGAGCCTTGGGAAGAACTTGTATAAATATTATTGCCATTGCCGATTGCCAAAAGCA
to
>human63_000000000001
TTTCCTTCGATGAGTTCTATGCCGTATATAATAAAAAGCA
TTCCGCTATTGAACAGCGTCTCGCAGAAAAAGGATTGCCG
GAACATCTGCTTCATCGTAAGGAACGCAGACAGGAAAAAC
TGAATCATCCTGCTGTAAAAACGACAAAGCCCCACAGAAA
GAAGAAAAAGAAACAGGTGTTCGAGCCGCTCTTGGAACAG
AATGATGATTTCTTCTTTATTGCTGGTTATACTTCTGGCG
GTGCCCCTTATGGTGTCACATGGGAAGAAATGGGACTAGA
GCCTTGGGAAGAACTTGTATAAATATTATTGCCATTGCCG
ATTGCCAAAAGCA
Humann analysis is very time/computation consuption. You can use the downsample_reads
parameters, or you can set skip_humann_init
if you already have the init resutls. Set skip_humann_init = True
in Snakefile_config.yml
, then OUTPOST will not run human_init rule, but to check the human results under folder name_of_the_assembly/metabolism_analysis/humann3/ori_results/
, so you need to put the humann output with suffix as genefamilies.tsv/pathabundance.tsv/pathcoverage.tsv under the folder.
OUTPOST offers skip_assembly_analysis
. You can skip the assembly analysis if you have a large assembly and a number of groups which will save a lot of time about MAG tables generation.
OUTPOST: A comprehensive analysis software for whole-metagenome shotgun sequencing incorporating group stratification