Add mmseqs2 taxonomy #343

Merged (43 commits, Apr 2, 2024)

Commits
c8bc518
Merge branch 'nf-core:master' into add_mmseqs2_taxonomy
Darcy220606 Feb 27, 2024
5b7b09b
add mmseqs modules
Darcy220606 Feb 27, 2024
6ce2f67
Add all parameters necessary in config files
Darcy220606 Feb 28, 2024
d864eea
Add parameters to schema version 1
Darcy220606 Feb 28, 2024
1a6b480
update the schema
Darcy220606 Feb 29, 2024
0c56c1f
add the docs info
Darcy220606 Feb 29, 2024
c3a80ca
working draft
Darcy220606 Feb 29, 2024
6f2c076
adjust mmseqs/createtsv step
Darcy220606 Feb 29, 2024
781ae22
add the merging step - working locally
Darcy220606 Mar 3, 2024
71e0151
add nf-tests for the nf-core modules
Darcy220606 Mar 12, 2024
b7623ae
prettier
Darcy220606 Mar 12, 2024
035ea16
Update nextflow_schema.json
Darcy220606 Mar 12, 2024
641452a
Update nextflow_schema.json
Darcy220606 Mar 12, 2024
bf8536d
Update nextflow_schema.json
Darcy220606 Mar 12, 2024
940c31a
Update nextflow_schema.json
Darcy220606 Mar 12, 2024
4f3fe99
Update nextflow_schema.json
Darcy220606 Mar 13, 2024
5765fe1
Update nextflow_schema.json
Darcy220606 Mar 13, 2024
9d8c9f7
Merge branch 'dev' into add_mmseqs2_taxonomy
Darcy220606 Mar 14, 2024
34f7bd2
add versions in subworkflows
Darcy220606 Mar 14, 2024
2d1f135
update nextflow config latest dev
Darcy220606 Mar 15, 2024
1e7a0f0
fix linting
Darcy220606 Mar 15, 2024
8c15a38
Merge branch 'dev' of https://github.com/Darcy220606/funcscan into ad…
Darcy220606 Mar 19, 2024
3f5622f
changelog
Darcy220606 Mar 19, 2024
2a21d96
update CHANGELOG
Darcy220606 Mar 19, 2024
4446bdb
skip amrfinderplus and deeparg
Darcy220606 Mar 19, 2024
798c457
try SILVA
Darcy220606 Mar 19, 2024
24a6440
change memory in test
Darcy220606 Mar 19, 2024
eb276c8
update test.config
Darcy220606 Mar 19, 2024
dc9e859
increase the RAM for CI tests
Darcy220606 Mar 19, 2024
44c20f1
update the RAM for CI test to 9.0GB
Darcy220606 Mar 19, 2024
0bc085c
update to reviewers' comments
Darcy220606 Mar 23, 2024
265e1c6
lint taxa.nf
Darcy220606 Mar 23, 2024
2bd8d56
add test_taxonomy in nextflow.config
Darcy220606 Mar 23, 2024
19119c9
add James suggestions
Darcy220606 Mar 25, 2024
d015252
prettier run
Darcy220606 Mar 25, 2024
117d7eb
add reviewers suggestions
Darcy220606 Mar 29, 2024
0c61e7f
fix params in arg and bgc nf
Darcy220606 Mar 29, 2024
4b7603e
Merge branch 'dev' of https://github.com/Darcy220606/funcscan into ad…
Darcy220606 Apr 2, 2024
df631e6
add last review suggestions
Darcy220606 Apr 2, 2024
67255e9
update usage.md from James
Darcy220606 Apr 2, 2024
3eb03f6
fix linting
Darcy220606 Apr 2, 2024
f014be2
update output.md
Darcy220606 Apr 2, 2024
d974f15
add reviewers suggestions
Darcy220606 Apr 2, 2024
31 changes: 31 additions & 0 deletions .github/workflows/ci.yml
@@ -77,3 +77,34 @@ jobs:
- name: Run pipeline with test data (BGC workflow)
run: |
nextflow run ${GITHUB_WORKSPACE} -profile test_bgc,docker --outdir ./results ${{ matrix.parameters }} --bgc_skip_deepbgc

test_taxonomy:
name: Run pipeline with test data (AMP, ARG and BGC taxonomy workflows)
# Only run on push if this is the nf-core dev branch (merged PRs)
if: "${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/funcscan') }}"
runs-on: ubuntu-latest
strategy:
matrix:
NXF_VER:
- "23.04.0"
- "latest-everything"
parameters:
- "--annotation_tool prodigal"
- "--annotation_tool prokka"
- "--annotation_tool bakta --annotation_bakta_db_downloadtype light"

steps:
- name: Check out pipeline code
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4

- name: Install Nextflow
uses: nf-core/setup-nextflow@v1
with:
version: "${{ matrix.NXF_VER }}"

- name: Disk space cleanup
uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1

- name: Run pipeline with test data (AMP, ARG and BGC taxonomy workflows)
run: |
nextflow run ${GITHUB_WORKSPACE} -profile test_taxonomy,docker --outdir ./results ${{ matrix.parameters }}
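
The new `test_taxonomy` CI job mirrors what a user would run locally: the same command should work outside GitHub Actions, e.g. `nextflow run nf-core/funcscan -profile test_taxonomy,docker --outdir ./results` (optionally adding one of the `--annotation_tool` settings from the matrix above, or pointing `nextflow run` at a local clone instead of the released pipeline); `${GITHUB_WORKSPACE}` in the workflow simply refers to the checked-out pipeline code.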
6 changes: 3 additions & 3 deletions CHANGELOG.md
@@ -11,12 +11,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- [#324](https://github.com/nf-core/funcscan/pull/324) Removed separate DeepARG test profile because database download is now stable. (by @jasmezz)
- [#332](https://github.com/nf-core/funcscan/pull/332) & [#327](https://github.com/nf-core/funcscan/pull/327) Merged pipeline template of nf-core/tools version 2.12.1 (by @jfy133, @jasmezz)
- [#338](https://github.com/nf-core/funcscan/pull/338) Set `--meta` parameter to default for Bakta, with singlemode optional. (by @jasmezz)
- [#343](https://github.com/nf-core/funcscan/pull/343) Added contig taxonomic classification using [MMseqs2](https://github.com/soedinglab/MMseqs2/). (by @darcy220606)

### `Fixed`

- [#348](https://github.com/nf-core/funcscan/pull/348) Updated samplesheet for pipeline tests to 'samplesheet_reduced.csv' with smaller datasets to reduce resource consumption. Updated prodigal module to fix pigz issue. (by @darcy220606)

### `Dependencies`
- [#343](https://github.com/nf-core/funcscan/pull/343) Standardized the resulting workflow summary tables to always start with 'sample_id\tcontig_id\t..'. Reformatted the output of `hamronization/summarize` module. (by @darcy220606)
- [#348](https://github.com/nf-core/funcscan/pull/348) Updated samplesheet for pipeline tests to 'samplesheet_reduced.csv' with smaller datasets to reduce resource consumption. Updated prodigal module to fix pigz issue. Removed `tests/` from `.gitignore`. (by @darcy220606)

| Tool | Previous version | New version |
| ------------- | ---------------- | ----------- |
4 changes: 4 additions & 0 deletions CITATIONS.md
@@ -90,6 +90,10 @@

> Alcock, B. P., Huynh, W., Chalil, R., Smith, K. W., Raphenya, A. R., Wlodarski, M. A., Edalatmand, A., Petkau, A., Syed, S. A., Tsang, K. K., Baker, S. J. C., Dave, M., McCarthy, M. C., Mukiri, K. M., Nasir, J. A., Golbon, B., Imtiaz, H., Jiang, X., Kaur, K., Kwong, M., Liang, Z. C., Niu, K. C., Shan, P., Yang, J. Y. J., Gray, K. L., Hoad, G. R., Jia, B., Bhando, T., Carfrae, L. A., Farha, M. A., French, S., Gordzevich, R., Rachwalski, K., Tu, M. M., Bordeleau, E., Dooley, D., Griffiths, E., Zubyk, H. L., Brown, E. D., Maguire, F., Beiko, R. G., Hsiao, W. W. L., Brinkman F. S. L., Van Domselaar, G., McArthur, A. G. (2023). CARD 2023: expanded curation, support for machine learning, and resistome prediction at the Comprehensive Antibiotic Resistance Database. Nucleic acids research, 51(D1):D690-D699. [DOI: 10.1093/nar/gkac920](https://doi.org/10.1093/nar/gkac920)

- [MMseqs2](https://doi.org/10.1093/bioinformatics/btab184)

> Mirdita, M., Steinegger, M., Breitwieser, F., Söding, J., Levy Karin, E. (2021). Fast and sensitive taxonomic assignment to metagenomic contigs. Bioinformatics, 37(18), 3029–3031. [DOI: 10.1093/bioinformatics/btab184](https://doi.org/10.1093/bioinformatics/btab184)

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)
13 changes: 7 additions & 6 deletions README.md
@@ -30,12 +30,13 @@ The nf-core/funcscan AWS full test dataset are contigs generated by the MGnify s

## Pipeline summary

1. Annotation of assembled prokaryotic contigs with [`Prodigal`](https://github.com/hyattpd/Prodigal), [`Pyrodigal`](https://github.com/althonos/pyrodigal), [`Prokka`](https://github.com/tseemann/prokka), or [`Bakta`](https://github.com/oschwengers/bakta)
2. Screening contigs for antimicrobial peptide-like sequences with [`ampir`](https://cran.r-project.org/web/packages/ampir/index.html), [`Macrel`](https://github.com/BigDataBiology/macrel), [`HMMER`](http://hmmer.org/), [`AMPlify`](https://github.com/bcgsc/AMPlify)
3. Screening contigs for antibiotic resistant gene-like sequences with [`ABRicate`](https://github.com/tseemann/abricate), [`AMRFinderPlus`](https://github.com/ncbi/amr), [`fARGene`](https://github.com/fannyhb/fargene), [`RGI`](https://card.mcmaster.ca/analyze/rgi), [`DeepARG`](https://bench.cs.vt.edu/deeparg)
4. Screening contigs for biosynthetic gene cluster-like sequences with [`antiSMASH`](https://antismash.secondarymetabolites.org), [`DeepBGC`](https://github.com/Merck/deepbgc), [`GECCO`](https://gecco.embl.de/), [`HMMER`](http://hmmer.org/)
5. Creating aggregated reports for all samples across the workflows with [`AMPcombi`](https://github.com/Darcy220606/AMPcombi) for AMPs, [`hAMRonization`](https://github.com/pha4ge/hAMRonization) for ARGs, and [`comBGC`](https://raw.githubusercontent.com/nf-core/funcscan/master/bin/comBGC.py) for BGCs
6. Software version and methods text reporting with [`MultiQC`](http://multiqc.info/)
1. Taxonomic classification of contigs of **prokaryotic origin** with [`MMseqs2`](https://github.com/soedinglab/MMseqs2)
2. Annotation of assembled prokaryotic contigs with [`Prodigal`](https://github.com/hyattpd/Prodigal), [`Pyrodigal`](https://github.com/althonos/pyrodigal), [`Prokka`](https://github.com/tseemann/prokka), or [`Bakta`](https://github.com/oschwengers/bakta)
3. Screening contigs for antimicrobial peptide-like sequences with [`ampir`](https://cran.r-project.org/web/packages/ampir/index.html), [`Macrel`](https://github.com/BigDataBiology/macrel), [`HMMER`](http://hmmer.org/), [`AMPlify`](https://github.com/bcgsc/AMPlify)
4. Screening contigs for antibiotic resistant gene-like sequences with [`ABRicate`](https://github.com/tseemann/abricate), [`AMRFinderPlus`](https://github.com/ncbi/amr), [`fARGene`](https://github.com/fannyhb/fargene), [`RGI`](https://card.mcmaster.ca/analyze/rgi), [`DeepARG`](https://bench.cs.vt.edu/deeparg)
5. Screening contigs for biosynthetic gene cluster-like sequences with [`antiSMASH`](https://antismash.secondarymetabolites.org), [`DeepBGC`](https://github.com/Merck/deepbgc), [`GECCO`](https://gecco.embl.de/), [`HMMER`](http://hmmer.org/)
6. Creating aggregated reports for all samples across the workflows with [`AMPcombi`](https://github.com/Darcy220606/AMPcombi) for AMPs, [`hAMRonization`](https://github.com/pha4ge/hAMRonization) for ARGs, and [`comBGC`](https://raw.githubusercontent.com/nf-core/funcscan/master/bin/comBGC.py) for BGCs
7. Software version and methods text reporting with [`MultiQC`](http://multiqc.info/)

![funcscan metro workflow](docs/images/funcscan_metro_workflow.png)

7 changes: 7 additions & 0 deletions bin/comBGC.py
@@ -1,5 +1,8 @@
#!/usr/bin/env python3

# Written by Jasmin Frangenberg and released under the MIT license.
# See below for full license text.

from Bio import SeqIO
import pandas as pd
import argparse
@@ -643,6 +646,10 @@ def gecco_workflow(gecco_paths):
inplace=True,
)

# Rearrange and rename the columns in the summary df
summary_all = summary_all.iloc[:, [0, 2, 1] + list(range(3, len(summary_all.columns)))]
summary_all.rename(columns={'Sample_ID':'sample_id', 'Contig_ID':'contig_id', 'CDS_ID':'BGC_region_contig_ids'}, inplace=True)

# Write results to TSV
if not os.path.exists(outdir):
os.makedirs(outdir)
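
The two added comBGC lines reorder the summary columns (moving `Contig_ID` ahead of `CDS_ID`) and rename them so every funcscan summary table starts with `sample_id` and `contig_id`, as noted in the CHANGELOG. A minimal, self-contained sketch of what that reindex/rename does — the column names and operations match the diff, but the toy values and the trailing `Product` column are invented for illustration:

```python
import pandas as pd

# Toy stand-in for the comBGC summary table (values are illustrative only).
summary_all = pd.DataFrame({
    "Sample_ID": ["sample_1"],
    "CDS_ID": ["contig_1_1"],
    "Contig_ID": ["contig_1"],
    "Product": ["NRPS"],
})

# Move Contig_ID in front of CDS_ID, keeping all remaining columns in order ...
summary_all = summary_all.iloc[:, [0, 2, 1] + list(range(3, len(summary_all.columns)))]
# ... then rename so the table starts with 'sample_id' and 'contig_id'.
summary_all = summary_all.rename(
    columns={"Sample_ID": "sample_id", "Contig_ID": "contig_id", "CDS_ID": "BGC_region_contig_ids"}
)

print(summary_all.columns.tolist())
# ['sample_id', 'contig_id', 'BGC_region_contig_ids', 'Product']
```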
231 changes: 231 additions & 0 deletions bin/merge_taxonomy.py
@@ -0,0 +1,231 @@
#!/usr/bin/env python3

# Written by Anan Ibrahim and released under the MIT license.
# See git repository (https://github.com/Darcy220606/AMPcombi) for full license text.
# Date: March 2024
# Version: 0.1.0

# Required modules
import sys
import os
import pandas as pd
import numpy as np
import argparse

tool_version = "0.1.0"
#########################################
# TOP LEVEL: AMPCOMBI
#########################################
parser = argparse.ArgumentParser(prog = 'merge_taxonomy', formatter_class=argparse.RawDescriptionHelpFormatter,
usage='%(prog)s [options]',
description=('''\
.............................................................................
*merge_taxonomy*
.............................................................................
This script merges all three funcscan workflows with
MMseqs2 taxonomy results. This is done in three submodules that can be
activated separately.
.............................................................................'''),
epilog='''Thank you for running taxonomy_merge!''',
add_help=True)
parser.add_argument('--version', action='version', version='merge_taxonomy ' + tool_version)

#########################################
# SUBPARSERS
#########################################
subparsers = parser.add_subparsers(required=True)

#########################################
# SUBPARSER: AMPCOMBI
#########################################
ampcombi_parser = subparsers.add_parser('ampcombi_taxa')

ampcombi_parser.add_argument("--ampcombi", dest="amp", nargs='?', help="Enter the path to the 'ampcombi_complete_summary.tsv' \n (default: %(default)s)",
type=str, default='ampcombi_complete_summary.csv')
ampcombi_parser.add_argument("--taxonomy", dest="taxa1", nargs='+', help="Enter the list of taxonomy files for all samples. ")

#########################################
# SUBPARSER: COMBGC
#########################################
combgc_parser = subparsers.add_parser('combgc_taxa')

combgc_parser.add_argument("--combgc", dest="bgc", nargs='?', help="Enter the path to the 'combgc_complete_summary.tsv' \n (default: %(default)s)",
type=str, default='combgc_complete_summary.csv')
combgc_parser.add_argument("--taxonomy", dest="taxa2", nargs='+', help="Enter the list of taxonomy files for all samples. ")

#########################################
# SUBPARSER: HAMRONIZATION
#########################################
hamronization_parser = subparsers.add_parser('hamronization_taxa')

hamronization_parser.add_argument("--hamronization", dest="arg", nargs='?', help="Enter the path to the 'hamronization_complete_summary.tsv' \n (default: %(default)s)",
type=str, default='hamronization_complete_summary.csv')
hamronization_parser.add_argument("--taxonomy", dest="taxa3",nargs='+', help="Enter the list of taxonomy files for all samples. ")

#########################################
# TAXONOMY
#########################################
def reformat_mmseqs_taxonomy(mmseqs_taxonomy):
mmseqs2_df = pd.read_csv(mmseqs_taxonomy, sep='\t', header=None, names=['contig_id', 'taxid', 'rank_label', 'scientific_name', 'lineage', 'mmseqs_lineage_contig'])
# remove the lineage column
mmseqs2_df.drop('lineage', axis=1, inplace=True)
mmseqs2_df['mmseqs_lineage_contig'].unique()
# convert any classification that has Eukaryota/root to NaN as funcscan targets bacteria ONLY **
for i, row in mmseqs2_df.iterrows():
lineage = str(row['mmseqs_lineage_contig'])
if 'Eukaryota' in lineage or 'root' in lineage:
mmseqs2_df.at[i, 'mmseqs_lineage_contig'] = np.nan
# insert the sample name in the first column according to the file basename
file_basename = os.path.basename(mmseqs_taxonomy)
filename = os.path.splitext(file_basename)[0]
mmseqs2_df.insert(0, 'sample_id', filename)
return mmseqs2_df

#########################################
# FUNCTION: AMPCOMBI
#########################################
def ampcombi_taxa(args):
merged_df = pd.DataFrame()

# assign input args to variables
ampcombi = args.amp
taxa_list = args.taxa1

# prepare the taxonomy files
taxa_df = pd.DataFrame()
# append the dfs to the taxonomy_files_combined
for file in taxa_list: # list of taxa files ['','']
df = reformat_mmseqs_taxonomy(file)
taxa_df = pd.concat([taxa_df, df])

# filter the tool df
tool_df = pd.read_csv(ampcombi, sep=',') #current ampcombi version is comma sep. CHANGE WITH VERSION 0.2.0
# make sure 1st and 2nd column have the same column labels
tool_df.rename(columns={tool_df.columns[0]: 'sample_id'}, inplace=True)
tool_df.rename(columns={tool_df.columns[1]: 'contig_id'}, inplace=True)
# grab the real contig id in another column copy for merging
tool_df['contig_id_merge'] = tool_df['contig_id'].str.rsplit('_', n=1).str[0]

# merge rows from taxa to ampcombi_df based on substring match in sample_id
# grab the unique sample names from the taxonomy table
samples_taxa = taxa_df['sample_id'].unique()
# for every sampleID in taxadf merge the results
for sampleID in samples_taxa:
# subset ampcombi
subset_tool = tool_df.loc[tool_df['sample_id'].str.contains(sampleID)]
# subset taxa
subset_taxa = taxa_df.loc[taxa_df['sample_id'].str.contains(sampleID)]
# merge
subset_df = pd.merge(subset_tool, subset_taxa, left_on = 'contig_id_merge', right_on='contig_id', how='left')
# cleanup the table
columnsremove = ['contig_id_merge','contig_id_y', 'sample_id_y']
subset_df.drop(columnsremove, axis=1, inplace=True)
subset_df.rename(columns={'contig_id_x': 'contig_id', 'sample_id_x':'sample_id'},inplace=True)
# append in the combined_df
merged_df = pd.concat([merged_df, subset_df], ignore_index=True)

# write to file
merged_df.to_csv('ampcombi_complete_summary_taxonomy.tsv', sep='\t', index=False)

#########################################
# FUNCTION: COMBGC
#########################################
def combgc_taxa(args):
merged_df = pd.DataFrame()

# assign input args to variables
combgc = args.bgc
taxa_list = args.taxa2

# prepare the taxonomy files
taxa_df = pd.DataFrame()
# append the dfs to the taxonomy_files_combined
for file in taxa_list: # list of taxa files ['','']
df = reformat_mmseqs_taxonomy(file)
taxa_df = pd.concat([taxa_df, df])

# filter the tool df
tool_df = pd.read_csv(combgc, sep='\t')
# make sure 1st and 2nd column have the same column labels
tool_df.rename(columns={tool_df.columns[0]: 'sample_id'}, inplace=True)
tool_df.rename(columns={tool_df.columns[1]: 'contig_id'}, inplace=True)

# merge rows from taxa to combgc_df based on substring match in sample_id
# grab the unique sample names from the taxonomy table
samples_taxa = taxa_df['sample_id'].unique()
# for every sampleID in taxadf merge the results
for sampleID in samples_taxa:
# subset combgc
subset_tool = tool_df.loc[tool_df['sample_id'].str.contains(sampleID)]
# subset taxa
subset_taxa = taxa_df.loc[taxa_df['sample_id'].str.contains(sampleID)]
# merge
subset_df = pd.merge(subset_tool, subset_taxa, left_on = 'contig_id', right_on='contig_id', how='left')
# cleanup the table
columnsremove = ['sample_id_y']
subset_df.drop(columnsremove, axis=1, inplace=True)
subset_df.rename(columns={'sample_id_x':'sample_id'},inplace=True)
# append in the combined_df
merged_df = pd.concat([merged_df, subset_df], ignore_index=True)

# write to file
merged_df.to_csv('combgc_complete_summary_taxonomy.tsv', sep='\t', index=False)

#########################################
# FUNCTION: HAMRONIZATION
#########################################
def hamronization_taxa(args):
merged_df = pd.DataFrame()

# assign input args to variables
hamronization = args.arg
taxa_list = args.taxa3

# prepare the taxonomy files
taxa_df = pd.DataFrame()
# append the dfs to the taxonomy_files_combined
for file in taxa_list: # list of taxa files ['','']
df = reformat_mmseqs_taxonomy(file)
taxa_df = pd.concat([taxa_df, df])

# filter the tool df
tool_df = pd.read_csv(hamronization, sep='\t')
# rename the columns
tool_df.rename(columns={'input_file_name':'sample_id', 'input_sequence_id':'contig_id'}, inplace=True)
# reorder the columns
new_order = ['sample_id', 'contig_id'] + [col for col in tool_df.columns if col not in ['sample_id', 'contig_id']]
tool_df = tool_df.reindex(columns=new_order)
# grab the real contig id in another column copy for merging
tool_df['contig_id_merge'] = tool_df['contig_id'].str.rsplit('_', n=1).str[0]

# merge rows from taxa to hamronization_df based on substring match in sample_id
# grab the unique sample names from the taxonomy table
samples_taxa = taxa_df['sample_id'].unique()
# for every sampleID in taxadf merge the results
for sampleID in samples_taxa:
# subset hamronization
subset_tool = tool_df.loc[tool_df['sample_id'].str.contains(sampleID)]
# subset taxa
subset_taxa = taxa_df.loc[taxa_df['sample_id'].str.contains(sampleID)]
# merge
subset_df = pd.merge(subset_tool, subset_taxa, left_on = 'contig_id_merge', right_on='contig_id', how='left')
# cleanup the table
columnsremove = ['contig_id_merge','contig_id_y', 'sample_id_y']
subset_df.drop(columnsremove, axis=1, inplace=True)
subset_df.rename(columns={'contig_id_x': 'contig_id', 'sample_id_x':'sample_id'},inplace=True)
# append in the combined_df
merged_df = pd.concat([merged_df, subset_df], ignore_index=True)

# write to file
merged_df.to_csv('hamronization_complete_summary_taxonomy.tsv', sep='\t', index=False)

#########################################
# SUBPARSERS: DEFAULT
#########################################
ampcombi_parser.set_defaults(func=ampcombi_taxa)
combgc_parser.set_defaults(func=combgc_taxa)
hamronization_parser.set_defaults(func=hamronization_taxa)

if __name__ == '__main__':
args = parser.parse_args()
args.func(args) # call the default function
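
For readers who want to see the core of the merge without running the pipeline, the sketch below reproduces the `ampcombi_taxa` logic on two in-memory toy tables. The column handling follows the script above; the sample/contig identifiers, the lineage strings, and the `AMP_probability` column are invented for illustration, and the real script additionally loops over samples and reads/writes files on disk:

```python
import pandas as pd

# Toy MMseqs2 taxonomy table for one sample, as produced by reformat_mmseqs_taxonomy()
# (one row per contig; values are illustrative only).
taxa_df = pd.DataFrame({
    "sample_id": ["sample_1", "sample_1"],
    "contig_id": ["contig_1", "contig_2"],
    "mmseqs_lineage_contig": ["d_Bacteria;p_Bacillota", "d_Bacteria;p_Pseudomonadota"],
})

# Toy AMPcombi summary: its contig_id column carries the CDS suffix appended at annotation.
tool_df = pd.DataFrame({
    "sample_id": ["sample_1", "sample_1"],
    "contig_id": ["contig_1_1", "contig_2_3"],
    "AMP_probability": [0.92, 0.71],
})

# Strip the trailing CDS index to recover the contig name that MMseqs2 reports ...
tool_df["contig_id_merge"] = tool_df["contig_id"].str.rsplit("_", n=1).str[0]

# ... and attach the lineage per contig, mirroring the merge step in ampcombi_taxa().
merged = pd.merge(tool_df, taxa_df, left_on="contig_id_merge", right_on="contig_id", how="left")
merged = merged.drop(columns=["contig_id_merge", "contig_id_y", "sample_id_y"])
merged = merged.rename(columns={"contig_id_x": "contig_id", "sample_id_x": "sample_id"})

print(merged[["sample_id", "contig_id", "mmseqs_lineage_contig"]])
```

Within the pipeline, the script is invoked with one of its three subcommands (`ampcombi_taxa`, `combgc_taxa`, or `hamronization_taxa`), each taking the corresponding complete summary table plus the per-sample MMseqs2 taxonomy TSVs via `--taxonomy`.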