Add domain-level classification for bins #395

prototaxites · 2023-02-23T17:11:23Z

edit: ready to go now!

This one is still a work-in-progress, but I'm putting this up now as it passes -profile test, with all bins from the minigut being classified as prokarya, as expected. If anyone has a dataset which produces eukaryotic MAGs, I'd be keen to hear if it works for you.

Aside from adding the domain classification subworkflow, I've split off the coverage/MAG_DEPTHS processes to a new subworkflow, which allows for a set of final bins to be determined (following refinement, classification, etc.) before calculating coverage.


github-actions · 2023-02-24T15:06:36Z

`nf-core lint` overall result: Passed ✅ ⚠️

Posted for pipeline commit 7172909

✅ 158 tests passed
❔ 1 tests were ignored
❗ 1 tests had warnings

❗ Test warnings:

pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your prefered methods description, e.g. add publication citation for this pipeline

❔ Tests ignored:

files_unchanged - File ignored due to lint config: lib/NfcoreTemplate.groovy

✅ Tests passed:

files_exist - File found: .gitattributes
files_exist - File found: .gitignore
files_exist - File found: .nf-core.yml
files_exist - File found: .editorconfig
files_exist - File found: .prettierignore
files_exist - File found: .prettierrc.yml
files_exist - File found: CHANGELOG.md
files_exist - File found: CITATIONS.md
files_exist - File found: CODE_OF_CONDUCT.md
files_exist - File found: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_exist - File found: nextflow_schema.json
files_exist - File found: nextflow.config
files_exist - File found: README.md
files_exist - File found: .github/.dockstore.yml
files_exist - File found: .github/CONTRIBUTING.md
files_exist - File found: .github/ISSUE_TEMPLATE/bug_report.yml
files_exist - File found: .github/ISSUE_TEMPLATE/config.yml
files_exist - File found: .github/ISSUE_TEMPLATE/feature_request.yml
files_exist - File found: .github/PULL_REQUEST_TEMPLATE.md
files_exist - File found: .github/workflows/branch.yml
files_exist - File found: .github/workflows/ci.yml
files_exist - File found: .github/workflows/linting_comment.yml
files_exist - File found: .github/workflows/linting.yml
files_exist - File found: assets/email_template.html
files_exist - File found: assets/email_template.txt
files_exist - File found: assets/sendmail_template.txt
files_exist - File found: assets/nf-core-mag_logo_light.png
files_exist - File found: conf/modules.config
files_exist - File found: conf/test.config
files_exist - File found: conf/test_full.config
files_exist - File found: docs/images/nf-core-mag_logo_light.png
files_exist - File found: docs/images/nf-core-mag_logo_dark.png
files_exist - File found: docs/output.md
files_exist - File found: docs/README.md
files_exist - File found: docs/usage.md
files_exist - File found: lib/nfcore_external_java_deps.jar
files_exist - File found: lib/NfcoreSchema.groovy
files_exist - File found: lib/NfcoreTemplate.groovy
files_exist - File found: lib/Utils.groovy
files_exist - File found: lib/WorkflowMain.groovy
files_exist - File found: main.nf
files_exist - File found: assets/multiqc_config.yml
files_exist - File found: conf/base.config
files_exist - File found: conf/igenomes.config
files_exist - File found: .github/workflows/awstest.yml
files_exist - File found: .github/workflows/awsfulltest.yml
files_exist - File found: lib/WorkflowMag.groovy
files_exist - File found: modules.json
files_exist - File found: pyproject.toml
files_exist - File not found check: Singularity
files_exist - File not found check: parameters.settings.json
files_exist - File not found check: .nf-core.yaml
files_exist - File not found check: bin/markdown_to_html.r
files_exist - File not found check: conf/aws.config
files_exist - File not found check: .github/workflows/push_dockerhub.yml
files_exist - File not found check: .github/ISSUE_TEMPLATE/bug_report.md
files_exist - File not found check: .github/ISSUE_TEMPLATE/feature_request.md
files_exist - File not found check: docs/images/nf-core-mag_logo.png
files_exist - File not found check: .markdownlint.yml
files_exist - File not found check: .yamllint.yml
files_exist - File not found check: lib/Checks.groovy
files_exist - File not found check: lib/Completion.groovy
files_exist - File not found check: lib/Workflow.groovy
files_exist - File not found check: .travis.yml
nextflow_config - Config variable found: manifest.name
nextflow_config - Config variable found: manifest.nextflowVersion
nextflow_config - Config variable found: manifest.description
nextflow_config - Config variable found: manifest.version
nextflow_config - Config variable found: manifest.homePage
nextflow_config - Config variable found: timeline.enabled
nextflow_config - Config variable found: trace.enabled
nextflow_config - Config variable found: report.enabled
nextflow_config - Config variable found: dag.enabled
nextflow_config - Config variable found: process.cpus
nextflow_config - Config variable found: process.memory
nextflow_config - Config variable found: process.time
nextflow_config - Config variable found: params.outdir
nextflow_config - Config variable found: params.input
nextflow_config - Config variable found: params.show_hidden_params
nextflow_config - Config variable found: params.schema_ignore_params
nextflow_config - Config variable found: manifest.mainScript
nextflow_config - Config variable found: timeline.file
nextflow_config - Config variable found: trace.file
nextflow_config - Config variable found: report.file
nextflow_config - Config variable found: dag.file
nextflow_config - Config variable (correctly) not found: params.nf_required_version
nextflow_config - Config variable (correctly) not found: params.container
nextflow_config - Config variable (correctly) not found: params.singleEnd
nextflow_config - Config variable (correctly) not found: params.igenomesIgnore
nextflow_config - Config variable (correctly) not found: params.name
nextflow_config - Config variable (correctly) not found: params.enable_conda
nextflow_config - Config timeline.enabled had correct value: true
nextflow_config - Config report.enabled had correct value: true
nextflow_config - Config trace.enabled had correct value: true
nextflow_config - Config dag.enabled had correct value: true
nextflow_config - Config manifest.name began with nf-core/
nextflow_config - Config variable manifest.homePage began with https://github.com/nf-core/
nextflow_config - Config dag.file ended with .html
nextflow_config - Config variable manifest.nextflowVersion started with >= or !>=
nextflow_config - Config manifest.version ends in dev: '2.3.1dev'
nextflow_config - Config params.custom_config_version is set to master
nextflow_config - Config params.custom_config_base is set to https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Lines for loading custom profiles found
files_unchanged - .gitattributes matches the template
files_unchanged - .prettierrc.yml matches the template
files_unchanged - CODE_OF_CONDUCT.md matches the template
files_unchanged - LICENSE matches the template
files_unchanged - .github/.dockstore.yml matches the template
files_unchanged - .github/CONTRIBUTING.md matches the template
files_unchanged - .github/ISSUE_TEMPLATE/bug_report.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/config.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/feature_request.yml matches the template
files_unchanged - .github/PULL_REQUEST_TEMPLATE.md matches the template
files_unchanged - .github/workflows/branch.yml matches the template
files_unchanged - .github/workflows/linting_comment.yml matches the template
files_unchanged - .github/workflows/linting.yml matches the template
files_unchanged - assets/email_template.html matches the template
files_unchanged - assets/email_template.txt matches the template
files_unchanged - assets/sendmail_template.txt matches the template
files_unchanged - assets/nf-core-mag_logo_light.png matches the template
files_unchanged - docs/images/nf-core-mag_logo_light.png matches the template
files_unchanged - docs/images/nf-core-mag_logo_dark.png matches the template
files_unchanged - docs/README.md matches the template
files_unchanged - lib/nfcore_external_java_deps.jar matches the template
files_unchanged - lib/NfcoreSchema.groovy matches the template
files_unchanged - .gitignore matches the template
files_unchanged - .prettierignore matches the template
files_unchanged - pyproject.toml matches the template
actions_ci - '.github/workflows/ci.yml' is triggered on expected events
actions_ci - '.github/workflows/ci.yml' checks minimum NF version
actions_awstest - '.github/workflows/awstest.yml' is triggered correctly
actions_awsfulltest - .github/workflows/awsfulltest.yml is triggered correctly
actions_awsfulltest - .github/workflows/awsfulltest.yml does not use -profile test
readme - README Nextflow minimum version badge matched config. Badge: 22.10.1, Config: 22.10.1
readme - README Zenodo placeholder was replaced with DOI.
pipeline_name_conventions - Name adheres to nf-core convention
template_strings - Did not find any Jinja template strings (218 files)
schema_lint - Schema lint passed
schema_lint - Schema title + description lint passed
schema_lint - Input mimetype lint passed: 'text/csv'
schema_params - Schema matched params returned from nextflow config
system_exit - No System.exit calls found
actions_schema_validation - Workflow validation passed: linting.yml
actions_schema_validation - Workflow validation passed: branch.yml
actions_schema_validation - Workflow validation passed: clean-up.yml
actions_schema_validation - Workflow validation passed: awstest.yml
actions_schema_validation - Workflow validation passed: awsfulltest.yml
actions_schema_validation - Workflow validation passed: ci.yml
actions_schema_validation - Workflow validation passed: fix-linting.yml
actions_schema_validation - Workflow validation passed: linting_comment.yml
merge_markers - No merge markers found in pipeline files
modules_json - Only installed modules found in modules.json
multiqc_config - 'assets/multiqc_config.yml' follows the ordering scheme of the minimally required plugins.
multiqc_config - 'assets/multiqc_config.yml' contains a matching 'report_comment'.
multiqc_config - 'assets/multiqc_config.yml' contains 'export_plots: true'.
modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'

Run details

nf-core/tools version 2.8
Run at 2023-06-09 10:14:09

prototaxites · 2023-03-06T12:04:19Z

@nf-core-bot fix linting

prototaxites · 2023-03-06T12:43:40Z

Created a PR for test data with eukaryotic reads to the mag branch of the test-datasets repo: nf-core/test-datasets#782


prototaxites · 2023-03-10T13:44:07Z

The binrefinement test seems to be failing because it's running out of space:

  docker: failed to register layer: ApplyLayer exit status 1 stdout:  stderr: write /usr/local/lib/R/library/methods/help/methods.rdb: no space left on device.

I don't think the outputs of the test should have changed based on the changes in the PR - anyone have any thoughts on what's up?

d4straub · 2023-03-10T14:44:19Z

As far as I know, only a limited amount of storage space is available for GitHub tests; maybe that has now been exceeded. I just restarted the failed test in case it was only a temporary hiccup.

prototaxites · 2023-03-10T16:58:10Z

No luck! Looking at the binrefinement config, I notice that the busco_clean parameter isn't set. A BUSCO run can take up quite a lot of space, and with CONCOCT adding ~129 bins (many of them single-contig), this might mean the test is using a lot more storage than necessary, resulting in failure.

It might be worth enabling it within the test config? (Though it doesn’t fix the problem if this was passing before…)

prototaxites · 2023-03-31T13:39:26Z

Found the problem, now that I have some time (and Gitpod credits) to check.

There's an error in the BUSCO module: the check for the busco_clean parameter merely tests whether the variable is non-empty, rather than whether it is equal to "Y":

https://github.com/nf-core/mag/blob/master/modules/local/busco.nf:

    # if needed delete temporary BUSCO files
    if [ ${busco_clean} ]; then
        find . -depth -type d -name "augustus_config" -execdir rm -rf "{}" \\;
        find . -depth -type d -name "auto_lineage" -execdir rm -rf "{}" \\;
        find . -depth -type d -name "run_*" -execdir rm -rf "{}" +
    fi

I remember spotting the bug when making this PR, and in this branch, it is fixed:

https://github.com/prototaxites/mag/blob/euk_classify/modules/local/busco.nf

    # if needed delete temporary BUSCO files
    if [ ${busco_clean} = "Y" ]; then
        find . -depth -type d -name "augustus_config" -execdir rm -rf "{}" \\;
        find . -depth -type d -name "auto_lineage" -execdir rm -rf "{}" \\;
        find . -depth -type d -name "run_*" -execdir rm -rf "{}" +
    fi

I'll put in a wee PR to fix the BUSCO module and add the busco_clean parameter to the test_binrefinement config. With the large number of bins in this test following the addition of CONCOCT, the test should already be failing for lack of space; it only passes because the bug accidentally deletes the temporary BUSCO files.
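
For anyone following along, here is a minimal sketch of why the original check always passed. It assumes the module substitutes a literal `Y` or `N` into the script before bash runs it; the function names `should_clean_buggy` and `should_clean_fixed` are hypothetical and only serve the illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; these names are not from the pipeline.

# Buggy form: a one-argument test is true for ANY non-empty string,
# so both "Y" and "N" trigger the clean-up.
should_clean_buggy() {
    if [ "$1" ]; then echo "clean"; else echo "keep"; fi
}

# Fixed form: clean up only when the flag is exactly "Y".
should_clean_fixed() {
    if [ "$1" = "Y" ]; then echo "clean"; else echo "keep"; fi
}

for flag in Y N; do
    echo "flag=$flag buggy=$(should_clean_buggy "$flag") fixed=$(should_clean_fixed "$flag")"
done
```

With `flag=N` the buggy form still reports `clean`, while the fixed form reports `keep`, which matches the behaviour described above.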


prototaxites · 2023-04-12T10:13:59Z

It works! 😅 Reviews welcome 😊

d4straub

Nice work!
The PR makes quite a few changes, including in files I am not particularly familiar with (the remodeled binning), so I'd appreciate another opinion, but it seems fine to me.

Review comments on: subworkflows/local/depths.nf, workflows/mag.nf, conf/modules.config


prototaxites · 2023-05-04T11:05:53Z

Does anyone else have any comments? I'd really like to work on adding a eukaryotic annotation step to the pipeline, but ideally this would build on top of this PR. Otherwise, I could submit a separate PR which adds this to run on all bins regardless of domain, if that is preferable?

jfy133

Overall really nice, and some good optimisations - thanks @prototaxites !

Mostly minor changes and questions:

As you're adding new functionality, you need to update the workflow diagram SVG!
Do you need to add ${meta.domains} to more of the prefixes in the post-domain-classification steps in modules.config, for cases where it's turned on (basically, for any downstream step that consumes ch_input_for_postbinning_bins_* as input)? It may not be necessary, but I want to double-check.

Review comments on: bin/domain_classification.R, conf/modules.config, docs/output.md, workflows/mag.nf, subworkflows/local/domain_classification.nf, nextflow.config

prototaxites · 2023-05-09T12:53:15Z

Thanks for the comments all! I'll work on the pipeline diagram and probably shoot a draft in Slack before committing.

> Do you need to add ${meta.domains} to more of the prefixes in the post-domain-classification steps in modules.config, for cases where it's turned on (basically, for any downstream step that consumes ch_input_for_postbinning_bins_* as input)? It may not be necessary, but I want to double-check.

I don't think this is necessary, as each bin can only have one classification, and goes into each process once - better to just filter on it internally IMO. When it's turned off, it should be set to 'unclassified' internally, which would also be ugly to print out.

prototaxites · 2023-05-09T13:01:16Z

Also, a heads up that the new test_domain_classification.config test depends on the following PR in the test datasets repo:

nf-core/test-datasets#782

(should I enable the test in CI here: https://github.com/nf-core/mag/blob/master/.github/workflows/ci.yml ?)

prototaxites · 2023-06-05T15:17:15Z

> Do you need to add ${meta.domains} to more of the prefixes in the post-domain-classification steps in modules.config, for cases where it's turned on (basically, for any downstream step that consumes ch_input_for_postbinning_bins_* as input)? It may not be necessary, but I want to double-check.

I was thinking about this again and realised there was the potential for input filename collisions. The workflow assumed that 'unknown' bins could be either eukaryotic or prokaryotic and used them in both 'halves' of the post-processing, so in the specific case of using DAS Tool and keeping all bins (pre- and post-refinement), those unknown bins would collide. I've fixed this by removing that assumption and sending unknowns only down the 'prokaryote' path. Each bin should now be represented only once in each post-binning step.
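
The routing rule can be sketched roughly as follows. This is only an illustration in shell, not the pipeline's Nextflow code: the function name `route_bin`, the path names, and the label `eukarya` are assumptions, as is sending 'unclassified' bins down the prokaryote path.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: route_bin and the path names are illustrative only.

# Map a bin's domain label to exactly one post-binning path.
route_bin() {
    case "$1" in
        eukarya)
            echo "eukaryote" ;;
        prokarya|unknown|unclassified)
            # 'unknown' goes down the prokaryote path only (the fix described above);
            # treating 'unclassified' the same way is an assumption.
            echo "prokaryote" ;;
        *)
            echo "unrecognised domain: $1" >&2
            return 1 ;;
    esac
}

for domain in prokarya eukarya unknown; do
    echo "$domain -> $(route_bin "$domain")"
done
```

Because every label maps to a single path, no bin can show up in both halves of the post-processing, which is what avoids the filename collision.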

Do you have any other suggestions, @jfy133? Merging is currently blocked pending requested changes!

jfy133 · 2023-06-07T05:51:22Z

As far as I can tell LGTM !


prototaxites and others added 10 commits February 17, 2023 19:38

Add Tiara and domain classification

3d30140

Update domain classification and add versions

962eb85

Update to nf-core module, add to binning workflow

4935b46

Revert mag.nf to dev

5aca1ad

revert mag depths

50afa3a

proks only in prok steps, mag depths separated

fc7c99f

Add new params to schema

e4c99cc

Add to changelog, add output docs

33da36f

Add summary, fix classifier bug

Fix tiara summary

2b2371b

Merge branch 'dev' into euk_classify

dfebd44

prototaxites and others added 7 commits February 24, 2023 15:13

Fix binning refinement subworkflow

80c39e7

Update nextflow_schema.json

448503a

Fix checkm

2868cfd

One last bug fix

dfdf092

Add classification of unbins, tidy workflow

f967bfe

Update Tiara module, small bugfixes

5de01de

Merge branch 'dev' into euk_classify

c454497

nf-core-bot and others added 2 commits March 6, 2023 12:13

[automated] Fix linting with Prettier

4432cea

Format changes to split_fasta.py with Black

61da696

prototaxites marked this pull request as ready for review March 6, 2023 12:44

d4straub reviewed Mar 6, 2023

conf/test.config (outdated)

d4straub reviewed Mar 6, 2023

conf/test_domain_classification.config (outdated)

Fix test profiles

6ad8ec2

Merge branch 'dev' into euk_classify

abb454c

d4straub mentioned this pull request Apr 11, 2023

Fix busco_clean parameter being enabled by default #419

Merged

7 tasks

Rebase to current dev - fix busco_clean

8277006

Dev

d4straub reviewed Apr 12, 2023

subworkflows/local/depths.nf (outdated)

workflows/mag.nf (outdated)

workflows/mag.nf

workflows/mag.nf

conf/modules.config (outdated)

prototaxites and others added 4 commits April 12, 2023 15:46

Apply suggestions from code review

784efce

Co-authored-by: Daniel Straub <42973691+d4straub@users.noreply.github.com>

Merge branch 'dev' into euk_classify

a7e2850

Fix error in modules.json

7e93cb6

Fix prettier on modules.json

f1d2894

jfy133 requested changes May 8, 2023


Apply suggestions from code review

064bdf1

Fix nextflow_schema.json

b63854f

Update workflow diagram

c668f60

jfy133 added this to the 2.4.0 milestone May 26, 2023

Remove assumption unknown bins are putatively eukaryotic

5201b7b

jfy133 approved these changes Jun 7, 2023


prototaxites and others added 6 commits June 7, 2023 12:36

Add test to nextflow.config, remove unneccesary if() blocks

c4dc47d

Merge branch 'dev' into euk_classify

88f4471

Fix bug (duplicate unbins)!

7592cf1

Merge branch 'dev' into euk_classify

e900c9c

Move domain classification test to adapterremoval test to reduce overall number of test configs

043750b

Update nextflow.config

7172909

prototaxites merged commit 7c6a62a into nf-core:dev Jun 9, 2023

prototaxites deleted the euk_classify branch June 9, 2023 12:17
