Fix gunc overwriting of raw and checkm files #653

Merged 1 commit into dev from fix-gunc-overwriting on Aug 22, 2024.

@jfy133 (Member) commented Aug 21, 2024

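This issue stems from GUNC's unfortunate output naming scheme: each per-bin output file (e.g. `GUNC.progenomes_2.1.maxCSS_level.tsv`) is named after the database and level rather than the input bin, so bins published into the same directory overwrite one another.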

One alternative would be to update the module to use `--input_dir` and pass multiple files to produce a single 'raw' file. However, that would make it harder to join the same information with the CheckM output, which comes as per-bin files. I have therefore gone the slightly ugly route of adding an extra directory level, named after the input FASTA file, to ensure the outputs are distinct.

Now looks like:

```bash
.
├── MEGAHIT-DASTool-unclassified-dastool_refined-test_minigut
│   ├── MEGAHIT-MetaBAT2Refined-test_minigut.1
│   │   └── GUNC.progenomes_2.1.maxCSS_level.tsv
│   └── MEGAHIT-MetaBAT2Refined-test_minigut.2
│       └── GUNC.progenomes_2.1.maxCSS_level.tsv
├── MEGAHIT-DASTool-unclassified-dastool_refined_unbinned-test_minigut
│   └── MEGAHIT-DASToolUnbinned-test_minigut
│       └── GUNC.progenomes_2.1.maxCSS_level.tsv
├── SPAdes-DASTool-unclassified-dastool_refined-test_minigut
│   ├── SPAdes-MaxBin2Refined-test_minigut.001_sub
│   │   └── GUNC.progenomes_2.1.maxCSS_level.tsv
│   └── SPAdes-MetaBAT2Refined-test_minigut.2
│       └── GUNC.progenomes_2.1.maxCSS_level.tsv
└── SPAdes-DASTool-unclassified-dastool_refined_unbinned-test_minigut
    └── SPAdes-DASToolUnbinned-test_minigut
        └── GUNC.progenomes_2.1.maxCSS_level.tsv
```

Whereas before it was:

```bash
.
├── MEGAHIT-DASTool-unclassified-dastool_refined-test_minigut
│   └── GUNC.progenomes_2.1.maxCSS_level.tsv
├── MEGAHIT-DASTool-unclassified-dastool_refined_unbinned-test_minigut
│   └── GUNC.progenomes_2.1.maxCSS_level.tsv
├── SPAdes-DASTool-unclassified-dastool_refined-test_minigut
│   └── GUNC.progenomes_2.1.maxCSS_level.tsv
└── SPAdes-DASTool-unclassified-dastool_refined_unbinned-test_minigut
    └── GUNC.progenomes_2.1.maxCSS_level.tsv
```

Here you can see that 2 of the 6 files are missing, because bins sharing a directory overwrote each other's output.
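The collision can be reproduced with a small sketch. This is a hypothetical Python model of the publishing paths only, not the pipeline's actual Nextflow `publishDir` configuration; the helpers `old_path` and `new_path` are made-up names for illustration, and the directory and bin names are copied from the trees above.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# GUNC output file name: identical for every bin, which causes the clash.
GUNC_FILE = "GUNC.progenomes_2.1.maxCSS_level.tsv"

# Bins from the test data, grouped by the directory they are published under.
BINS = {
    "MEGAHIT-DASTool-unclassified-dastool_refined-test_minigut": [
        "MEGAHIT-MetaBAT2Refined-test_minigut.1",
        "MEGAHIT-MetaBAT2Refined-test_minigut.2",
    ],
    "MEGAHIT-DASTool-unclassified-dastool_refined_unbinned-test_minigut": [
        "MEGAHIT-DASToolUnbinned-test_minigut",
    ],
    "SPAdes-DASTool-unclassified-dastool_refined-test_minigut": [
        "SPAdes-MaxBin2Refined-test_minigut.001_sub",
        "SPAdes-MetaBAT2Refined-test_minigut.2",
    ],
    "SPAdes-DASTool-unclassified-dastool_refined_unbinned-test_minigut": [
        "SPAdes-DASToolUnbinned-test_minigut",
    ],
}


def old_path(group: str, bin_name: str) -> PurePosixPath:
    # Hypothetical pre-fix layout: the bin name is not part of the path,
    # so every bin in a group maps to the same destination file.
    return PurePosixPath(group) / GUNC_FILE


def new_path(group: str, bin_name: str) -> PurePosixPath:
    # Hypothetical post-fix layout: an extra directory named after the bin.
    return PurePosixPath(group) / bin_name / GUNC_FILE


def published(path_fn) -> set[PurePosixPath]:
    # Distinct destination paths; duplicates are files that get overwritten.
    return {path_fn(group, b) for group, bins in BINS.items() for b in bins}


n_bins = sum(len(bins) for bins in BINS.values())
old = published(old_path)
new = published(new_path)
print(f"{n_bins} bins -> {len(old)} files before the fix, {len(new)} after")
```

Running it prints `6 bins -> 4 files before the fix, 6 after`, matching the two missing files in the old layout.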

Note that the summary file `gunc_summary.tsv` was not affected by this issue and correctly lists all 6 bins in the test data :) The problem was purely in how the per-bin files were published.

Thanks to @zackhenny for reporting!

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/mag branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@jfy133 linked an issue on Aug 21, 2024 that may be closed by this pull request.

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 364b3ee

```diff
+| ✅ 307 tests passed       |+
#| ❔   2 tests were ignored |#
!| ❗   5 tests had warnings |!
```

❗ Test warnings:

  • pipeline_todos - TODO string in main.nf: Remove this line if you don't need a FASTA file [TODO: try and test using for --host_fasta and --host_genome]
  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline

❔ Tests ignored:

✅ Tests passed:

Run details

  • nf-core/tools version 2.14.1
  • Run at 2024-08-21 19:49:17

@jfy133 (Member, Author) commented Aug 22, 2024

Thanks @CarsonJM !

@jfy133 merged commit dd7ae6c into dev on Aug 22, 2024 (15 checks passed).
@jfy133 deleted the fix-gunc-overwriting branch on Aug 22, 2024 at 12:12.
@jfy133 mentioned this pull request on Aug 22, 2024.
Successfully merging this pull request may close these issues:

  • GUNC-GTDB overwriting output files for bins