Samplesheet multilevel nesting ick4 #24

Merged 23 commits on Jun 7, 2024

@jessicarowell commented May 30, 2024
Collaborator

This branch adds a --file_levels option that selects which files are used to build the samplesheet (and thus which files the pipeline runs on). The default, nested, uses only fastq files nested in subdirectories, which is the default CDC Core format. Alternatively, all uses both the top-level files and the files one directory down, and top uses only the top-level files.

Test cmd:
nextflow run main.nf --indir /scicomp/groups-pure/Projects/scbs_mpob/JRowell_ick4/mpox_testing/paired/ --file_levels <nested, all, top> --outdir /scicomp/groups-pure/Projects/scbs_mpob/JRowell_ick4/mpox_testing/results/ --project_name test -config /scicomp/reference/nextflow/configs/cdc-dev.config -profile singularity,rosalind

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
    • If you've added a new tool, have you followed the pipeline conventions in the contribution docs?
    • If necessary, also make a PR on the nf-core/mpxvassembly branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@jessicarowell jessicarowell added the enhancement New feature or request label May 30, 2024
@jessicarowell jessicarowell self-assigned this May 30, 2024
@mikeyweigand
Collaborator

Please remove the --file_levels all option because it derives sample names inconsistently and reinforces poor data organization practices. Users should have the option to choose:
top -- look for fastq files in the listed directory only
nested -- look for fastq files organized in subdirs one level down

In either scenario, ignore any other fastq files present in the directory tree passed to --indir.
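
The two modes requested above can be sketched in a few lines of Python. This is only an illustration of the intended behavior, not the pipeline's actual samplesheet code: the function name find_fastqs, the FASTQ_SUFFIXES tuple, and the set of file extensions are all assumptions made for this example.

```python
from pathlib import Path

# Assumed fastq extensions; the real pipeline may accept a different set.
FASTQ_SUFFIXES = (".fastq", ".fastq.gz", ".fq", ".fq.gz")


def find_fastqs(indir, file_levels="nested"):
    """Return fastq files at the requested level and ignore everything else.

    top    -> files directly inside indir
    nested -> files exactly one subdirectory below indir
    """
    root = Path(indir)
    if file_levels == "top":
        candidates = root.iterdir()
    elif file_levels == "nested":
        candidates = (
            entry
            for subdir in root.iterdir()
            if subdir.is_dir()
            for entry in subdir.iterdir()
        )
    else:
        raise ValueError(f"file_levels must be 'top' or 'nested', got {file_levels!r}")
    # Keep regular files with a fastq extension; deeper levels are never visited.
    return sorted(
        p for p in candidates if p.is_file() and p.name.endswith(FASTQ_SUFFIXES)
    )
```

Because each mode inspects exactly one directory level, fastq files elsewhere in the tree passed to --indir are never picked up.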

@jessicarowell
Collaborator Author

I have removed the --file_levels all option and updated the documentation to reflect this.

@mikeyweigand
Collaborator

This needs some param validation, because running with --file_levels all still attempts to execute and only returns a warning:

NOTE: Process POLKAPOX_ASSEMBLY:POLKAPOX:CREATE_SAMPLESHEET terminated with an error exit status (2) -- Error is ignored

@jessicarowell
Collaborator Author

I've enforced the valid values for --file_levels in Nextflow.
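
The enforcement in this PR was done on the Nextflow side. As a complementary illustration, the same check can be sketched at the script level in Python with argparse. The parser below is hypothetical: the function name parse_args, the VALID_FILE_LEVELS constant, and the argument set are assumptions for this example and are not taken from the pipeline's samplesheet script.

```python
import argparse

# Values accepted after this PR; "all" was removed.
VALID_FILE_LEVELS = ("nested", "top")


def parse_args(argv=None):
    """Parse arguments for a hypothetical samplesheet script."""
    parser = argparse.ArgumentParser(description="Hypothetical samplesheet CLI")
    parser.add_argument("--indir", required=True, help="Directory containing fastq files")
    parser.add_argument(
        "--file_levels",
        choices=VALID_FILE_LEVELS,  # argparse rejects any other value
        default="nested",
        help="Which directory level to search for fastq files",
    )
    return parser.parse_args(argv)
```

With choices set, an invalid value such as all makes argparse print a usage error and exit with status 2 before any work is done, so the run fails early instead of continuing with an ignored error.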

Collaborator

@mikeyweigand mikeyweigand left a comment

Looks good.

@jessicarowell jessicarowell merged commit f3b83f8 into master Jun 7, 2024
1 of 9 checks passed
@kyleoconnell-CDC kyleoconnell-CDC deleted the samplesheet_multilevel_nesting_ick4 branch September 17, 2024 12:02
Labels
enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

[Internal] [Bug] CREATE_SAMPLESHEET names the samplesheet null_samplesheet.csv if no project name given
2 participants