Add basic error checking for input csv samplesheet #469

Closed
16 tasks done
bkohrn opened this issue Dec 10, 2021 · 1 comment
Assignees
Labels
enhancement New feature or request input validation
Milestone

Comments


bkohrn commented Dec 10, 2021

EDIT by @maxulysse to add some checkboxes to track progress:
we should have proper checks to verify that the input data is correct for each step:

  • nextflow run . -profile test,docker --step mapping
    • other input data should fail
    • proper warning
  • nextflow run . -profile test,docker --step markduplicates
    • fails
    • proper warning
  • nextflow run . -profile test,docker --step prepare_recalibration
    • fails
    • proper warning
  • nextflow run . -profile test,docker --step recalibrate
    • fails
    • proper warning
  • nextflow run . -profile test,docker --step variant_calling
    • fails
    • proper warning
  • nextflow run . -profile test,docker --step annotate
    • fails
    • proper warning
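
The per-step checks in the list above could be sketched roughly as follows, in Python for illustration rather than in the pipeline's own code. The step-to-extension table `EXPECTED_EXTENSIONS` and the function name `check_input_for_step` are assumptions made up for this example; the real rules for which input each `--step` accepts would need to come from the pipeline itself.

```python
from pathlib import PurePath

# Hypothetical table: which input file suffixes each --step accepts.
# These values are illustrative assumptions, not the pipeline's real rules.
EXPECTED_EXTENSIONS = {
    "mapping": (".fastq.gz", ".fq.gz"),
    "markduplicates": (".bam",),
    "prepare_recalibration": (".bam",),
    "recalibrate": (".bam",),
    "variant_calling": (".bam",),
    "annotate": (".vcf.gz",),
}


def check_input_for_step(step, paths):
    """Return a list of error messages; an empty list means no problem was found."""
    if step not in EXPECTED_EXTENSIONS:
        return [f"Unknown step '{step}'"]
    allowed = EXPECTED_EXTENSIONS[step]
    errors = []
    for path in paths:
        name = PurePath(path).name
        # str.endswith accepts a tuple of candidate suffixes.
        if not name.endswith(allowed):
            errors.append(
                f"Step '{step}' expects {' or '.join(allowed)} files, got '{name}'"
            )
    return errors
```

Returning all messages at once, instead of stopping at the first one, would let a user fix every problem in the samplesheet in a single pass.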

Some additional ideas from Slack (edited by @FriederikeHanssen)

Is your feature request related to a problem? Please describe

It's frustrating to have the pipeline run through completely, only to tell me at the end that there are file name clashes because I (as a new user of the pipeline) set up my TSV file incorrectly.

Describe the solution you'd like

To help with this, it would be useful for the pipeline to do some basic error checking on the TSV file before starting any other steps. This means checking things such as: Are the values actually unique in columns where each row needs to be unique? Do the provided columns at least seem to match what should go into them, based on the given start point (for example, does a file name look like a fastq / bam / bai file, and does the file actually exist)? It would also be helpful if the pipeline supported a header row in the TSV file, as I suspect part of the reason I messed up was not having a header row to refer to.
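
As a rough illustration of the checks described above (uniqueness and file existence, run before anything else starts), here is a minimal Python sketch. The function name `validate_samplesheet`, its parameters, and the column names used in the usage note are hypothetical; which columns must be unique and which hold file paths is left to the caller.

```python
import csv
import io
import os


def validate_samplesheet(text, unique_columns, file_columns,
                         delimiter=",", exists=os.path.isfile):
    """Collect problems in a samplesheet that has a header row.

    Returns a list of human-readable error strings (empty if none found).
    `exists` can be swapped out, for example in tests, to avoid touching disk.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    errors = []
    seen = {}  # unique-column values -> line number of first occurrence
    for row in reader:
        line_no = reader.line_num  # line of the row just read (header is line 1)
        key = tuple(row.get(col) for col in unique_columns)
        if key in seen:
            errors.append(
                f"line {line_no}: duplicate of line {seen[key]} "
                f"in columns {list(unique_columns)}"
            )
        else:
            seen[key] = line_no
        for col in file_columns:
            path = row.get(col) or ""
            if not exists(path):
                errors.append(
                    f"line {line_no}: file in column '{col}' not found: '{path}'"
                )
    return errors
```

For a tab-separated file, one would pass `delimiter="\t"`, for example `validate_samplesheet(text, ["sample", "lane"], ["fastq_1"], delimiter="\t")`, where the column names are again only placeholders.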

Describe alternatives you've considered

Not sure if there would be any good alternatives to this; it's mostly a convenience feature to get the pipeline to throw an error before it runs all night and then needs to be restarted from scratch in the morning.

@bkohrn bkohrn added the enhancement New feature or request label Dec 10, 2021
@FriederikeHanssen
Contributor

Sounds like a good idea to me. I suppose warnings could be added here:

//TODO since it is mandatory: error/warning if not present?

similar to this:

log.warn "Missing or unknown field in csv file header"
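
One way to read the TODO above is that a missing mandatory field should be an error, while an unknown field only deserves a warning. A minimal sketch of that split, written in Python for illustration rather than in the pipeline's Groovy, might look like this. The function name `check_header` and the field names in the usage note are hypothetical.

```python
import logging

log = logging.getLogger("samplesheet")


def check_header(header, mandatory, optional=()):
    """Raise on missing mandatory fields; warn about unknown ones.

    Returns the list of unknown field names so callers can inspect it.
    """
    fields = [name.strip() for name in header]
    missing = [name for name in mandatory if name not in fields]
    if missing:
        raise ValueError(
            f"Missing mandatory field(s) in csv file header: {', '.join(missing)}"
        )
    unknown = [
        name for name in fields
        if name not in mandatory and name not in optional
    ]
    if unknown:
        log.warning("Unknown field(s) in csv file header: %s", ", ".join(unknown))
    return unknown
```

With placeholder field names, `check_header(["patient", "sample", "extra"], ["patient", "sample"])` would log a warning about `extra`, while a header lacking `patient` would raise before any processing starts.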

@FriederikeHanssen FriederikeHanssen added this to the 3.0 milestone May 11, 2022
@maxulysse maxulysse changed the title [FEATURE] Add basic error checking for input TSV file based Add basic error checking for input csv samplesheet Jun 20, 2022

5 participants