Add basic error checking for input csv samplesheet #469

bkohrn · 2021-12-10T19:38:08Z

EDIT by @maxulysse to add some checkboxes to track progress:
We should have proper checks to verify that the data is correct for each step:

Some additional ideas from Slack (edited by @FriederikeHanssen):

If --step variant_calling or --step annotate, then --tools should not be empty
Bonus feature: check that the tools actually fit the data (e.g. HaplotypeCaller should cause a failure if the samplesheet only contains samples with status tumor) #603
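
As a rough illustration of the two ideas above, here is a minimal standalone Python sketch. It is not Sarek's code: the function name, the step name `annotate`, the `"tumor"`/`"normal"` status strings, and the set of tools that need a normal sample are all assumptions made for the example.

```python
# Hypothetical sketch; names and the status encoding are assumptions, not Sarek's code.
STEPS_REQUIRING_TOOLS = {"variant_calling", "annotate"}
# Assumed for illustration: tools that cannot run on tumor-only data.
TOOLS_NEEDING_NORMAL = {"haplotypecaller"}


def check_step_and_tools(step, tools, statuses):
    """Return a list of error messages; an empty list means no problem was found.

    step     -- value passed to --step
    tools    -- comma-separated value passed to --tools (may be None or empty)
    statuses -- status of every row in the samplesheet ("normal" or "tumor")
    """
    errors = []
    # Normalise the comma-separated tool list, ignoring blanks and case.
    selected = {t.strip().lower() for t in (tools or "").split(",") if t.strip()}

    if step in STEPS_REQUIRING_TOOLS and not selected:
        errors.append(f"--step {step} requires at least one entry in --tools")

    # Flag tools that need a normal sample when every row is a tumor sample.
    if statuses and all(s == "tumor" for s in statuses):
        for tool in sorted(selected & TOOLS_NEEDING_NORMAL):
            errors.append(
                f"{tool} needs a normal sample, but the samplesheet only has tumor samples"
            )
    return errors
```

Calling `check_step_and_tools("variant_calling", "", ["normal"])` returns one message about the empty `--tools`, so the run can be aborted before any process starts.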

Is your feature request related to a problem? Please describe

It's frustrating to have the pipeline run through completely, only to tell me at the end that there are file name clashes based on me (as a new user of the pipeline) having set up my TSV file incorrectly.

Describe the solution you'd like

To help with this, it would be useful for the pipeline to do some basic error checking on the TSV file before starting any other steps; this means checking things such as: are values in columns where each row needs to be unique actually unique? Do the columns provided (at least) seem to match what should go into them based on the start point given (for example, does a file name look like a fastq / bam / bai file? Does the file actually exist?)? It would also be helpful if the pipeline supported a header row in the TSV file, as I suspect part of the reason I messed up was due to not having a header row to refer to.
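
The checks requested above (header present, unique rows, plausible file names, files that exist) could look roughly like the following standalone Python sketch. The column names, the uniqueness key, the accepted extensions, and the comma delimiter are assumptions chosen for illustration; they are not taken from the pipeline.

```python
import csv
from pathlib import Path

# Assumed layout for illustration only; the real samplesheet columns may differ.
REQUIRED_COLUMNS = ["patient", "sample", "lane", "fastq_1", "fastq_2"]
UNIQUE_KEY = ("patient", "sample", "lane")
FILE_COLUMNS = {
    "fastq_1": (".fastq.gz", ".fq.gz"),
    "fastq_2": (".fastq.gz", ".fq.gz"),
}


def validate_samplesheet(path, check_exists=True):
    """Return a list of error messages; an empty list means the sheet passed."""
    errors = []
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        missing = [c for c in REQUIRED_COLUMNS if c not in header]
        if missing:
            # Without a usable header the remaining checks are meaningless.
            return [f"missing column(s) in header: {', '.join(missing)}"]

        seen = {}  # uniqueness key -> line number where it first appeared
        for line_no, row in enumerate(reader, start=2):
            key = tuple(row[c] for c in UNIQUE_KEY)
            if key in seen:
                errors.append(f"line {line_no}: duplicate of line {seen[key]}: {key}")
            else:
                seen[key] = line_no

            for column, extensions in FILE_COLUMNS.items():
                value = row[column] or ""  # short rows yield None
                if not value.endswith(extensions):
                    errors.append(f"line {line_no}: {column}={value!r} has an unexpected extension")
                elif check_exists and not Path(value).is_file():
                    errors.append(f"line {line_no}: {column}={value!r} does not exist")
    return errors
```

Running a check like this before the first process starts would surface a clash within seconds instead of at the end of an overnight run.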

Describe alternatives you've considered

Not sure if there would be any good alternatives to this; it's mostly a convenience feature to get the pipeline to throw an error before it runs all night and then needs to be restarted from scratch in the morning.


FriederikeHanssen · 2021-12-12T10:22:19Z

Sounds like a good idea to me. I suppose warnings could be added here:

sarek/workflows/sarek.nf

Line 468 in 085f1df

//TODO since it is mandatory: error/warning if not present?

similar to this:

sarek/workflows/sarek.nf

Line 527 in 085f1df

log.warn "Missing or unknown field in csv file header"

bkohrn added the enhancement New feature or request label Dec 10, 2021

bkohrn assigned maxulysse Dec 10, 2021

FriederikeHanssen added this to the 3.0 milestone May 11, 2022

FriederikeHanssen added the input validation label Jun 10, 2022

lassefolkersen assigned lassefolkersen and asp8200 Jun 17, 2022

FriederikeHanssen mentioned this issue Jun 17, 2022

Checking the sample sheet in DSL2 #235

Closed

maxulysse changed the title ~~[FEATURE] Add basic error checking for input TSV file based~~ Add basic error checking for input csv samplesheet Jun 20, 2022

maxulysse mentioned this issue Jun 21, 2022

Add checks for correct data type for params.step #599

Merged

11 tasks

FriederikeHanssen closed this as completed Jun 23, 2022
