Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make GATKSVPipelineSingleSample validate it in MiniWDL #154

Merged
merged 11 commits into from
Jun 15, 2021

Conversation

illusional
Copy link
Contributor

Fixes #153.

Make the GATKSVPipelineSingleSample pipeline validate in MiniWDL through a combination of:

  • Optional to non-optional
    • Eg: Output was required, but input outfile-name was optional.
    • Add select_first to coerce X? -> X
  • Remove duplicate inputs
  • Default disk_overhead_gb to 0: disk_overhead_gb = select_first([disk_overhead_gb, 0]), because if it's not supplied I guess it could be 0.

@epiercehoffman
Copy link
Collaborator

Thank you for bringing to our attention that users of MiniWDL may encounter these validation issues. We use womtool for validation and cromwell to run workflows, and it appears that those tools have different validation checks.

I believe that MiniWDL is not handling parameters with default values the same way as womtool/cromwell, based on the changes you made with regard to optional/required inputs. In the cases of defragment_max_dist, min_large_pesr_call_size_for_filtering, min_large_pesr_depth_overlap_fraction, and disk_overhead_gb, these inputs should be optional in the top-level WDL, as all of them have default values assigned at a task-level (in the input section of the task). Since they have default values, any WDL that calls these tasks is not required to provide a value for these parameters when running workflows with cromwell. We do not wish for users to have to provide inputs for parameters that already have default values, so these parameters should remain optional.

With regard to your other changes, thank you for bringing to our attention the duplicate inputs, the lack of newline at the end of Tasks0506.wdl, and the use of the optional input outfile_name in Tasks0506.ZcatCompressedFiles when the task should be using output_file_name or assigning outfile_name a default value to deal with the optional/required problem.

@illusional
Copy link
Contributor Author

Thanks for the reply @epiercehoffman!

I believe Cromwell / womtool is actually parsing this incorrectly with respect to the WDL spec. I've requested clarification here: openwdl/wdl#462.

To address your feedback, I've made the Optional -> Non optional map apply the default in the task, like:

task ... {
  input {
    Int? disk_overhead_gb = 10
  }
  disk_overhead_gb_ = select_first([disk_overhead_gb, 10])
  # ... use `disk_overhead_gb_`
}

This validates with womtool and MiniWDL.

Copy link
Collaborator

@mwalker174 mwalker174 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @illusional you bring up a valid point about the default input types. I made some comments in specific places but please incorporate them throughout where you have made similar changes as well.

wdl/DepthPreprocessing.wdl Outdated Show resolved Hide resolved
wdl/GATKSVPipelineSingleSample.wdl Show resolved Hide resolved
wdl/MakeBincovMatrix.wdl Outdated Show resolved Hide resolved
wdl/Module00a.wdl Outdated Show resolved Hide resolved
wdl/Module00c.wdl Outdated Show resolved Hide resolved
@illusional
Copy link
Contributor Author

illusional commented May 21, 2021

Thanks @mwalker174, I've addressed your comments. I've also reverted the select_first([...]) changes you suggested as it causes Cromwell to fail to validate (an error with Cromwell). I've seen this before, and this is a grammar issue with Cromwell, where I know valid workflows are failing. It's trying to access the attribute 3 on the object 0 (actually a float).

Failed to import 'SingleSampleFiltering.wdl' (reason 1 of 1): ERROR: Unexpected symbol (line 247, col 117) when parsing 'e'.

Expected identifier, got "3".

        | coverageBed -a stdin -b ~{raw_dels} | awk '$NF < ~{select_first([min_large_pesr_depth_overlap_fraction, 0.3])} {print $4}' > large_dels_without_raw_depth_support.list
                                                                                                                    ^

$e = $e <=> :dot :identifier -> MemberAccess( value=$0, member=$2 )

Also, I think it's worth adding a miniwdl check and womtool validate as a GH action, happy to put something together if you're interested. I'd do it in a separate PR if you are interested :)

@illusional
Copy link
Contributor Author

Hi @mwalker174 and @epiercehoffman, are there any blockers for this PR, or anything I need to get it ready for re-review?

Copy link
Collaborator

@mwalker174 mwalker174 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delayed response - I still have a couple of suggestions here.

wdl/DepthPreprocessing.wdl Outdated Show resolved Hide resolved
wdl/DepthPreprocessing.wdl Outdated Show resolved Hide resolved
@illusional
Copy link
Contributor Author

Thanks @mwalker174, I've refactored the select_first to if defined(var) then var else "default", the key being the default being in the string due to the cromwell grammar bug.

Copy link
Collaborator

@mwalker174 mwalker174 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @illusional for the updates and follow-through, looks good!

@mwalker174 mwalker174 merged commit 5440f12 into broadinstitute:master Jun 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GATKSVPipelineSingleSample doesn't parse in MiniWDL
4 participants