Proposal for supporting a dry-run like feature #1774
Comments
Sounds very interesting. I often build pipeline structures using toy files. For large pipelines it might be cumbersome to have
Sounds cool!! I wonder if we can also use this sort of feature to do unit testing of individual modules/processes, e.g. if we are able to stage some minimal test data from a remote repo like nf-core-testdatasets or a path relative to the module. We could then maybe have some sort of checking mechanism via md5sums or the number of lines in the file, which would be relatively easy to implement in bash?
Excellent!
That's how it is expected to work when adding
That's slightly different; the plan is to cover unit (task) testing with another feature that allows checking the actual task result. What about naming? Not super convinced about
As for naming, since it's almost Halloween, I think nothing beats
💀 👻 🧟
🤣 🤣 🤣
Not super convinced by the name either, but think
Could have
So far quite like
Like the proposal and
I agree that ... Instead, this feature does launch the pipeline, replacing the process commands with a user-provided dummy implementation. I think the name should reflect this difference, to avoid further confusing the users and also to stress that it can be used to quickly prototype a pipeline using temporary command stubs. I like the word ... Adding @PaulHancock, who first inspired this feature last year during the Nextflow workshop at Pawsey.
Good point about the distinction between this feature and ... a reasonable expectation of what ... Should that be ...? As you mentioned, the ... In other words, isn't ... a ...?
Ooops, yes. Regarding the testing, I see this more for quick runs and prototyping; the plan for testing is to provide the ability to have self-contained tests for each task running the real command, which is the most important thing to validate.
Is this intended to be like a unit test framework with mocks? If so, I feel like genuine MagicMock-like functionality from Python would be more useful than a dry-run. Mock path outputs, mock stdout/stderr, mock value outputs, all in Groovy. Also mock a shell, and make sure ... I'm mixing unit testing and "end-to-end" testing here, but hopefully that makes sense. I'm noodling through an interface and style in my head, but my first question above is more important. I saw talk of a unit test framework, so I may be in the wrong thread. Edit: Oh, I see where you said this, now: "That's slightly different; the plan is to cover unit (task) testing with another feature that allows checking the actual task result." Did you have any other questions, or do you want a more formal proposal, though?
Yes please to ... We commented on the lack of a ...
Discussing this more, it seems the consensus is for ... I've merged on master a first implementation and drafted the docs here. If you want to give it a try you can use this command:
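The exact command from this comment was not preserved in this extract; as a rough, non-authoritative sketch based on the merged stub feature, trying it out would look something like the following (my-pipeline.nf is a placeholder script name):

```
# hypothetical invocation: pin the edge release and enable the stub run
NXF_VER=20.11.0-edge nextflow run my-pipeline.nf -stub-run
```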
This is available starting from version 20.11.0-edge.
When dealing with complex pipelines deployed across heterogeneous systems, it's crucial to be able to quickly verify that all the components run as expected, especially in cloud environments.
A classic way of managing this is having a dry-run mechanism which simulates the run of the pipeline, computing all the nodes (tasks) traversed in the execution DAG.
However, this is not feasible in Nextflow because, by design, tasks can contain partial output declarations, e.g.

output: path('*.bam')

which captures all files produced with the extension .bam. Therefore the expected task outputs cannot be determined without running the task itself.
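As an illustration (not part of the original issue text), a process like the following only declares a glob pattern, so the concrete output file names are unknown until the command has actually run; my_aligner is a hypothetical tool:

```nextflow
process ALIGN {
    input:
    path reads

    output:
    path '*.bam'   // glob pattern: matching files are discovered only after execution

    script:
    """
    my_aligner --reads $reads --split-by-chromosome   # may emit one BAM per chromosome
    """
}
```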
This is why the golden rule for Nextflow pipelines is to include a minimal dataset that allows the complete execution of the pipeline, both locally and with a continuous integration system.
However, in some situations this can be very difficult, and even a small dataset could require too much storage and computing resources.
A possible alternative could be to add, in the Nextflow process definition, a command stub that can be used to mimic the expected outputs. For example:
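The concrete example was not captured in this extract; a minimal sketch of the proposed syntax, assuming the dryrun: block described below (the merged implementation eventually named it stub), might look like this, where FOO and long_running_aligner are placeholders:

```nextflow
process FOO {
    output:
    path 'foo.bam'

    script:
    """
    long_running_aligner --output foo.bam    # the real command
    """

    dryrun:
    """
    touch foo.bam    # fake command producing an empty file that satisfies the output declaration
    """
}
```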
The dryrun section is ignored unless the user specifies the CLI option -dry-run; in this case, when defined, it replaces the actual process script. This could provide a nice alternative to quickly test the main execution logic and the deployment in the target platform without hitting the real data.
This mechanism could also be used to rapidly draft the main execution flow by just providing the task stubs, i.e. fake commands, and replace them once the main flow works as expected.
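For instance, an illustrative DSL2 skeleton (not from the issue; all process names are placeholders) could wire the whole flow with fake commands first, and the touch lines would later be replaced by the real tools:

```nextflow
nextflow.enable.dsl=2

// Stub-only skeleton: every task just touches its declared outputs so the
// workflow wiring can be validated before any real tool is plugged in.
process INDEX {
    output:
    path 'index.idx'

    script:
    """
    touch index.idx   # placeholder for the real indexing command
    """
}

process QUANT {
    input:
    path index

    output:
    path 'quant.tsv'

    script:
    """
    touch quant.tsv   # placeholder for the real quantification command
    """
}

workflow {
    INDEX()
    QUANT(INDEX.out)
}
```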