Proposal for supporting a dry-run like feature #1774
Comments
Sounds very interesting. I often build pipeline structures using toy files. For large pipelines it might be cumbersome to have
Sounds cool!! I wonder if we can also use this sort of feature to do unit testing of individual modules/processes, e.g. if we are able to stage some minimal test data from a remote repo like nf-core-testdatasets or a path relative to the module. We could then maybe have some sort of checking mechanism via md5sums or the number of lines in the file, which would be relatively easy to implement in bash?
Excellent!
That's how it is expected to work when adding
That's slightly different; the plan is to cover unit (task) testing with another feature that allows checking the actual task result. What about naming? Not super convinced about
As for naming, since it's almost Halloween, I think nothing beats
💀 👻 🧟
🤣 🤣 🤣
Not super convinced by the name either, but think
Could have
So far quite like
Like the proposal and
I agree that ... Instead, this feature does launch the pipeline, replacing the process commands with a user-provided dummy implementation. I think the name should reflect this difference, to avoid further confusing the users and also to stress that it can be used to quickly prototype a pipeline using temporary command stubs. I like the word ... Adding @PaulHancock, who first inspired this feature last year during the Nextflow workshop at Pawsey.
Good point about the distinction between this feature and ... a reasonable expectation of what ... Should that be ...? As you mentioned, the ... In other words, isn't ... a ...?
Ooops, yes. Regarding the testing, I see this more for quick runs and prototyping; the plan for testing is to provide the ability to have self-contained tests for each task running the real command, which is the most important thing to validate.
Is this intended to be like a unit test framework with mocks? If so, I feel like genuine MagicMock-like functionality from Python would be more useful than a dry-run. Mock path outputs, mock stdout/stderr, mock value outputs, all in Groovy. Also mock a shell, and make sure ... I'm mixing unit testing and "end-to-end" testing here, but hopefully that makes sense. I'm noodling through an interface and style in my head, but my first question above is more important. I saw talk of a unit test framework, so I may be in the wrong thread. Edit: Oh, I see where you said this, now: "That's slightly different; the plan is to cover unit (task) testing with another feature that allows checking the actual task result." Did you have any other questions, or do you want a more formal proposal, though?
Yes please to ... We commented on the lack of a ...
Discussing this more, it seems the consensus is for ... I've merged on master a first implementation and drafted the docs here. If you want to give it a try you can use this command:
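The exact command from this comment was not preserved in this extract; as a rough, non-authoritative sketch based on the merged stub feature, trying it out would look something like the following (my-pipeline.nf is a placeholder script name):

```
# hypothetical invocation: pin the edge release and enable the stub run
NXF_VER=20.11.0-edge nextflow run my-pipeline.nf -stub-run
```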
This is available starting from version 20.11.0-edge.
When dealing with complex pipelines deployed across heterogeneous systems, it's crucial to be able to quickly verify that all the components run as expected, especially in cloud environments.
A classic way of managing this is having a dry-run mechanism which simulates the run of the pipeline, computing all the nodes (tasks) traversed in the execution DAG.
However, this is not feasible in Nextflow because, by design, tasks can contain partial output declarations, e.g.

output: path('*.bam')

which captures all files produced with the extension .bam. Therefore the expected task outputs cannot be determined without running the task itself.
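As an illustration (not part of the original issue text), a process like the following only declares a glob pattern, so the concrete output file names are unknown until the command has actually run; my_aligner is a hypothetical tool:

```nextflow
process ALIGN {
    input:
    path reads

    output:
    path '*.bam'   // glob pattern: matching files are discovered only after execution

    script:
    """
    my_aligner --reads $reads --split-by-chromosome   # may emit one BAM per chromosome
    """
}
```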
This is why the golden rule for Nextflow pipelines is to include a minimal dataset that allows the complete execution of the pipeline, both locally and with a continuous integration system.
However, in some situations this can be very difficult, and even a small dataset could require too much storage and computing resources.
A possible alternative could be to add, in the Nextflow process definition, a command stub that can be used to mimic the expected outputs. For example:
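The concrete example was not captured in this extract; a minimal sketch of the proposed syntax, assuming the dryrun: block described below (the merged implementation eventually named it stub), might look like this, where FOO and long_running_aligner are placeholders:

```nextflow
process FOO {
    output:
    path 'foo.bam'

    script:
    """
    long_running_aligner --output foo.bam    # the real command
    """

    dryrun:
    """
    touch foo.bam    # fake command producing an empty file that satisfies the output declaration
    """
}
```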
The dryrun section is ignored unless the user specifies the CLI option -dry-run; in this case, when defined, it replaces the actual process script. This could provide a nice alternative to quickly test the main execution logic and the deployment in the target platform without hitting the real data.
This mechanism could also be used to rapidly draft the main execution flow by just providing the task stubs, i.e. fake commands, and replace them once the main flow works as expected.
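For instance, an illustrative DSL2 skeleton (not from the issue; all process names are placeholders) could wire the whole flow with fake commands first, and the touch lines would later be replaced by the real tools:

```nextflow
nextflow.enable.dsl=2

// Stub-only skeleton: every task just touches its declared outputs so the
// workflow wiring can be validated before any real tool is plugged in.
process INDEX {
    output:
    path 'index.idx'

    script:
    """
    touch index.idx   # placeholder for the real indexing command
    """
}

process QUANT {
    input:
    path index

    output:
    path 'quant.tsv'

    script:
    """
    touch quant.tsv   # placeholder for the real quantification command
    """
}

workflow {
    INDEX()
    QUANT(INDEX.out)
}
```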