optional outputs extension #5651
Branch updated from 6caf5be to db3aa96.
**Sibling:** cylc/cylc-flow#5651
Document the changes outlined in https://cylc.github.io/cylc-admin/proposal-optional-output-extension.html
Branch updated from db3aa96 to e240b3f.
Tangentially related.
We were already using restricted Python evaluation to run the host selection ranking expressions, so I generalised this interface to allow for new uses going forward. The logic in this file should be unchanged.
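For readers unfamiliar with the pattern, here is a minimal sketch of restricted expression evaluation. It assumes an approach based on whitelisting node types from Python's standard `ast` module; `ALLOWED_NODES` and `restricted_eval` are hypothetical names, not the interface this PR adds, and the example expressions are illustrative only.

```python
import ast

# Hypothetical whitelist: simple arithmetic, comparison and boolean syntax only.
# Calls, attribute access, subscripts etc. are all rejected.
ALLOWED_NODES = (
    ast.Expression, ast.BoolOp, ast.And, ast.Or, ast.UnaryOp, ast.Not,
    ast.USub, ast.BinOp, ast.Add, ast.Sub, ast.Mult, ast.Div,
    ast.Compare, ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
    ast.Name, ast.Load, ast.Constant,
)


def restricted_eval(expression: str, namespace: dict):
    """Evaluate ``expression`` using only whitelisted syntax and known names."""
    tree = ast.parse(expression, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"disallowed syntax: {type(node).__name__}")
        if isinstance(node, ast.Name) and node.id not in namespace:
            raise ValueError(f"unknown name: {node.id}")
    code = compile(tree, "<restricted>", "eval")
    # Empty builtins so that open(), __import__() etc. are unavailable.
    return eval(code, {"__builtins__": {}}, dict(namespace))


# Hypothetical ranking-style and completion-style expressions:
print(restricted_eval("load + 2 * mem", {"load": 0.5, "mem": 1.0}))  # prints 2.5
print(restricted_eval("succeeded and (x or y)",
                      {"succeeded": True, "x": False, "y": True}))  # prints True
```

Rejecting unknown node types up front (rather than trying to blacklist dangerous ones) keeps the evaluator safe by default as new uses are added.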
The actual logic of the change lies here; most of the other changes are validation.
```python
if isinstance(point, str):
    # Convenient for tests; otherwise use PointBase instances for efficiency.
    point = get_point(point)
```
Makes it easier to pull out tasks in integration tests.
```diff
-if incomplete:
+if itask.state.outputs.is_incomplete():
     # Retain as incomplete.
     LOG.warning(
         f"[{itask}] did not complete required outputs:"
-        f" {incomplete}"
+        f" {itask.state.outputs.get_incomplete()}"
     )
```
After these changes, `bool(get_incomplete())` is no longer guaranteed to equal `not is_complete()`. E.g. if the completion expression is `succeeded and (x or y)`, and `succeeded` and `x` have been generated, `get_incomplete()` will return `y` even though the task is complete.
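The mismatch can be reproduced with a small sketch. It assumes, hypothetically, that "incomplete" outputs are those referenced in the completion expression but not yet generated, while completeness is the truth value of the expression itself. The helpers below are illustrative stand-ins, not the `TaskOutputs` methods from this PR.

```python
import ast


def referenced_outputs(completion: str) -> set:
    """Return the output names referenced in a completion expression."""
    tree = ast.parse(completion, mode="eval")
    return {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}


def is_complete(completion: str, generated: set) -> bool:
    """Evaluate the expression with each output set to True if generated.

    Demo only: a real implementation should use restricted evaluation.
    """
    namespace = {name: name in generated for name in referenced_outputs(completion)}
    return bool(eval(completion, {"__builtins__": {}}, namespace))


def get_incomplete(completion: str, generated: set) -> list:
    """List (sorted) referenced outputs that have not been generated."""
    return sorted(referenced_outputs(completion) - generated)


completion = "succeeded and (x or y)"
generated = {"succeeded", "x"}
print(is_complete(completion, generated))     # prints True
print(get_incomplete(completion, generated))  # prints ['y']
# bool(get_incomplete(...)) is True here even though the task is complete.
```

This is why the warning now branches on `is_incomplete()` rather than on the truthiness of the incomplete-output list.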
```python
class TestMessageSorting(unittest.TestCase):
```
[unrelated] converted from unit test.
Branch updated from e240b3f to 228eb0d.
(That was quick!!)
**Implements:** https://cylc.github.io/cylc-admin/proposal-optional-output-extension.html
**Closes:** cylc#5640

* Require at least one optional output to be generated in order for the task to be considered "completed" by default.
* Add the ability for users to manually specify the completion expression.
* Bring task expiry into the optional outputs model.

Document in changelog.
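The default rule in the first bullet can be pictured as building a completion expression from a task's graph outputs: every required output must be generated and, if the task has any optional outputs, at least one of them must be too. This is a sketch under that reading; `default_completion` is a hypothetical helper, and the real implementation may build or format the expression differently.

```python
def default_completion(required: list, optional: list) -> str:
    """Build a default completion expression (hypothetical helper).

    All required outputs are ANDed together; if any optional outputs
    exist, the expression additionally requires at least one of them.
    """
    terms = list(required)
    if optional:
        terms.append("(" + " or ".join(optional) + ")")
    return " and ".join(terms)


print(default_completion(["succeeded"], ["x", "y"]))  # prints succeeded and (x or y)
print(default_completion(["succeeded"], ["x"]))       # prints succeeded and (x)
```

The second call shows the single-optional-output case debated later in this thread: under this default, a lone optional output is effectively required.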
Branch updated from 228eb0d to ba42cc0.
The functional bit of the code is shockingly simple; validation proved trickier.
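One way to picture the validation side is to check each output against the completion expression by brute force: an output is required by the expression if the expression can never be satisfied while that output is missing. The sketch below assumes that reading and only a handful of outputs (it enumerates all 2^n combinations of the other outputs); `is_required_by` and `validate` are hypothetical names, not the checks this PR actually implements.

```python
import ast
from itertools import product


def _outputs(expr: str) -> list:
    """Sorted names of the outputs referenced in ``expr``."""
    tree = ast.parse(expr, mode="eval")
    return sorted({n.id for n in ast.walk(tree) if isinstance(n, ast.Name)})


def is_required_by(expr: str, output: str) -> bool:
    """True if ``expr`` is False in every case where ``output`` is missing."""
    others = [name for name in _outputs(expr) if name != output]
    for values in product([True, False], repeat=len(others)):
        namespace = dict(zip(others, values))
        namespace[output] = False
        if eval(expr, {"__builtins__": {}}, namespace):
            return False  # satisfiable without this output
    return True


def validate(expr: str, graph_optional: set) -> list:
    """Report outputs in ``expr`` whose graph marking disagrees with it."""
    problems = []
    for output in _outputs(expr):
        required = is_required_by(expr, output)
        if required and output in graph_optional:
            problems.append(f"{output} is optional in the graph but required here")
        if not required and output not in graph_optional:
            problems.append(f"{output} is required in the graph but optional here")
    return problems


print(validate("succeeded and (x or y)", {"x", "y"}))  # prints []
print(validate("succeeded and (x or y)", {"x"}))
```

Brute force is exponential in the number of outputs, which is tolerable for a sketch but is one reason real validation tends to be more involved than the runtime logic.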
Partial review
(FYI, I'll apply cosmetic suggestions and retest once functional review is complete)
(I'll look at this ASAP...)
Very nice code (as usual 😩 ). Not tested yet, I'll approve tomorrow if I fail to break it!
Branch updated from 5f06cf7 to 088dcd9.
Co-authored-by: Hilary James Oliver <hilary.j.oliver@gmail.com>
Co-authored-by: Ronnie Dutta <61982285+MetRonnie@users.noreply.github.com>
Branch updated from 088dcd9 to 6a710c0.
(fixed flake8 error from suggestion)
Testing all good. One question occurred to me, though; it should have occurred to me earlier when reviewing the proposal, sorry. Still, better late than never. I'll illustrate with the xy(z) case:
If that's the entire graph off of But what about this:
or this:
In these cases early shutdown is not on the cards, and we have said that
IMO this is overkill and confusing when we do not need to protect against early shutdown. So perhaps we should change tack slightly: if a task has only optional outputs in the graph, then by default at least one of them must be generated. [This applies whenever there is at least one required output used in the graph, and any number of optional outputs, but it's most obvious when there's a single optional output: currently on this branch, by default a single optional output is not optional!]
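The alternative default suggested here can be sketched as a variant expression builder: when a task has required outputs, its optional outputs are left truly optional; only when all of its graph outputs are optional must at least one of them be generated. `alternative_default_completion` is a hypothetical name used for illustration only.

```python
def alternative_default_completion(required: list, optional: list) -> str:
    """Hypothetical default completion under the rule suggested above."""
    if required:
        # Required outputs already guard against early shutdown, so the
        # optional outputs are not mentioned (i.e. they are truly optional).
        return " and ".join(required)
    # No required outputs: demand at least one of the optional outputs.
    return " or ".join(optional)


print(alternative_default_completion(["succeeded"], ["x"]))         # prints succeeded
print(alternative_default_completion([], ["succeeded", "failed"]))  # prints succeeded or failed
```

Under this rule the single-optional-output case no longer turns `x` into a de facto required output, while a task with only optional outputs still cannot complete without producing something.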
Do this:
Do this:
That wouldn't work as
Making chains is a more common use case than breaking chains; we must prioritise the more likely use case at the expense of one line of configuration for the less likely one. Note that we've yet to encounter a use case for early shutdown. We should support it, but it shouldn't be the default, and I don't think we should make the rules more complex by adding extra implicit logic for this purpose.
I agree (and have since the proposal discussions); note that I was not suggesting this above. I was suggesting we should take the optional output declaration seriously (by default) IF there are required outputs present to prevent early shutdown.
Yes, as I said above I was just talking about the default situation - which is what most users will encounter most of the time.
We know whether the output has any graph children, and hence whether a "chain" extends from it in the graph, so it would work as long as "used in the graph" means it has children. Unfortunately, under the agreed model the (default) single optional output case looks downright illogical:
By default this is exactly equivalent to:
i.e. the optional output is by default not optional at all. Does that not bother you? We could avoid this with a small tweak to the rule:
Which could be put like this (so it doesn't look like special-casing):
Note that it is only the default I'm worried about, but before we lock it in I think we should really try to avoid default behaviour that will seem illogical to users. [UPDATE]: my next comment below better articulates the problem and solution, once it occurred to me we really can't consider optional outputs alone for this purpose.
A bit more. I'm still not convinced by the "chain breaking" argument if early shutdown is prevented by other (required) outputs.
Here there can be no early shutdown, thanks to the required output. But our rule arbitrarily allows all but one of the optional chains to be broken. If there are 26 optional chains and 1 required, we allow (by default) 25 optional chains to be broken. How is that to be justified exactly? It's not to prevent early shutdown, and it's not much of a chain breaking safety net! There's no particular reason to choose the number one as special, other than to prevent early shutdown - which is also prevented by required outputs. So ... I think there are two logically justifiable models:
The second option would be a proper default safety net, with no logical holes in it.
This approach effectively shifts optional output logic entirely into the "runtime" section. The question marks in the graph would become superfluous: only validation would keep them in sync with the completion expression, so they could be removed altogether. Not necessarily a bad approach; however, this would be a substantial change which would break all graph branching in existing Cylc 8 workflows.
It's the safest default which provides the safest behaviour in the most common scenarios. More complex requirements require more complex configuration, but the simple remains simple until you break it.
Maybe not. We could keep them in the graph to signify that there should be a corresponding completion condition. Outputs with no
A safe breaking change, though: the workflow would carry on as normal from the generated output; the task would just be marked incomplete. (And it is already a breaking change without this.)
My two suggestions are equally simple, but without the downsides:
Otherwise, use a completion condition. What's not simple about those?
Would it be least confusing for users if we didn't mandate that at least one optional output be generated, instead leaving it up to them to write a completion condition if needed? An exception could be made for if
Yes, that's exactly what I'm suggesting as my preferred approach. The other option (no-chains-broken-by-default) is more heavy-handed, and not my preference, but needs to be considered.
Note, Hilary's suggestion was "one or more utilised outputs must be generated", which isn't quite the same and does have its own drawbacks, e.g. for this graph. IMO, the proposed approach is nicer (the scope is restricted to optional outputs, and the behaviour more closely caters to the principal "switch" use cases for graph branching). The runtime-only approach described above as "no-chains-broken-by-default" (i.e. if you want something to be optional, you must explicitly configure it in the completion expression) has merit. We did consider going this way back at the beginning. The separation of
Either way, this PR implements the accepted proposal, so I think review should continue whilst discussion goes on (in a proposal?) so we're in a position to release this when we've decided.
Sorry, my reading of @MetRonnie's question was a bit off. Yes, my suggestion was "we don't mandate at least 1 optional output be generated" but also, as Oliver says, "one or more utilized outputs must be generated" - in order to prevent early shutdown due to inadvertent non-production of outputs.
That's not really a drawback, because the rule is "at least one output (of any kind) must be generated [by default] to prevent early shutdown", and in this case there are no other outputs to prevent early shutdown. [UPDATE: I've now thought of a solution below that avoids the need to consider both optional and required outputs to get around this problem]. On this branch, the interpretation of this graph
I think we had better make a decision on this one way or the other, very soon. It would be too disruptive to switch to that approach after wide adoption of Cylc 8. Pros:
Cons:
I only raised this for completeness, as one of the fully self-consistent approaches. IMO it is probably unnecessarily heavy-handed, so my vote is NO. Possibly not a hard NO, but we need a quick decision.
Assuming we do dump the heavy-handed solution from consideration, here's a tweak that I think could resolve the impasse. TLDR: Keep this branch as-is but add an "unrestricted optional output" notation for use when "one way or another" branching is not the intention. DETAILS:
We say that the above notation means "restricted optional outputs", with the restriction being:
The justification for this restriction is safety: optional outputs are OFTEN used to direct the flow one way or another, in which case no outputs at all means something must be wrong. So far so good; that is this branch 🎉 The problem is that it forces you to write a completion condition for truly optional outputs, which should not be necessary, and in particular the default looks downright stupid for the single-optional-output case, where going "one way or another" is not even possible. So let's add a variant notation for unrestricted optional outputs:
This notation means: the scheduler does not care at all if Notes:
Blocked whilst the proposal is under re-review.
The accepted proposal has now been rejected. The new proposal requires branched logic, so it can't be flattened out using the completion statement the way this PR implemented it. I'll pull out the relevant bits of code from this PR into a new branch and re-raise. I'll wait for the
Implements: https://cylc.github.io/cylc-admin/proposal-optional-output-extension.html
Closes: #5640
Check List
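* I have read `CONTRIBUTING.md` and added my name as a Code Contributor.
* Applied any dependency changes to both `setup.cfg` (and `conda-environment.yml` if present).
* `CHANGES.md` entry included if this is a change that can affect users
* If this is a bug fix, the PR should be raised against the relevant `?.?.x` branch.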