Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optional outputs extension #5651

Closed

Conversation

oliver-sanders
Copy link
Member

@oliver-sanders oliver-sanders commented Jul 27, 2023

Implements: https://cylc.github.io/cylc-admin/proposal-optional-output-extension.html
Closes: #5640

  • Require at least one optional output to be generated in order for the task to be considered "completed" by default.
  • Add the ability for users to manually specify the completion expression.
  • Bring task expiry into the optional outputs model. Document in changelog.
  • Bolster optional output testing.
  • Generalise the interface for defining restricted Python evaluators.

Check List

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Applied any dependency changes to both setup.cfg (and conda-environment.yml if present).
  • Tests are included (or explain why tests are not needed).
  • CHANGES.md entry included if this is a change that can affect users
  • Cylc-Doc pull request opened if required at Optional output extension cylc-doc#634.
  • If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

@oliver-sanders oliver-sanders added this to the cylc-8.3.0 milestone Jul 27, 2023
@oliver-sanders oliver-sanders self-assigned this Jul 27, 2023
@oliver-sanders oliver-sanders force-pushed the optional-outputs-proposal branch 2 times, most recently from 6caf5be to db3aa96 Compare July 28, 2023 15:12
oliver-sanders added a commit to oliver-sanders/cylc-doc that referenced this pull request Jul 31, 2023
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tangentially related.

We were already using restricted Python evaluation to run the host selection ranking expressions, so I generalised this interface to allow for new uses going forward. This logic in this file should be unchanged.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The actual logic of the change lies here, most of the other changes are validation.

Comment on lines +821 to +824
if isinstance(point, str):
# convenient for tests, use PointBase instances for efficiency
# otherwise
point = get_point(point)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes it easier to pull out tasks in integration tests.

Comment on lines -1378 to +1391
if incomplete:
if itask.state.outputs.is_incomplete():
# Retain as incomplete.
LOG.warning(
f"[{itask}] did not complete required outputs:"
f" {incomplete}"
f" {itask.state.outputs.get_incomplete()}"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After these changes bool(get_incomplete()) != is_complete().

E.G. if completion = succeeded and (x or y) and succeeded and x have been generated, get_incomplete() will return y.


class TestMessageSorting(unittest.TestCase):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[unrelated] converted from unit test.

@oliver-sanders oliver-sanders marked this pull request as ready for review July 31, 2023 17:00
@hjoliver
Copy link
Member

hjoliver commented Aug 1, 2023

(That was quick!!)

**Implements:** https://cylc.github.io/cylc-admin/proposal-optional-output-extension.html
**Closes:** cylc#5640

* Require at least one optional output to be generated in order for
  the task to be considered "completed" by default.
* Add the ability for users to manually specify the completion
  expression.
* Bring task expiry into the optional outputs model.
  Document in changelog.
@oliver-sanders
Copy link
Member Author

The functional bit of the code is shockingly simple, validation proved trickier.

Copy link
Member

@MetRonnie MetRonnie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review

cylc/flow/cfgspec/workflow.py Outdated Show resolved Hide resolved
cylc/flow/cfgspec/workflow.py Outdated Show resolved Hide resolved
cylc/flow/config.py Show resolved Hide resolved
cylc/flow/task_outputs.py Outdated Show resolved Hide resolved
cylc/flow/task_outputs.py Show resolved Hide resolved
tests/integration/test_config.py Show resolved Hide resolved
tests/integration/test_optional_outputs.py Outdated Show resolved Hide resolved
tests/integration/test_optional_outputs.py Outdated Show resolved Hide resolved
tests/integration/test_optional_outputs.py Show resolved Hide resolved
@oliver-sanders oliver-sanders linked an issue Aug 10, 2023 that may be closed by this pull request
@oliver-sanders
Copy link
Member Author

oliver-sanders commented Aug 22, 2023

(FYI, I'll apply cosmetic suggestions and retest once functional review is complete)

@hjoliver
Copy link
Member

(I'll look at this ASAP...)

Copy link
Member

@hjoliver hjoliver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice code (as usual 😩 ). Not tested yet, I'll approve tomorrow if I fail to break it!

changes.d/5651.break.md Outdated Show resolved Hide resolved
cylc/flow/cfgspec/workflow.py Outdated Show resolved Hide resolved
cylc/flow/cfgspec/workflow.py Outdated Show resolved Hide resolved
cylc/flow/cfgspec/workflow.py Outdated Show resolved Hide resolved
cylc/flow/cfgspec/workflow.py Outdated Show resolved Hide resolved
cylc/flow/util.py Outdated Show resolved Hide resolved
tests/unit/test_graph_parser.py Show resolved Hide resolved
tests/integration/test_optional_outputs.py Outdated Show resolved Hide resolved
tests/integration/test_optional_outputs.py Outdated Show resolved Hide resolved
tests/integration/test_optional_outputs.py Outdated Show resolved Hide resolved
Co-authored-by: Hilary James Oliver <hilary.j.oliver@gmail.com>
Co-authored-by: Ronnie Dutta <61982285+MetRonnie@users.noreply.github.com>
@oliver-sanders
Copy link
Member Author

(fixed flake8 error from suggestion)

@hjoliver
Copy link
Member

hjoliver commented Aug 31, 2023

Testing all good.

One question occurred to me though. It should have done earlier when reviewing the proposal, sorry. Still, better late than never. I'll illustrate with the xy(z) case:

foo:x? => x
foo:y? => y

If that's the entire graph off of foo, it makes pretty good sense that at least one of x or y must be generated by default, even though we've said they're both optional. We can document that that's for safety reasons, to avoid unplanned early shutdown of the workflow.

But what about this:

foo:x? => x
foo:y => y

or this:

foo:x? => x
foo => bar

In these cases early shutdown is not on the cards, and we have said that x is optional, but if foo does not generate x we get:

ERROR - Incomplete tasks:
      * 1/foo did not complete required outputs: ['x']

IMO this is overkill and confusing when we do not need to protect against early shutdown.

So perhaps we should change tack slightly: if a task has only optional outputs in the graph, then by default at least one of them must be generated.

[This applies whenever there is at least one required output used in the graph, and any number of optional outputs, but it's most obvious when there's a single optional output: currently on this branch, by default a single optional output is not optional!]

@oliver-sanders
Copy link
Member Author

oliver-sanders commented Sep 1, 2023

But what about this:

foo:x? => x
foo:y => y

Do this:

completion = succeeded and y

or this:

foo:x? => x
foo => bar

Do this:

completion = succeeded

@oliver-sanders
Copy link
Member Author

oliver-sanders commented Sep 1, 2023

if a task has only optional outputs in the graph, then by default at least one of them must be generated.

That wouldn't work as :succeeded is almost always explicitly specified in order to trigger the task e.g:

a => b     # b:succeeded
b:x? => x  # b:x?
b:y? => y  # b:y?

Making chains is a more common use case than breaking chains, we must prioritise the more likely use case at the expense of one line of configuration for the less likely. Note we've yet to encounter a use case for early shutdown, we should support it, but it shouldn't be the default, I don't think we should make the rules more complex and add extra implicit logic for this purpose.

@hjoliver
Copy link
Member

hjoliver commented Sep 1, 2023

Note we've yet to encounter a use case for early shutdown, we should support it, but it shouldn't be the default,

I agree (since the proposal discussions) - note I was not suggesting this above. I was suggesting we should take the optional output declaration seriously (by default) IF there are required outputs present to prevent early shutdown.

Do this:
completion = succeeded

Yes, as I said above I was just talking about the default situation - which is what most users will encounter most of the time.

That wouldn't work as :succeeded is almost always explicitly specified in order to trigger the task e.g:
a => b # b:succeeded
b:x? => x # b:x?
b:y? => y # b:y?

We know if the output has any graph children, and hence if a "chain" extends from it in the graph. So it would work, so long as "used in the graph" means it has children.

Unfortunately under the agreed model the (default) single optional output case looks downright illogical:

foo:x? => x 
foo => bar

By default this is exactly equivalent to:

foo:x => x
foo => bar

i.e. the optional output is by default not optional at all. Does that not bother you?

We could avoid this with a small tweak to the rule:

  • if there are MULTIPLE optional outputs then (by default) at least one must be generated

Which could be put like this (so it doesn't look like special-casing):

  • if there are ANY optional outputs, we apply a reasonable safety constraint (by default) so long as it doesn't entirely disable the declared optionality

Note this is only the default that I'm worried about; but before we lock it in I think we should really try to avoid default behaviour that will seem illogical to users.

[UPDATE]: my next comment below better articulates the problem and solution, once it occurred to me we really can't consider optional outputs alone for this purpose.

@hjoliver
Copy link
Member

hjoliver commented Sep 4, 2023

A bit more.

I'm still not convinced on the "chain breaking" argument, if early shut down is prevented by other (required) outputs.

Making chains is a more common use case than breaking chains, we must prioritise the more likely use case at the expense of one line of configuration for the less likely.

foo:x? => chain-x
foo:y? => chain-y
foo:z? => chain-z
...
foo => chain-bar  # required

Here there can be no early shutdown, thanks to the required output. But our rule arbitrarily allows all but one of the optional chains to be broken. If there are 26 optional chains and 1 required, we allow (by default) 25 optional chains to be broken. How is that to be justified exactly? It's not to prevent early shutdown, and it's not much of a chain breaking safety net! There's no particular reason to choose the number one as special, other than to prevent early shutdown - which is also prevented by required outputs.

So ... I think there are two logically justifiable models:

  1. require at least one chain (including non-optional ones) to be unbroken, to prevent inadvertent early shutdown
    • i.e. require at least one output (with children; including non-optional outputs) to be generated
      OR
  2. require all chains to be unbroken, to prevent any inadvertent chain breaking (and as a side-effect, inadvertent shutdown)
    • i.e. require ALL optional outputs to be generated, unless a completion condition explicitly states their optionality
    • here the ? syntax becomes an indicator that the output can be optional

The second option would be a proper default safety net, with no logical holes in it.

@oliver-sanders
Copy link
Member Author

require all chains to be unbroken, to prevent any inadvertent chain breaking (and as a side-effect, inadvertent shutdown)

i.e. require ALL optional outputs to be generated, unless a completion condition explicitly states their optionality
here the ? syntax becomes an indicator that the output can be optional

This approach effectively shifts optional output logic into the "runtime" section completely, the question marks in the graph would become superfluous, only validation would keep them in sync with the completion expression so they could be removed completely. Not necessarily a bad approach, however, this would be a substantial change which would break all graph branching in existing Cylc 8 workflows.

How is that to be justified exactly

It's the safest default which provides the safest behaviour in the most common scenarios. More complex requirements require more complex configuration, but the simple remains simple until you break it.

@hjoliver
Copy link
Member

hjoliver commented Sep 4, 2023

This approach effectively shifts optional output logic into the "runtime" section completely, the question marks in the graph would become superfluous, only validation would keep them in sync with the completion expression so they could be removed completely.

Maybe not. We could keep them in the graph to signify that there should be a corresponding completion condition. Outputs with no ? are definitively required.

Not necessarily a bad approach, however, this would be a substantial change which would break all graph branching in existing Cylc 8 workflows.

A safe breaking change though: the workflow would carry on as normal from the generated output; the task would just be marked incomplete. (And, it is already a breaking change without this).

How is that to be justified exactly

It's the safest default which provides the safest behaviour in the most common scenarios. More complex requirements require more complex configuration, but the simple remains simple until you break it.

My two suggestions are equally simple, but without the downsides:

  • by default, at least one used-in-the-graph output (optional or required) must be generated - to prevent inadvertent early shutdown
  • or, all used-in-the-graph outputs must be generated - to prevent any inadvertent chain breaking (and early shutdown)

Otherwise, use a completion condition.

What's not simple about those?

@MetRonnie
Copy link
Member

Would it be least confusing to users if we don't mandate at least 1 optional output be generated, instead leaving it up to them to write a completion condition if needed?

An exception could be made for if succeeded is optional, in which case we might want to mandate at least 1 optional output be generated, to ensure one path is taken?

@hjoliver
Copy link
Member

hjoliver commented Sep 5, 2023

Would it be least confusing to users if we don't mandate at least 1 optional output be generated, instead ...
An exception could be made for if succeeded is optional ... to ensure one path is taken?

Yes, that's exactly what I'm suggesting as my preferred approach.

The other option (no-chains-broken-by-default) is more heavy handed, and not my preference, but needs to be considered.

@oliver-sanders
Copy link
Member Author

Would it be least confusing to users if we don't mandate at least 1 optional output be generated, instead leaving it up to them to write a completion condition if needed?

Yes, that's exactly what I'm suggesting as my preferred approach.

Note, Hillary's suggestion was "one or more utilised outputs must be generated", which isn't quite the same and does have it's own drawbacks e.g. for this graph a:x? => x, the optional output a:x would still be required.

IMO, the proposed approach is nicer (the scope is restricted to optional outputs, the behaviour more closely caters to the principle "switch" use cases for graph branching).

The runtime only approach described above as "no-chains-broken-by-default" (i.e. if you want something to be optional you must explicitly configure it in the completion expression) has merit. We did consider going this way back at the beginning. The separation of scheduling and runtime is kinda a core principle of Cylc so it feels a bit wrong in some ways but makes things very explicit which would be beneficial. [Un]fortunately [depending on position] this would break all existing Cylc 8 graph branching which wouldn't make it the most popular change.

@oliver-sanders
Copy link
Member Author

Either way, this PR implements the accepted proposal, so I think review should continue whilst discussion goes on (in a proposal?) so we're in the place to release this when we've decided.

@hjoliver
Copy link
Member

hjoliver commented Sep 11, 2023

Note, Hillary's suggestion was "one or more utilised outputs must be generated", which isn't quite the same

Sorry, my reading of @MetRonnie's question was a bit off. Yes, my suggestion was "we don't mandate at least 1 optional output be generated" but also, as Oliver says, "one or more utilized outputs must be generated" - in order to prevent early shutdown due to inadvertent non-production of outputs.

and does have it's own drawbacks e.g. for this graph a:x? => x [if there are no other outputs used from a] the optional output a:x would still be required.

That's not really a drawback because the rule is "at least one output (of any kind) must be generated [by default] to prevent early shutdown" and in this case there are no other outputs to prevent early shutdown.

[UPDATE: I've now thought of a solution below that avoids the need to consider both optional and required outputs to get around this problem].

On this branch, the interpretation of this graph a:x? => x; a => b is a drawback because it makes :x? not optional (by default) despite the fact that a single optional output must be truly optional (otherwise there is no point in marking it as optional), and there can be no unexpected "chain-breaking" or early shutdown (because of a => b). IMO that just looks bad.

@hjoliver
Copy link
Member

hjoliver commented Sep 12, 2023

The runtime only approach described above as "no-chains-broken-by-default" (i.e. if you want something to be optional you must explicitly configure it in the completion expression) has merit.

I think we had better make a decision on this one way or the other, very soon. It would be too disruptive to switch to that approach after wide adoption of Cylc 8.

Pros:

  • no ambiguity
  • very safe: any/all chain-breaking has to be explicit
  • some users have found the ? notation difficult to understand?

Cons:

  • arguably the ? notation would not be needed
    • this will be jarring to those who have already adopted it
    • IMO the notation is good, and helpful in most cases
  • heavy handed
    • any optionality requires an explicit completion expression
    • over-prioritizing safety over power, by default, is not necessarily desirable (rm -i is not the default, etc.)

I only raised this for completeness, as one of the fully self-consistent approaches. IMO it is probably unnecessarily heavy handed, so my vote is NO. Possibly not a hard NO, but we need a quick decision.

@hjoliver
Copy link
Member

hjoliver commented Sep 12, 2023

Assuming we do dump the heavy-handed solution from consideration, here's a tweak that I think could resolve the impasse.

TLDR: Keep this branch as-is but add an "unrestricted optional output" notation for use when "one way or another" branching is not the intention.

DETAILS:

foo:x? => x
foo:y? => y

We say that the above notation means "restricted optional outputs", with the restriction being:

  • foo must generate at least one of its optional outputs (unless you supply a completion condition)
  • they might be further restricted (in a sense) by a non-default completion condition

The justification for this restriction is for safety reasons: optional outputs are OFTEN used to direct the flow one way or another, in which case no outputs at all means something must be wrong.

So far so good - that is this branch 🎉

The problem is it forces you to write a completion condition for truly optional outputs, which should not be necessary, and in particular the default looks downright stupid for the single-optional output case where going "one way or another" is not even possible.

So let's add a variant notation for unrestricted optional outputs:

foo:x?? => x

This notation means: the scheduler does not care at all if :x is generated or not. If it is, follow the graph; otherwise no biggie, that's an expected outcome too.

Notes:

  • x is not allowed to appear in completion conditions (there's no point if it's truly optional, so that would just be confusing)
  • document (and even warn in validation) that a sub-graph can terminate (with potential for early shutdown) if a task with only unrestricted outputs generates no outputs at all, so use with care
  • this allows deliberate branch termination without a completion condition, but now it is explicit in the graph (which is even more explicit than hidden under runtime in a completion condition) 🎉

@hjoliver hjoliver mentioned this pull request Sep 26, 2023
16 tasks
@oliver-sanders oliver-sanders added the BLOCKED This can't happen until something else does label Oct 5, 2023
@oliver-sanders
Copy link
Member Author

Blocked whilst the proposal is under re-review.

@oliver-sanders oliver-sanders marked this pull request as draft October 9, 2023 14:10
@oliver-sanders
Copy link
Member Author

The accepted proposal has now been rejected.

The new proposal requires branched logic so can't be flattened out using the completion statement the way this PR implemented it. Will pull out the relevant bits of code from this PR into a new branch and re-raise. I'll wait for the cylc set work to stabilise in order to reduce conflict potential.

@oliver-sanders oliver-sanders removed this from the cylc-8.3.0 milestone Oct 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
BLOCKED This can't happen until something else does
Projects
None yet
Development

Successfully merging this pull request may close these issues.

optional outputs: implement new proposal
3 participants