optional outputs extension #5651

oliver-sanders · 2023-07-27T09:59:30Z

Implements: https://cylc.github.io/cylc-admin/proposal-optional-output-extension.html
Closes: #5640

* Require at least one optional output to be generated in order for the task to be considered "completed" by default.
* Add the ability for users to manually specify the completion expression.
* Bring task expiry into the optional outputs model. Document in changelog.
* Bolster optional output testing.
* Generalise the interface for defining restricted Python evaluators.

Check List

I have read CONTRIBUTING.md and added my name as a Code Contributor.
Contains logically grouped changes (else tidy your branch by rebase).
Does not contain off-topic changes (use other PRs for other changes).
Applied any dependency changes to both setup.cfg (and conda-environment.yml if present).
Tests are included (or explain why tests are not needed).
CHANGES.md entry included if this is a change that can affect users
Cylc-Doc pull request opened if required at Optional output extension cylc-doc#634.
If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

tests/unit/test_graph_parser.py

**Sibling:** cylc/cylc-flow#5651 Document the changes outlined in https://cylc.github.io/cylc-admin/proposal-optional-output-extension.html

oliver-sanders · 2023-07-31T16:33:30Z

cylc/flow/host_select.py

Tangentially related.

We were already using restricted Python evaluation to run the host selection ranking expressions, so I generalised this interface to allow for new uses going forward. The logic in this file should be unchanged.
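
For readers unfamiliar with the idea, here is a minimal sketch of restricted Python evaluation, assuming a whitelist of AST node types. The names `restricted_eval` and `ALLOWED_NODES` are hypothetical and are not the interface added by this PR; the sketch only illustrates the general technique of rejecting any syntax outside a small allowed set before evaluating.

```python
import ast

# Hypothetical whitelist: only boolean logic over plain names is allowed.
ALLOWED_NODES = (
    ast.Expression, ast.BoolOp, ast.And, ast.Or,
    ast.UnaryOp, ast.Not, ast.Name, ast.Load, ast.Constant,
)


def restricted_eval(expression, variables):
    """Evaluate a boolean expression using only whitelisted syntax."""
    tree = ast.parse(expression, mode="eval")
    for node in ast.walk(tree):
        # Reject calls, attribute access, subscripts, etc.
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"syntax not permitted: {type(node).__name__}")
        # Reject names that the caller did not explicitly provide.
        if isinstance(node, ast.Name) and node.id not in variables:
            raise ValueError(f"unknown name: {node.id}")
    # No builtins are exposed; names resolve only against `variables`.
    code = compile(tree, "<restricted>", "eval")
    return eval(code, {"__builtins__": {}}, dict(variables))


state = {"succeeded": True, "x": False, "y": True}
print(restricted_eval("succeeded and (x or y)", state))
```

A host-ranking expression would need a wider whitelist (comparisons, arithmetic, selected function calls) than this boolean-only one, which is presumably why a configurable interface is useful.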

oliver-sanders · 2023-07-31T16:38:41Z

cylc/flow/task_outputs.py

The actual logic of the change lies here; most of the other changes are validation.

oliver-sanders · 2023-07-31T16:39:13Z

cylc/flow/task_pool.py

+        if isinstance(point, str):
+            # convenient for tests, use PointBase instances for efficiency
+            # otherwise
+            point = get_point(point)


Makes it easier to pull out tasks in integration tests.

oliver-sanders · 2023-07-31T16:41:24Z

cylc/flow/task_pool.py

-            if incomplete:
+            if itask.state.outputs.is_incomplete():
                # Retain as incomplete.
                LOG.warning(
                    f"[{itask}] did not complete required outputs:"
-                    f" {incomplete}"
+                    f" {itask.state.outputs.get_incomplete()}"


After these changes, bool(get_incomplete()) is no longer equivalent to is_incomplete().

E.g. if completion = succeeded and (x or y), and the outputs succeeded and x have been generated, then get_incomplete() will return y even though the task is complete.
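
The distinction can be illustrated with a toy model. `ToyOutputs` below is a hypothetical stand-in, not the class changed in this PR; it assumes the completion expression uses only and/or/not over output names, and that get_incomplete() lists every output not yet generated.

```python
class ToyOutputs:
    """Toy stand-in for a task's outputs (not the class from this PR)."""

    def __init__(self, completion, outputs):
        self.completion = completion  # e.g. "succeeded and (x or y)"
        self.generated = dict.fromkeys(outputs, False)

    def set_generated(self, output):
        self.generated[output] = True

    def is_incomplete(self):
        # True if the completion expression is not (yet) satisfied.
        satisfied = eval(
            self.completion, {"__builtins__": {}}, dict(self.generated)
        )
        return not satisfied

    def get_incomplete(self):
        # Outputs that have not been generated, satisfied or not.
        return sorted(o for o, done in self.generated.items() if not done)


outputs = ToyOutputs("succeeded and (x or y)", ["succeeded", "x", "y"])
outputs.set_generated("succeeded")
outputs.set_generated("x")
print(outputs.is_incomplete())   # False
print(outputs.get_incomplete())  # ['y']
```

So the list can be non-empty for a complete task, which is why the condition in the diff calls is_incomplete() rather than testing the list.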

oliver-sanders · 2023-07-31T16:46:18Z

tests/unit/test_task_outputs.py


-class TestMessageSorting(unittest.TestCase):


[unrelated] converted from unit test.

hjoliver · 2023-08-01T07:07:21Z

(That was quick!!)

oliver-sanders · 2023-08-01T14:04:00Z

The functional bit of the code is shockingly simple; validation proved trickier.

MetRonnie

Partial review

cylc/flow/cfgspec/workflow.py

cylc/flow/config.py

cylc/flow/task_outputs.py

tests/integration/test_config.py

tests/integration/test_optional_outputs.py

oliver-sanders · 2023-08-22T14:43:10Z

(FYI, I'll apply cosmetic suggestions and retest once functional review is complete)

hjoliver · 2023-08-22T23:31:39Z

(I'll look at this ASAP...)

hjoliver

Very nice code (as usual 😩 ). Not tested yet, I'll approve tomorrow if I fail to break it!

changes.d/5651.break.md

cylc/flow/cfgspec/workflow.py

cylc/flow/util.py

tests/unit/test_graph_parser.py

tests/integration/test_optional_outputs.py

Co-authored-by: Hilary James Oliver <hilary.j.oliver@gmail.com> Co-authored-by: Ronnie Dutta <61982285+MetRonnie@users.noreply.github.com>

oliver-sanders · 2023-08-29T12:13:58Z

(fixed flake8 error from suggestion)

hjoliver · 2023-08-31T22:02:30Z

Testing all good.

One question occurred to me though. It should have done earlier when reviewing the proposal, sorry. Still, better late than never. I'll illustrate with the xy(z) case:

foo:x? => x
foo:y? => y

If that's the entire graph off of foo, it makes pretty good sense that at least one of x or y must be generated by default, even though we've said they're both optional. We can document that that's for safety reasons, to avoid unplanned early shutdown of the workflow.

But what about this:

foo:x? => x
foo:y => y

or this:

foo:x? => x
foo => bar

In these cases early shutdown is not on the cards, and we have said that x is optional, but if foo does not generate x we get:

ERROR - Incomplete tasks:
      * 1/foo did not complete required outputs: ['x']

IMO this is overkill and confusing when we do not need to protect against early shutdown.

So perhaps we should change tack slightly: if a task has only optional outputs in the graph, then by default at least one of them must be generated.

[This applies whenever there is at least one required output used in the graph, and any number of optional outputs, but it's most obvious when there's a single optional output: currently on this branch, by default a single optional output is not optional!]

oliver-sanders · 2023-09-01T08:26:47Z

> But what about this:
>
> foo:x? => x
> foo:y => y

Do this:

completion = succeeded and y

> or this:
>
> foo:x? => x
> foo => bar

Do this:

completion = succeeded

oliver-sanders · 2023-09-01T08:34:43Z

> if a task has only optional outputs in the graph, then by default at least one of them must be generated.

That wouldn't work as :succeeded is almost always explicitly specified in order to trigger the task e.g:

a => b     # b:succeeded
b:x? => x  # b:x?
b:y? => y  # b:y?

Making chains is a more common use case than breaking chains, we must prioritise the more likely use case at the expense of one line of configuration for the less likely. Note we've yet to encounter a use case for early shutdown, we should support it, but it shouldn't be the default, I don't think we should make the rules more complex and add extra implicit logic for this purpose.

hjoliver · 2023-09-01T21:46:44Z

> Note we've yet to encounter a use case for early shutdown, we should support it, but it shouldn't be the default,

I agree (since the proposal discussions) - note I was not suggesting this above. I was suggesting we should take the optional output declaration seriously (by default) IF there are required outputs present to prevent early shutdown.

> Do this:
> completion = succeeded

Yes, as I said above I was just talking about the default situation - which is what most users will encounter most of the time.

> That wouldn't work as :succeeded is almost always explicitly specified in order to trigger the task e.g:
> a => b # b:succeeded
> b:x? => x # b:x?
> b:y? => y # b:y?

We know if the output has any graph children, and hence if a "chain" extends from it in the graph. So it would work, so long as "used in the graph" means it has children.

Unfortunately under the agreed model the (default) single optional output case looks downright illogical:

foo:x? => x 
foo => bar

By default this is exactly equivalent to:

foo:x => x
foo => bar

i.e. the optional output is by default not optional at all. Does that not bother you?
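
To make the equivalence concrete, here is a small sketch of the default rule as I read it from this thread: all required outputs must be generated and, if any optional outputs are declared, at least one of them must be generated as well. The function `default_completion` is hypothetical and is not code from this branch.

```python
def default_completion(required, optional):
    """Build a default completion expression from declared outputs.

    Assumed rule: every required output, plus at least one optional
    output if any optional outputs are declared.
    """
    terms = sorted(required)
    if optional:
        alternatives = " or ".join(sorted(optional))
        # Only parenthesise when there is a real choice to make.
        terms.append(f"({alternatives})" if len(optional) > 1 else alternatives)
    return " and ".join(terms)


# foo:x? => x; foo => bar   (x declared optional)
print(default_completion({"succeeded"}, {"x"}))
# foo:x => x; foo => bar    (x declared required)
print(default_completion({"succeeded", "x"}, set()))
```

Both calls print the same expression, succeeded and x, which is the single-optional-output case being questioned here.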

We could avoid this with a small tweak to the rule:

if there are MULTIPLE optional outputs then (by default) at least one must be generated

Which could be put like this (so it doesn't look like special-casing):

if there are ANY optional outputs, we apply a reasonable safety constraint (by default) so long as it doesn't entirely disable the declared optionality

Note this is only the default that I'm worried about; but before we lock it in I think we should really try to avoid default behaviour that will seem illogical to users.

[UPDATE]: my next comment below better articulates the problem and solution, once it occurred to me we really can't consider optional outputs alone for this purpose.

hjoliver · 2023-09-04T02:03:04Z

A bit more.

I'm still not convinced on the "chain breaking" argument, if early shut down is prevented by other (required) outputs.

> Making chains is a more common use case than breaking chains, we must prioritise the more likely use case at the expense of one line of configuration for the less likely.

foo:x? => chain-x
foo:y? => chain-y
foo:z? => chain-z
...
foo => chain-bar  # required

Here there can be no early shutdown, thanks to the required output. But our rule arbitrarily allows all but one of the optional chains to be broken. If there are 26 optional chains and 1 required, we allow (by default) 25 optional chains to be broken. How is that to be justified exactly? It's not to prevent early shutdown, and it's not much of a chain breaking safety net! There's no particular reason to choose the number one as special, other than to prevent early shutdown - which is also prevented by required outputs.

So ... I think there are two logically justifiable models:

- require at least one chain (including non-optional ones) to be unbroken, to prevent inadvertent early shutdown
  - i.e. require at least one output (with children; including non-optional outputs) to be generated

OR

- require all chains to be unbroken, to prevent any inadvertent chain breaking (and as a side-effect, inadvertent shutdown)
  - i.e. require ALL optional outputs to be generated, unless a completion condition explicitly states their optionality
  - here the ? syntax becomes an indicator that the output can be optional

The second option would be a proper default safety net, with no logical holes in it.
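
The two default models can be compared with a short sketch. The function names are hypothetical, "used in the graph" is taken to mean "has graph children", and the override by an explicit completion condition is left out; this only shows how the defaults would differ.

```python
def at_least_one_chain(generated, used_in_graph):
    """Model 1: at least one output with graph children was generated."""
    return bool(set(generated) & set(used_in_graph))


def all_chains(generated, used_in_graph):
    """Model 2: every output with graph children was generated."""
    return set(used_in_graph) <= set(generated)


# foo:x? => chain-x; foo:y? => chain-y; foo => chain-bar
used = {"x", "y", "succeeded"}
generated = {"succeeded"}

print(at_least_one_chain(generated, used))  # True: no early shutdown
print(all_chains(generated, used))          # False: two chains were broken
```

Under the first model the task is complete as soon as any chain continues; under the second it stays incomplete until every chain continues or a completion condition says otherwise.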

oliver-sanders · 2023-09-04T08:59:37Z

> require all chains to be unbroken, to prevent any inadvertent chain breaking (and as a side-effect, inadvertent shutdown)
>
> i.e. require ALL optional outputs to be generated, unless a completion condition explicitly states their optionality
> here the ? syntax becomes an indicator that the output can be optional

This approach effectively shifts optional output logic into the "runtime" section completely, the question marks in the graph would become superfluous, only validation would keep them in sync with the completion expression so they could be removed completely. Not necessarily a bad approach, however, this would be a substantial change which would break all graph branching in existing Cylc 8 workflows.

> How is that to be justified exactly

It's the safest default which provides the safest behaviour in the most common scenarios. More complex requirements require more complex configuration, but the simple remains simple until you break it.

hjoliver · 2023-09-04T21:07:11Z

> This approach effectively shifts optional output logic into the "runtime" section completely, the question marks in the graph would become superfluous, only validation would keep them in sync with the completion expression so they could be removed completely.

Maybe not. We could keep them in the graph to signify that there should be a corresponding completion condition. Outputs with no ? are definitively required.

> Not necessarily a bad approach, however, this would be a substantial change which would break all graph branching in existing Cylc 8 workflows.

A safe breaking change though: the workflow would carry on as normal from the generated output; the task would just be marked incomplete. (And, it is already a breaking change without this).

> > How is that to be justified exactly
>
> It's the safest default which provides the safest behaviour in the most common scenarios. More complex requirements require more complex configuration, but the simple remains simple until you break it.

My two suggestions are equally simple, but without the downsides:

- by default, at least one used-in-the-graph output (optional or required) must be generated - to prevent inadvertent early shutdown
- or, all used-in-the-graph outputs must be generated - to prevent any inadvertent chain breaking (and early shutdown)

Otherwise, use a completion condition.

What's not simple about those?

MetRonnie · 2023-09-05T12:59:58Z

Would it be least confusing to users if we don't mandate at least 1 optional output be generated, instead leaving it up to them to write a completion condition if needed?

An exception could be made for if succeeded is optional, in which case we might want to mandate at least 1 optional output be generated, to ensure one path is taken?

hjoliver · 2023-09-05T23:06:52Z

> Would it be least confusing to users if we don't mandate at least 1 optional output be generated, instead ...
> An exception could be made for if succeeded is optional ... to ensure one path is taken?

Yes, that's exactly what I'm suggesting as my preferred approach.

The other option (no-chains-broken-by-default) is more heavy handed, and not my preference, but needs to be considered.

oliver-sanders · 2023-09-11T15:25:45Z

> > Would it be least confusing to users if we don't mandate at least 1 optional output be generated, instead leaving it up to them to write a completion condition if needed?
>
> Yes, that's exactly what I'm suggesting as my preferred approach.

Note, Hilary's suggestion was "one or more utilised outputs must be generated", which isn't quite the same and does have its own drawbacks, e.g. for this graph a:x? => x, the optional output a:x would still be required.

IMO, the proposed approach is nicer (the scope is restricted to optional outputs, the behaviour more closely caters to the principal "switch" use cases for graph branching).

The runtime only approach described above as "no-chains-broken-by-default" (i.e. if you want something to be optional you must explicitly configure it in the completion expression) has merit. We did consider going this way back at the beginning. The separation of scheduling and runtime is kinda a core principle of Cylc so it feels a bit wrong in some ways but makes things very explicit which would be beneficial. [Un]fortunately [depending on position] this would break all existing Cylc 8 graph branching which wouldn't make it the most popular change.

oliver-sanders · 2023-09-11T15:29:14Z

Either way, this PR implements the accepted proposal, so I think review should continue whilst discussion goes on (in a proposal?) so we're in the place to release this when we've decided.

hjoliver · 2023-09-11T23:06:22Z

> Note, Hilary's suggestion was "one or more utilised outputs must be generated", which isn't quite the same

Sorry, my reading of @MetRonnie's question was a bit off. Yes, my suggestion was "we don't mandate at least 1 optional output be generated" but also, as Oliver says, "one or more utilized outputs must be generated" - in order to prevent early shutdown due to inadvertent non-production of outputs.

> and does have its own drawbacks e.g. for this graph a:x? => x [if there are no other outputs used from a] the optional output a:x would still be required.

That's not really a drawback because the rule is "at least one output (of any kind) must be generated [by default] to prevent early shutdown" and in this case there are no other outputs to prevent early shutdown.

[UPDATE: I've now thought of a solution below that avoids the need to consider both optional and required outputs to get around this problem].

On this branch, the interpretation of this graph a:x? => x; a => b is a drawback because it makes :x? not optional (by default) despite the fact that a single optional output must be truly optional (otherwise there is no point in marking it as optional), and there can be no unexpected "chain-breaking" or early shutdown (because of a => b). IMO that just looks bad.

hjoliver · 2023-09-12T00:25:55Z

> The runtime only approach described above as "no-chains-broken-by-default" (i.e. if you want something to be optional you must explicitly configure it in the completion expression) has merit.

I think we had better make a decision on this one way or the other, very soon. It would be too disruptive to switch to that approach after wide adoption of Cylc 8.

Pros:

- no ambiguity
- very safe: any/all chain-breaking has to be explicit
- some users have found the ? notation difficult to understand?

Cons:

- arguably the ? notation would not be needed
  - this will be jarring to those who have already adopted it
  - IMO the notation is good, and helpful in most cases
- heavy handed
  - any optionality requires an explicit completion expression
  - over-prioritizing safety over power, by default, is not necessarily desirable (rm -i is not the default, etc.)

I only raised this for completeness, as one of the fully self-consistent approaches. IMO it is probably unnecessarily heavy handed, so my vote is NO. Possibly not a hard NO, but we need a quick decision.

hjoliver · 2023-09-12T03:20:51Z

Assuming we do dump the heavy-handed solution from consideration, here's a tweak that I think could resolve the impasse.

TLDR: Keep this branch as-is but add an "unrestricted optional output" notation for use when "one way or another" branching is not the intention.

DETAILS:

foo:x? => x
foo:y? => y

We say that the above notation means "restricted optional outputs", with the restriction being:

- foo must generate at least one of its optional outputs (unless you supply a completion condition)
- they might be further restricted (in a sense) by a non-default completion condition

The justification for this restriction is for safety reasons: optional outputs are OFTEN used to direct the flow one way or another, in which case no outputs at all means something must be wrong.

So far so good - that is this branch 🎉

The problem is it forces you to write a completion condition for truly optional outputs, which should not be necessary, and in particular the default looks downright stupid for the single-optional output case where going "one way or another" is not even possible.

So let's add a variant notation for unrestricted optional outputs:

foo:x?? => x

This notation means: the scheduler does not care at all if :x is generated or not. If it is, follow the graph; otherwise no biggie, that's an expected outcome too.

Notes:

- x is not allowed to appear in completion conditions (there's no point if it's truly optional, so that would just be confusing)
- document (and even warn in validation) that a sub-graph can terminate (with potential for early shutdown) if a task with only unrestricted outputs generates no outputs at all, so use with care
- this allows deliberate branch termination without a completion condition, but now it is explicit in the graph (which is even more explicit than hidden under runtime in a completion condition) 🎉
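
As a rough illustration of how the three notations could be told apart, here is a sketch of a classifier for the left-hand side of a graph trigger. Everything in it is hypothetical: the ?? notation is only a suggestion in this thread, the regular expression handles just simple task:output names, and `classify` is not part of the Cylc graph parser.

```python
import re

# Hypothetical pattern: task, optional ":output", then "", "?" or "??".
QUALIFIED_OUTPUT = re.compile(
    r"^(?P<task>\w+)(?::(?P<output>[\w-]+))?(?P<marker>\?{0,2})$"
)

KINDS = {
    "": "required",
    "?": "restricted-optional",     # at least one must be generated
    "??": "unrestricted-optional",  # scheduler does not care either way
}


def classify(trigger):
    """Return (task, output, kind) for a trigger such as 'foo:x??'."""
    match = QUALIFIED_OUTPUT.match(trigger.strip())
    if match is None:
        raise ValueError(f"cannot parse trigger: {trigger!r}")
    # A bare task name is taken to mean its succeeded output.
    output = match["output"] or "succeeded"
    return match["task"], output, KINDS[match["marker"]]


for trigger in ("foo", "foo:x?", "foo:x??"):
    print(trigger, "->", classify(trigger))
```

Under this reading, only outputs classified as restricted-optional would feed into the default "at least one" rule, and unrestricted ones would be ignored when deciding completion.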

oliver-sanders · 2023-10-05T12:42:16Z

Blocked whilst the proposal is under re-review.

oliver-sanders · 2023-10-18T13:46:05Z

The accepted proposal has now been rejected.

The new proposal requires branched logic so can't be flattened out using the completion statement the way this PR implemented it. Will pull out the relevant bits of code from this PR into a new branch and re-raise. I'll wait for the cylc set work to stabilise in order to reduce conflict potential.

oliver-sanders added the sod-follow-up label Jul 27, 2023

oliver-sanders added this to the cylc-8.3.0 milestone Jul 27, 2023

oliver-sanders self-assigned this Jul 27, 2023

oliver-sanders force-pushed the optional-outputs-proposal branch 2 times, most recently from 6caf5be to db3aa96 Compare July 28, 2023 15:12

oliver-sanders added a commit to oliver-sanders/cylc-doc that referenced this pull request Jul 31, 2023

optional output extension

07a7ac4

**Sibling:** cylc/cylc-flow#5651 Document the changes outlined in https://cylc.github.io/cylc-admin/proposal-optional-output-extension.html

oliver-sanders mentioned this pull request Jul 31, 2023

Optional output extension cylc/cylc-doc#634

Closed

1 task

oliver-sanders force-pushed the optional-outputs-proposal branch from db3aa96 to e240b3f Compare July 31, 2023 16:27

oliver-sanders force-pushed the optional-outputs-proposal branch from e240b3f to 228eb0d Compare July 31, 2023 17:00

oliver-sanders marked this pull request as ready for review July 31, 2023 17:00

oliver-sanders requested a review from hjoliver July 31, 2023 17:00

oliver-sanders added 2 commits August 1, 2023 15:02

host select: move to new restricted_evaluator interface

ba42cc0

oliver-sanders force-pushed the optional-outputs-proposal branch from 228eb0d to ba42cc0 Compare August 1, 2023 14:02

oliver-sanders requested a review from MetRonnie August 2, 2023 16:24

MetRonnie reviewed Aug 7, 2023

oliver-sanders linked an issue Aug 10, 2023 that may be closed by this pull request

optional outputs: implement new proposal #5640

Closed

hjoliver reviewed Aug 24, 2023

oliver-sanders force-pushed the optional-outputs-proposal branch from 5f06cf7 to 088dcd9 Compare August 24, 2023 09:13

Apply suggestions from code review

6a710c0

Co-authored-by: Hilary James Oliver <hilary.j.oliver@gmail.com> Co-authored-by: Ronnie Dutta <61982285+MetRonnie@users.noreply.github.com>

oliver-sanders force-pushed the optional-outputs-proposal branch from 088dcd9 to 6a710c0 Compare August 29, 2023 12:13

hjoliver mentioned this pull request Sep 26, 2023

Implement "cylc set" command #5658

Merged

16 tasks

oliver-sanders added the BLOCKED This can't happen until something else does label Oct 5, 2023

oliver-sanders marked this pull request as draft October 9, 2023 14:10

oliver-sanders closed this Oct 18, 2023

oliver-sanders removed this from the cylc-8.3.0 milestone Oct 18, 2023
