Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cylc set-outputs: use cases and trigger compatibility #4727

Closed
hjoliver opened this issue Mar 3, 2022 · 16 comments
Closed

cylc set-outputs: use cases and trigger compatibility #4727

hjoliver opened this issue Mar 3, 2022 · 16 comments
Assignees

Comments

@hjoliver
Copy link
Member

hjoliver commented Mar 3, 2022

Consider cylc set-outputs in light of final agreed behaviour of cylc trigger as per #4686 with respect to flows in particular.

The command causes spawning downstream of specified task outputs. The spawned children are:

  • flow-independent (i.e. no flow) by default
  • or with --flow=, will flow on as the specified flow

The original use cases in mind during spawn-on-demand implementation were focused on the spawned children NOT on the targeted task (the owner of the outputs) itself:

  • trigger a flow to re-run part of the graph, without running the task at the top of it
    • e.g. re-run multiple model post-processing tasks without running the model itself
  • manually satisfy off-flow prerequisites
    • when triggering a flow that is not entirely self-contained

Things to consider:

  • is the command name appropriate?
    • (early in the spawn-on-demand era it was called cylc spawn)
  • what exactly are the use cases?
    • (and document them)
  • consistency with cylc trigger, and implications for the targeted task (the owner of the outputs):
    • e.g. if any output gets "set" we should consider the task was already spawned in that flow (just as if it was triggered)
@hjoliver
Copy link
Member Author

hjoliver commented Mar 3, 2022

Use Cases

Original (above):

  • trigger a flow without running the initial tasks in it
  • manually complete off-flow prerequisites

Other:

  • reset an incomplete task to complete it? (but note this has spawning implications, unlike cylc remove)
  • manually set the state of switch tasks (at an optional branch point) to direct the flow one way
  • set jobs to failed when a job platform is known to be down (needs clarification!)
  • other?

These may need the target task to be recorded in the DB as "spawned already in this flow" for future flow merge purposes. (And the "set" outputs should be recorded as complete too).

@oliver-sanders
Copy link
Member

Here are my thoughts:

  • set-outputs could do with a snappier name, perhaps just set?
  • The --flow argument from the trigger proposal makes perfect sense in the contact of set-outputs.
  • The --wait argument does not make sense / is not needed (since --wait is implied by --flow=none).
  • The default should be --flow=all, the same as for cylc trigger.
    • This makes set-outputs pretty similar to Cylc 7 reset.
    • Useful for manually intervening with a workflow e.g. fixing issues by hand or controlling switch tasks.
  • Outputs should be written to the DB unless --flow=none.
  • Tasks with completed outputs should not be re-run (potentially unless --flow=none depending on graph evolution).
    • The user has told Cylc that the outputs are complete, set-outputs should be regarded at the same level as task messages.
    • I.E. if you set the outputs on an n>0, when the task is spawned it should be spawned with its outputs completed.
    • --flow=none could potentially be useful in cases where you want to allow the workflow to ignore a single output at the time but don't want to prevent the task running or interfere with it otherwise.
    • For provenance reasons there should ideally be evidence in the DB of a --flow=none somewhere, just not in the outputs table.

@dpmatthews
Copy link
Contributor

I've discussed this with @oliver-sanders and we'd like to propose not supporting --flow=none (can't really think of a good use-case). I've also thought of some other considerations so here's an updated proposal:

  • By default, apply to all active flows (same as cylc trigger).
    Use --flow to specify a specific flow or a new flow (but do not support --flow=none).
  • By default set all the required outputs for a task.
    • Note that this means set-outputs will do nothing (by default) when applied to a task with no required outputs.
      The idea is that, by default, you set the outputs you would expect from a successful run of that task.
      If succeeded and failed are optional you need to specify which one you want - there is no default.
    • Use --output to specify the output (or outputs) to set, e.g. --output=failed.
      The only valid options for --output are "succeeded", "failed", "submit-failed" or custom outputs.
  • When setting an output, also set/unset implied outputs (e.g. succeeded implies submitted:set, submit-failed:unset, started:set, finished:set, failed:unset).
    So if you have a task which previously ran in this flow and failed, setting the succeeded output will also unset the failed output.
  • By default, spawn any downstream tasks based on the outputs
    • Do this for all the current outputs for the task including any already recorded in the database for the relevant flow(s).
      This provides a way to "resume" a flow after a cylc trigger --wait.
    • Use --no-spawn (?) to prevent spawning of tasks.
      This provides a way to set "off-flow prerequisites" without starting a flow.
      Is there a better way to handle this case?
  • Outputs are written to the DB.
    i.e. for flow merge purposes the specified tasks will be recorded as having already run with the specified and implied outputs so will not re-run on merge.
    Note: it is possible the task will have incomplete outputs in which case it will not be re-run but will stay in the n=0 pool (i.e. treat it as though it ran but didn't complete it's required outputs).
  • If the "succeeded" or "failed" or "submit-failed" output is set for a task that is currently submitted or running then the task will be considered finished and any subsequent messages from that task will be ignored.
    • This provides a way to deal with tasks stuck in the submitted or running state on a platform that can not be contacted.

@hjoliver
Copy link
Member Author

set-outputs could do with a snappier name, perhaps just set?

Yeah set-outputs is not a great name, but set is too non-specific? See #4728 (comment)

@hjoliver hjoliver added the question Flag this as a question for the next Cylc project meeting. label Mar 31, 2022
@hjoliver hjoliver modified the milestones: cylc-8.0rc3, cylc-8.0rc4 Apr 11, 2022
@oliver-sanders oliver-sanders modified the milestones: cylc-8.0.1, cylc-8.1.0 Jul 27, 2022
@hjoliver
Copy link
Member Author

I've just come back to this after a user reported that cylc set-outputs didn't behave as expected. The reason was, not surprisingly, the outputs that get "set" currently get flow=none by default - as noted above (which at the time of implementation I considered to be the safest default pending resolution of the above questions ... safe but not desirable!).

The suggestions above all fit with my conception of how it should work, at this stage.

I should probably crack on with implementation, as the current state of the command is not good.

Is there still a desire to change the command name? "Set-outputs" is not pretty, but it is specific and descriptive.

@oliver-sanders oliver-sanders modified the milestones: cylc-8.1.0, cylc-8.2.0 Oct 18, 2022
@hjoliver hjoliver self-assigned this Oct 26, 2022
@hjoliver hjoliver removed the question Flag this as a question for the next Cylc project meeting. label Oct 26, 2022
@wxtim
Copy link
Member

wxtim commented Oct 27, 2022

@hjoliver - I've been investigating a problem one of our early adopters has been having with some flaky tasks, that is:

reset an incomplete task to complete it? (but note this has spawning implications, unlike cylc remove)
manually set the state of switch tasks (at an optional branch point) to direct the flow one way

Using

graph = foo[-P1Y] => foo

Where foo is a task that fails in about 1% of cases. User would like to manually tell the workflow to carry on regardless but I cannot find a combination or remove/set outputs where the workflow doesn't stall with

      * 10150101T0000Z/foo did not complete required outputs: ['succeeded']

And thus require all downstream tasks to be manually triggered.

@hjoliver
Copy link
Member Author

hjoliver commented Oct 27, 2022

@wxtim - not sure I understand the problem. Can we not just do this?:

  1. manually trigger the next task(s) (i.e., carry on despite events)
    • (or use set_outputs to manually spawn them)
  2. manually remove the failed/incomplete task (i.e., don't let it stall the workflow)

@hjoliver
Copy link
Member Author

hjoliver commented Oct 27, 2022

manually set the state of switch tasks (at an optional branch point) to direct the flow one way

That seems like a different problem?

  1. use set-outputs to spawn the intended branch, then
  2. manually kill and remove the first task on the other branch, if the flow ends up going that way automatically

If they want to manually intervene in advance to make the flow go one way and not the other at some future point, I'm not sure that's a reasonable request. Manual intervention is mainly intended for responding to unexpected events that have occurred already. This suggest to me that the graph is wrong, given the way the tasks are behaving, and so the right solution is to change the graph (via reload or restart) or change the task definitions (via reload or restart, or broadcast <--- which is an advance manual intervention, I guess).

@hjoliver
Copy link
Member Author

hjoliver commented Oct 27, 2022

@dpmatthews - I've started working on this, so thinking about some of the details in more depth.

Use --flow to specify a specific flow or a new flow (but do not support --flow=none).

Not sure about this. Consider a => x & y & z => .... During workflow development, say, we might want to test run x, y, z all at once, without continuing the flow. A single use of cylc set-outputs --flow=none a would have the same effect as multiple cylc trigger --flow=none for each of x, y, z ... why not allow that? The no-flow op would not leave anything in the DB to affect subsequent action.

@hjoliver
Copy link
Member Author

hjoliver commented Oct 27, 2022

Use --no-spawn (?) to prevent spawning of tasks.
This provides a way to set "off-flow prerequisites" without starting a flow.
Is there a better way to handle this case?

That's not necessary for our canonical example of this:

tmpbjypqn4v

Above, to set up the orange flow, I can do cylc set-outputs test//1/b, which will spawn 1/d into the hidden pool (for partially satisfied prerequisites). The flow won't continue at 1/d until 1/c succeeds to complete its prerequisites.

If you mean "without starting another flow off of the same output" (or off another of the "implied" outputs, if we do that):

tmpm47mg9vx

Above, setting 1/b:succeeded will cause 1/g to run immediately, as well as setting up the intended flow from 1/c.

But delayed spawning as currently conceived won't prevent that. Our current flow-wait mechanism stores completed outputs of a task in the DB, and spawns them when the flow catches up to that task. Here, 1/b is "that task" and no flow is gonna catch up to it.

To set up the intended flow from 1/c without triggering anything else, we need cylc set-prerequisites ... to spawn 1/d into the hidden pool with its prerequisite on 1/b satisfied.

(I think this makes good sense. To spawn multiple tasks off of a single output, use set-outputs; to spawn only one task that depends on an output with multiple dependents, use set-prerequisites; in a one-to-one situation, either will do.)

@wxtim
Copy link
Member

wxtim commented Oct 28, 2022

This procedure is fine - what I think I'm suggesting is that "set outputs" doesn't work as an alternative to this, which it sounds like it ought to be - Were I a user my expectation for set outputs to succeeded would be equivalent to this. It's not. I thing "trigger output" describes what seems to be happening better.

@oliver-sanders
Copy link
Member

oliver-sanders commented Oct 28, 2022

From scraping the above, these are the use cases for consideration:

  1. Trigger a flow to re-run part of the graph:
    • A more convenient alternative to cylc trigger for cases where the number of downstream tasks [to trigger] is greater than the number of upstream outputs [to set].
  2. Manually satisfy off-flow prerequisites:
    • A more convenient alternative to cylc trigger for triggering a flow that is not entirely self-contained where the number of downstream tasks [to trigger] is greater than the number of upstream outputs [to set].
  3. Manually set task state:
    • i.e. The cylc reset use cases.
    • e.g. My task failed so I manually corrected the failure, now I want to tell Cylc the task has succeeded so we can carry on.
    • e.g. I've got a whole bunch of failed tasks because something external went wrong, I can't recover this. Now I want Cylc to skip onto the next cycle (e.g. pretend these tasks succeeded, or reset to expired).
    • e.g. Some "expired" state use cases.
  4. Manually set the state of switch tasks (at an optional branch point) to direct the flow one way.
  5. Set jobs to failed when a job platform is known to be down (needs clarification!).

Whether these use cases are to be solved by set-ouputs (or what it becomes) or by other command(s), they're all related so I think we should come up with a plan here.

Cases (1) & (2) which weren't well catered to by Cylc 7 are covered by the current set-outputs implementation.

The Cylc 7 case (3) and the new case (4) are not covered very nicely because:

  • There's no indication of the "set output" in the GUI/Tui.
  • The task we "set" the "output" on is still considered incomplete by Cylc so needs to be manually removed (as flagged above by Tim), this extra step is inconvenient and not obvious - Users should not need to know about insertion (or task proxies generally) #2143
  • Even when we remove the task it can be re-spawned by graph logic at a later date requiring re-removal.
  • I think "setting" the "output" of a task doesn't disable automatic retries which could mess with the workflow if the task isn't removed, or isn't removed quickly enough, or gets re-spawned at a later date.
  • Removing tasks is nasty it only applies to the task pool which isn't what user's would expect, note users are looking at the n=1 window which deepens the confusion. So removal requires a reasonable understanding of the task pool, removing a sub-graph of tasks can get very messy. We've recently seen a case where a user tried to skip a cycle by removing tasks and had got their workflow into a state we were unable to recover it from (because we couldn't work out what on Earth was going on).

@hjoliver
Copy link
Member Author

I know how to proceed here, will try to write it up succinctly for review soon. Just a couple of comments for now:

The scheduler "task pool" is, to all intents and purposes, gone as a user-facing concept.

What we have now is n=0 tasks, which include a) active tasks; and b) those that are ready but held back by something other than task dependencies (this includes incomplete tasks).

Unfortunately, however:

  • the GUI does not yet distinguish between waiting/held-back n=0 tasks and abstract future waiting tasks in n>=1. But that's a GUI problem, not a problem with the scheduling model
  • several command names and associated functionality and options still need tweaking, to be intuitive. In particular, cylc remove is a terrible name that entirely reflects its old purpose.

@hjoliver
Copy link
Member Author

hjoliver commented Dec 1, 2022

@oliver-sanders:

Even when we remove the task it can be re-spawned by graph logic at a later date requiring re-removal.

That's not true BTW. The db records if a task was spawned already in the current flow.

@hjoliver
Copy link
Member Author

@oliver-sanders oliver-sanders modified the milestones: cylc-8.2.0, cylc-8.3.0 Jun 29, 2023
@oliver-sanders
Copy link
Member

Superseded by #5642

@oliver-sanders oliver-sanders removed this from the cylc-8.3.0 milestone Jul 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants