Flow labels and (re)flow metadata #3744

Closed
hjoliver opened this issue Aug 5, 2020 · 6 comments · Fixed by #4300

@hjoliver
Member

hjoliver commented Aug 5, 2020

The current flow label implementation supports up to 52 concurrent flows within a single workflow, plus partially merged flows in progress.

My thinking was: that's probably enough(?), and the simple character-based labels are very easy to log and to use. And we could re-implement if necessary, e.g. with sets of UUIDs, to support an arbitrary number of concurrent flows, plus user-supplied metadata for ease of use (to avoid having to type the UUIDs).
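As a rough illustration of the "sets of UUIDs plus user-supplied metadata" alternative, here is a minimal sketch. It is not Cylc code: `FlowNames`, `new_flow`, `resolve`, and `merge` are hypothetical names, and representing a merged flow as the union of its members' UUIDs is an assumption made for the example.

```python
import uuid
from typing import Dict, FrozenSet

# A flow is identified by a set of UUIDs; a merged flow carries several.
FlowId = FrozenSet[uuid.UUID]


class FlowNames:
    """Hypothetical alias table so users can type a name instead of a UUID."""

    def __init__(self) -> None:
        self._by_name: Dict[str, uuid.UUID] = {}

    def new_flow(self, name: str) -> FlowId:
        """Create a new flow with a fresh UUID and register its name."""
        if name in self._by_name:
            raise ValueError(f"flow name already in use: {name}")
        flow_uuid = uuid.uuid4()
        self._by_name[name] = flow_uuid
        return frozenset({flow_uuid})

    def resolve(self, name: str) -> uuid.UUID:
        """Look up the UUID behind a user-supplied name."""
        return self._by_name[name]


def merge(a: FlowId, b: FlowId) -> FlowId:
    """Merge two flows by taking the union of their UUID sets."""
    return a | b


names = FlowNames()
main = names.new_flow("main")
rerun = names.new_flow("rerun-products")
merged = merge(main, rerun)
```

With UUIDs there is no fixed ceiling on the number of concurrent flows, and the name table keeps the identifiers out of the user's way.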

However, we probably want flow metadata even for the current flow labels (which, while simple, are arbitrary) so that users can keep track of the purpose of each flow.
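To make the idea concrete, the sketch below pairs single-character labels with a free-text description. It assumes the 52 labels are the ASCII letters `a-z` and `A-Z` and that a new label is picked at random from those not in use; `FlowRegistry` and its methods are hypothetical names, not the actual implementation.

```python
import random
import string
from typing import Dict, Optional

# Assumption: the 52 available labels are the ASCII letters a-z and A-Z.
ALL_LABELS = frozenset(string.ascii_letters)


class FlowRegistry:
    """Hypothetical registry mapping each active flow label to a description."""

    def __init__(self) -> None:
        self._meta: Dict[str, str] = {}

    def new_flow(self, description: str,
                 rng: Optional[random.Random] = None) -> str:
        """Pick an unused label at random and record why the flow exists."""
        free = sorted(ALL_LABELS - set(self._meta))
        if not free:
            raise RuntimeError("all 52 flow labels are in use")
        chooser = rng if rng is not None else random
        label = chooser.choice(free)
        self._meta[label] = description
        return label

    def describe(self, label: str) -> str:
        """Return the description recorded for a label."""
        return self._meta[label]

    def release(self, label: str) -> None:
        """Forget a flow once it has stopped, freeing its label."""
        self._meta.pop(label, None)


registry = FlowRegistry()
label = registry.new_flow("re-run products after fixing input data")
print(label, "->", registry.describe(label))
```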

@hjoliver hjoliver self-assigned this Aug 5, 2020
@hjoliver hjoliver added this to the some-day milestone Aug 5, 2020
@TomekTrzeciak
Contributor

How do task retry and reflow relate? Isn't reflow just a series of task retries? Could they be combined conceptually somehow to avoid the extra labels?

@hjoliver
Member Author

hjoliver commented Aug 6, 2020

> How do task retry and reflow relate? Isn't reflow just a series of task retries? Could they be combined conceptually somehow to avoid the extra labels?

No, a reflow is essentially another instance of the workflow, triggered at a different point in the graph, managed by the same scheduler instance. Think of multiple "wave fronts" traveling along the graph concurrently.

Each reflow can have normal task retries within it. (I just tested this myself to make sure it works as advertised - and it does).

@TomekTrzeciak
Contributor

TomekTrzeciak commented Aug 6, 2020

> > How do task retry and reflow relate? Isn't reflow just a series of task retries? Could they be combined conceptually somehow to avoid the extra labels?
>
> No, a reflow is essentially another instance of the workflow, triggered at a different point in the graph, managed by the same scheduler instance. Think of multiple "wave fronts" traveling along the graph concurrently.
>
> Each reflow can have normal task retries within it. (I just tested this myself to make sure it works as advertised - and it does).

That sounds quite complicated (another dimension to think about on top of everything else interacting in the workflow). Are these flow labels user-visible or purely an internal implementation detail?

Can one reflow only a part of the graph (set some reflow stopping criteria or stop it manually)?

@hjoliver
Member Author

hjoliver commented Aug 6, 2020

> That sounds quite complicated

Not really.

Firstly, you don't have to use reflow (and it won't happen by default).

Secondly, it's what should happen: if you manually trigger a task and it generates outputs, the downstream tasks that depend on those outputs should follow on as normal (and so on after those tasks).

The flow labels are currently visible only in the log, but we'll probably want to expose them in the UI somehow if multiple flows are in progress at once. Users will need the label if they want to stop that flow from continuing, for example.

> Can one reflow only a part of the graph (set some reflow stopping criteria or stop it manually)?

Yes, you can trigger any task with --reflow, and what happens next depends on what the graph says. So a reflow will come to a natural end if you triggered a sub-graph that does not spawn into a new cycle point. Stop criteria: #3750

The documentation so far is: https://cylc.github.io/cylc-admin/proposal-spawn-on-d.html#reflow

Example use case: re-run a whole product-generation sub-tree from a previous cycle point (after changing some input data manually, say) simply by re-triggering the first task in that sub-tree, while the main flow carries on unaffected.
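The way a reflow follows the graph and then comes to a natural end can be sketched with a toy traversal. This is only an illustration: the graph and task names are made up, it covers a single cycle point, and it ignores the fact that a real task with several prerequisites waits for all of them.

```python
from collections import deque
from typing import Dict, List

# Hypothetical single-cycle graph: task -> tasks that depend on its outputs.
GRAPH: Dict[str, List[str]] = {
    "get_data": ["model", "products"],
    "model": ["archive"],
    "products": ["plot", "publish"],
    "plot": [],
    "publish": [],
    "archive": [],
}


def reflow_from(task: str, graph: Dict[str, List[str]]) -> List[str]:
    """Return the tasks a flow started at `task` would reach, breadth-first."""
    seen = {task}
    order = [task]
    queue = deque([task])
    while queue:
        current = queue.popleft()
        for child in graph.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    # The queue empties when no downstream tasks remain: the flow ends here.
    return order


print(reflow_from("products", GRAPH))  # ['products', 'plot', 'publish']
```

Re-triggering `products` reaches only its own sub-tree, so `model` and `archive` are untouched, which mirrors the product-generation use case above.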

@oliver-sanders
Member

It would be nice to issue flow labels in order; it's more obvious what's going on when flow B overtakes flow A.

@hjoliver
Member Author

We could do that, but it might be hard to maintain order after a while in a workflow with rampant use of reflows. Note that flow merge can be incremental, and in general you can't predict which of multiple flows might end up being "the one" that carries on after others have stopped or merged. Still, I suppose we can always choose the next label from an ordered list, rather than a random one, for the next flow, even if the list of currently unused labels has some holes in it.
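Choosing "the next label in an ordered list" even when the free list has holes could look something like the sketch below, which uses a min-heap of free label positions. `OrderedLabelPool` is a hypothetical name, and the ordering `a..z` then `A..Z` is an assumption.

```python
import heapq
import string
from typing import List, Set

# Assumed ordering of the 52 labels: a..z followed by A..Z.
LABELS = string.ascii_letters


class OrderedLabelPool:
    """Hypothetical pool that always hands out the lowest unused label."""

    def __init__(self) -> None:
        self._free: List[int] = list(range(len(LABELS)))
        heapq.heapify(self._free)
        self._in_use: Set[str] = set()

    def acquire(self) -> str:
        """Return the earliest label in the ordering that is not in use."""
        if not self._free:
            raise RuntimeError("no flow labels left")
        label = LABELS[heapq.heappop(self._free)]
        self._in_use.add(label)
        return label

    def release(self, label: str) -> None:
        """Return a label to the pool when its flow stops or merges away."""
        if label in self._in_use:
            self._in_use.remove(label)
            heapq.heappush(self._free, LABELS.index(label))


pool = OrderedLabelPool()
first_three = [pool.acquire() for _ in range(3)]  # ['a', 'b', 'c']
pool.release("b")                                 # leaves a hole at 'b'
refilled = pool.acquire()                         # 'b' is reused first
after_hole = pool.acquire()                       # then 'd'
```

Filling holes first keeps labels short-lived and predictable, though it also means label order stops reflecting creation order once flows have been released.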


3 participants