reflow #468

oliver-sanders · 2020-06-29T11:49:19Z

SoD (cylc/cylc-flow#3515) brings with it the concept of reflow, whereby a single running workflow can have multiple parallel executions which can be stopped (or potentially held?) independently and merge back together.

This amazing functionality is going to be an interesting design challenge for the UI:

Reflows will likely have metadata associated with them (e.g. a title or description).
Theoretically there can be many reflows, though normally there will probably only be the original flow and perhaps one reflow.
Different (re)flows can merge together if one catches up with the other.
When a user initiates a reflow from the UI it would be good if we could visually display the impact this will have on a workflow (e.g. by highlighting all of the tasks which would get rerun) - note this requires graph traversal (at the scheduler?) not yet implemented.
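To illustrate the graph traversal mentioned in the last point, here is a minimal sketch of how the set of tasks to highlight could be computed. It assumes the workflow graph is available as a plain adjacency mapping and ignores cycle points. The function name `tasks_to_rerun` and the example graph are hypothetical and are not part of Cylc.

```python
from collections import deque


def tasks_to_rerun(graph, start):
    """Return the start task plus everything downstream of it, in BFS order."""
    seen = [start]
    queue = deque([start])
    while queue:
        task = queue.popleft()
        for child in graph.get(task, []):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen


# Hypothetical dependency graph: a => b => c => d, plus an extra branch b => e.
graph = {"a": ["b"], "b": ["c", "e"], "c": ["d"]}

# Tasks the UI would highlight if a reflow were started from "b".
print(tasks_to_rerun(graph, "b"))
```

Whether this traversal happens at the scheduler or elsewhere is an open question in this issue; the sketch only shows the shape of the computation.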

Initial thoughts:

Two task/family proxies with the same name and cycle point but different (re)flow ID can exist at the same time.
This fits quite nicely with other visual filtering plans (awaiting sketch).
Users will likely want to filter by (re)flow in their views (e.g. show [re]flows in different tabs).

Questions:

What kind of visual representation do we want for this (e.g. parallel, composite, branched, other)? - Most likely composite
Should we open up the ability to view (re)flows independently (e.g. using subgraphs or by filtering in the tree view)?
How do we convey that a task is held in one flow but not in another? - we probably won't allow this
How should we convey that a "suite" has multiple flows, one or more of which are held?

Pull requests welcome!

dwsutherland · 2020-07-01T02:32:02Z

> Two task/family proxies with the same name and cycle point but different (re)flow ID can exist at the same time.

Only if we need history of proxies? I think the latest flow will overwrite the old (with job submit incrementing sequentially across flows)

Also, if true, this would imply flow-specific edges and probably flow-specific data stores... But I think we backed off from this, as people can already run multiple workflows.

hjoliver · 2020-07-01T02:39:26Z

> Two task/family proxies with the same name and cycle point but different (re)flow ID can exist at the same time.

@dwsutherland is right - this would result in a single proxy with a merged flow ID.
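As a rough illustration of "a single proxy with a merged flow ID", the sketch below keeps one proxy per (cycle point, task name) and records every flow that reaches it. The `pool` dictionary, the `spawn` helper and the use of a set of flow labels are assumptions made for this example, not the scheduler's actual data model.

```python
# One proxy per (cycle point, task name); all names here are hypothetical.
pool = {}


def spawn(cycle, name, flow):
    """Create the proxy if needed and record the flow that reached it."""
    proxy = pool.setdefault((cycle, name), {"flows": set()})
    proxy["flows"].add(flow)
    return proxy


spawn("1", "b", "flow-1")
spawn("1", "b", "flow-2")  # second flow reaches the same task: still one proxy

print(len(pool), sorted(pool[("1", "b")]["flows"]))
```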

kinow · 2020-07-01T03:37:33Z

That will be interesting for the UI. We will have an HTML element attached to the DOM representing the proxy and job with some data (progress/status/started time/etc). Then I think the UI will receive a delta either for added or updated with the new task proxy.

We will then have to merge the data and display only one task proxy?

Will the user be aware that the one visible in the UI has been updated by a reflow?

dwsutherland · 2020-07-01T03:56:26Z

A flow badge(s) that appears on nodes for any non-default flow label?

oliver-sanders · 2020-07-01T07:32:51Z

When we have two flows, how does the status of a task change depending on the flow you are viewing?

E.g. for the flow a => b => c => d.

If we run from a to c then reflow from a, then:

b will appear in the n=1 window.
b will be succeeded in the first flow.
b does not yet exist in the second flow?

So long as the second flow doesn't result in the creation of waiting tasks this will still be comprehendable-ish from the UI. You will just see completed tasks being rerun which makes sense.

Otherwise we will need a full n=1 window for each flow?

dwsutherland · 2020-07-02T21:45:58Z

> So long as the second flow doesn't result in the creation of waiting tasks this will still be comprehendable-ish from the UI

Not sure I understand the conflict. The second flow will create a waiting b that replaces the first flow b (that is no longer in the pool anyway).

hjoliver · 2020-07-03T01:32:32Z

Yeah a reflow really just extends the concept of retrigger - so we already had the same problem (UI-wise) with multiple submits of a task. Now instead of just a new submit number, we get a new flow label and a new submit number (and unlike old-style retrigger, the flow continues downstream from the retriggered task).

So the n-distance window is kind of agnostic to flow. It just has tasks in it, and those tasks have whatever flow labels they have, just like submit numbers.

Presumably by default the UI should show, for a particular task, the latest submit/flow that occurred (luckily submit number increments linearly so that makes that easy) and it could highlight somehow (a flow badge?) what flow that task belongs to.

If we want to be able to filter by flow, regardless of latest submit number, that's slightly more interesting but it should work fine I think.
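A minimal sketch of the default described above (show the latest submit for each task, optionally restricted to one flow) might look like this. The `Job` record, its field names and the flow labels are hypothetical; the only thing taken from the discussion is that submit numbers increase across flows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    task: str        # task name
    cycle: str       # cycle point
    submit_num: int  # increments across flows, so "latest" is well defined
    flow: str        # flow label (hypothetical representation)


def latest_jobs(jobs, flow=None):
    """Map (cycle, task) to its latest job, optionally for a single flow."""
    latest = {}
    for job in jobs:
        if flow is not None and job.flow != flow:
            continue
        key = (job.cycle, job.task)
        if key not in latest or job.submit_num > latest[key].submit_num:
            latest[key] = job
    return latest


jobs = [
    Job("b", "1", 1, "flow-1"),
    Job("b", "1", 2, "flow-2"),  # rerun of b by the second flow
    Job("c", "1", 1, "flow-1"),
]

default_view = latest_jobs(jobs)                   # what the UI shows by default
flow_1_view = latest_jobs(jobs, flow="flow-1")     # filtered to the first flow
print(default_view[("1", "b")].flow, flow_1_view[("1", "b")].submit_num)
```

The `flow` attribute on the selected job is what a flow badge would display.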

oliver-sanders · 2020-07-03T09:49:02Z

> Yeah a reflow really just extends the concept of retrigger

Well yes, that is true, however trigger only affects one task so it is much easier to understand its effect from the displayed information.

The difficulty is making it clear to the user that there are two flows and what that entails for their workflows.

So reflow is a much bigger problem than re-triggering individual tasks.

> So long as the second flow doesn't result in the creation of waiting tasks this will still be comprehendable-ish from the UI

> Not sure I understand the conflict. The second flow will create a waiting b that replaces the first flow b (that is no longer in the pool anyway).

It's a representation problem not a data problem, how the task pool and data store handle this is the domain of Cylc Flow and UI Server.

I'll try and explain the representation problem with a diagram. Here are three alternative representations of a reflow in graph form:

Composite:

There are other options of course, and the "composite" option has three variants:

1. Prioritise the earlier flow.
2. Prioritise the later flow.
3. Prioritise the most recent data.

Thanks to the task-job separation the composite representation is much easier to understand than it would be otherwise, however, we would require "visual filtering" (e.g. colour coding) to tell between one flow and another.

In the diagram I have shown (1). If the Scheduler inserts waiting task proxies for the second flow then I think we would get (2). (3) would just be confusing as heck.

Branched:

The branched option "looks" like the best. However, for the main expected use cases reflow isn't really a branching problem, as the user intends to overwrite the results of the first flow with the second. The branched representation is therefore an unnecessary complication: these graphs could become very large, making it extremely difficult to associate a task from one flow with the same task in another.

oliver-sanders · 2020-07-03T09:57:04Z

I think the kind of questions users will want answers to are:

How many flows are currently active?
Have those flows merged yet?
What tasks is flow (a) going to rerun?
Are flows (a) and (b) converging?

I think some nifty visual filtering can provide answers to these questions, I'll try and sketch something up soon...

hjoliver · 2020-07-05T21:15:17Z

I actually think your "composite" sketch is fine at least as first cut, perhaps with latest flow label attached to tasks. Then filtering should allow you to see particular flows (one or other) of your "parallel" sketch.

The composite view actually corresponds to my mental model, which is a single abstract graph that you can trigger real flows on in multiple places at once.

The trouble with representing different flows as entirely separate is that it might give the impression they are entirely independent rather than merging if they catch up with one another. Note that in less linear graphs merging happens gradually, and the merge point can't really be anticipated before it happens.

oliver-sanders · 2020-07-06T12:44:54Z

I've not yet created any sketches for "visual filtering" so this is a little coarse, but as a rough guide, there would be a visual filtering dialogue which could be used to change the appearance of nodes based on different factors e.g. family, parameterisation, (re)flow, etc.

You would only be able to "visually filter" for one thing at a time (e.g. families OR (re)flows) and filtering can be toggled on or off independently for each tab.

When a new flow is created (providing the view doesn't have a pre-existing filter) we could automatically activate a "visual filter" for (re)flow, when they merge we can disable it.
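The behaviour described here could be modelled roughly as below: one visual filter per tab, switched on automatically for a new flow only when the tab has no filter of its own. The `TabView` class and its method names are hypothetical. Restricting the automatic switch-off to filters that were switched on automatically is an assumption added for the sketch, not something stated above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TabView:
    # Only one visual filter at a time, e.g. "family" or "flow"; None means off.
    visual_filter: Optional[str] = None
    # Remembers whether the flow filter was activated automatically (assumption).
    auto_flow_filter: bool = False

    def on_flow_created(self):
        """Activate the flow filter unless the tab already has a filter."""
        if self.visual_filter is None:
            self.visual_filter = "flow"
            self.auto_flow_filter = True

    def on_flows_merged(self):
        """Disable the flow filter again if it was activated automatically."""
        if self.auto_flow_filter:
            self.visual_filter = None
            self.auto_flow_filter = False


plain_tab = TabView()
family_tab = TabView(visual_filter="family")
for tab in (plain_tab, family_tab):
    tab.on_flow_created()
print(plain_tab.visual_filter, family_tab.visual_filter)
```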

oliver-sanders · 2020-07-06T12:47:43Z

A "composite" view in combination with "visual filtering" should work for most use cases I can think of as when we create a new flow we are anticipating/intending it to catch up and merge with an earlier flow. Parallel flow use-cases are not intended or supported?

If so the last nasty question is this:

> How do we convey that a task is held in one flow but not in another?

hjoliver · 2020-07-06T23:42:37Z

I like the "visual filtering" idea, looks really good.

Would we still need (text) flow labels attached to tasks in the unfiltered case though? Especially if there are (god forbid) a large number of flows.

> Parallel flow use-cases are not intended or supported?

What do you mean by that? Disjoint parallel graph streams that will never merge?

> How do we convey that a task is held in one flow but not in another?

Do we really need to allow that? The visualization works fine so long as a held task is held regardless of flow.

dwsutherland · 2020-07-07T08:46:57Z

> It's a representation problem not a data problem, how the task pool and data store handle this is the domain of Cylc Flow and UI Server.

So what? Not a data problem as in the UI will separate the identical nodes by flow?

> How do we convey that a task is held in one flow but not in another?

Well if the UI separates the node deltas by flow, then problem solved?

However, if it actually needs to be represented in the data-store.. Then we would need to have a data-structure per flow, i.e.:

data = { "owner|workflow1": {"flow_id_a": data-structure, "flow_id_b": data-structure}, "owner|workflow2":  {"flow_id":. . .}}
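For illustration only, the nested layout above could be built up like this. The task ID format, the `held` and `status` fields and the helper names are hypothetical; the real data-store structure is not specified in this thread.

```python
from collections import defaultdict

# workflow ID -> flow ID -> task ID -> task state (all names hypothetical)
data = defaultdict(lambda: defaultdict(dict))


def update_task(workflow_id, flow_id, task_id, state):
    """Store the state of a task proxy under its workflow and flow."""
    data[workflow_id][flow_id][task_id] = state


def held_in_flows(workflow_id, task_id):
    """List the flows in which the given task is currently held."""
    return sorted(
        flow_id
        for flow_id, tasks in data[workflow_id].items()
        if tasks.get(task_id, {}).get("held")
    )


update_task("owner|workflow1", "flow_id_a", "b.1", {"status": "succeeded", "held": False})
update_task("owner|workflow1", "flow_id_b", "b.1", {"status": "waiting", "held": True})

print(held_in_flows("owner|workflow1", "b.1"))
```

With one structure per flow, the same task can carry a different state in each flow, at the cost of duplicating nodes and edges across flows.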

oliver-sanders · 2020-07-07T10:35:59Z

> So what? Not a data problem as in the UI will separate the identical nodes by flow?

The idea is that this is a design issue focused exclusively on the representation of reflow in the UI, how we get the required data to the UI is a matter for future issues in other repositories.

> Would we still need (text) flow labels attached to tasks in the un-filtered case though?

Attaching text labels to nodes would be pretty ugly so if we can avoid this entirely that would be better.

> What do you mean by that? Disjoint parallel graph streams that will never merge?

"Disjoint parallel graph streams" are confusing when represented in a composite graph, it's hard to tell what will run next. This is a case where a "parallel" view would make much more sense.

> How do we convey that a task is held in one flow but not in another?

> Do we really need to allow that?

From VC:

We will be able to hold flows at the task pool level.
- When do we regard a suite as being "held" - if one of its unmerged flows is held, or if all are held?
- How do we represent a suite with multiple flows, some of which are held, some of which aren't - in the status bar?
We don't need to allow tasks to be held on a per-flow basis so we don't need to worry about the "task is held in one flow but not in another" issue.
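The two candidate definitions of a "held" suite raised above can be compared with a small sketch. The function name, the mapping of flow label to held state, and the "partially held" wording are all hypothetical; which definition to adopt is left open here.

```python
def held_summary(flow_held):
    """Summarise a mapping of flow label -> held flag for a status bar."""
    held = [flow for flow, is_held in flow_held.items() if is_held]
    if not held:
        return "running"
    if len(held) == len(flow_held):
        return "held"            # the "all flows held" definition
    return "partially held"      # at least one, but not all, flows held


print(held_summary({"flow-1": True, "flow-2": False}))
print(held_summary({"flow-1": True, "flow-2": True}))
```

Under the "any flow held" definition the first case would simply be reported as held; a distinct intermediate state is one way to avoid choosing between the two.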

hjoliver · 2025-01-08T02:09:24Z

Closed by #2016 - see #2016 (comment)

oliver-sanders added the design and question labels Jun 29, 2020

oliver-sanders added this to the Pending milestone Jun 29, 2020

oliver-sanders changed the title ~~design: reflow~~ reflow Jul 9, 2020

oliver-sanders mentioned this issue Jul 7, 2021

Implement named flows, and improve task logging. cylc/cylc-flow#4287

Closed

8 tasks

MetRonnie mentioned this issue Dec 6, 2024

Show flow numbers #2016

Merged

6 tasks

hjoliver closed this as completed Jan 8, 2025

oliver-sanders modified the milestones: Pending, 2.7.0 Jan 8, 2025
