reflow #468

Open
4 tasks
oliver-sanders opened this issue Jun 29, 2020 · 15 comments
Labels
design question Flag this as a question for the next Cylc project meeting.

@oliver-sanders
Member

oliver-sanders commented Jun 29, 2020

SoD (cylc/cylc-flow#3515) brings with it the concept of reflow, whereby a single running workflow can have multiple parallel executions which can be stopped (or potentially held?) independently and merge back together.

This amazing functionality is going to be an interesting design challenge for the UI:

  • Reflows will likely have metadata associated with them (e.g. a title or description).
  • Theoretically there can be many reflows, though normally there will probably only be the original flow and perhaps one reflow.
  • Different (re)flows can merge together if one catches up with the other.
  • When a user initiates a reflow from the UI it would be good if we could visually display the impact this will have on a workflow (e.g. by highlighting all of the tasks which would get rerun) - note this requires graph traversal (at the scheduler?) not yet implemented.
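The traversal mentioned in the last bullet could be sketched roughly as follows. This is only an illustration, assuming the scheduler (or UI Server) can expose the graph as a simple adjacency map; `GRAPH` and `tasks_rerun_by_reflow` are hypothetical names, not part of any Cylc API, and cycling, stop points and flow merging are ignored.

```python
from collections import deque

# Hypothetical adjacency map for the illustration: task -> tasks that depend on it.
GRAPH = {
    "a": ["b"],
    "b": ["c"],
    "c": ["d"],
    "d": [],
}


def tasks_rerun_by_reflow(graph, start):
    """Return every task reachable from `start` (inclusive), breadth-first.

    These are the tasks a reflow triggered at `start` would rerun, assuming
    the new flow runs all the way downstream without stopping or merging.
    """
    seen = {start}
    order = [start]
    queue = deque([start])
    while queue:
        task = queue.popleft()
        for child in graph.get(task, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order
```

The UI could then highlight the returned tasks before the user confirms the reflow, e.g. `tasks_rerun_by_reflow(GRAPH, "b")` gives the tasks from `b` onwards.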

Initial thoughts:

  • Two task/family proxies with the same name and cycle point but different (re)flow ID can exist at the same time.
  • This fits quite nicely with other visual filtering plans (awaiting sketch).
  • Users will likely want to filter by (re)flow in their views (e.g. show [re]flows in different tabs).

Questions:

  • What kind of visual representation do we want for this (e.g. parallel, composite, branched, other)? - Most likely composite
  • Should we open up the ability to view (re)flows independently (e.g. using subgraphs or by filtering in the tree view)?
  • How do we convey that a task is held in one flow but not in another? - we probably won't allow this
  • How should we convey that a "suite" has multiple flows, one or more of which are held?

Pull requests welcome!

@oliver-sanders added the "design question" label (Flag this as a question for the next Cylc project meeting.) Jun 29, 2020
@oliver-sanders added this to the Pending milestone Jun 29, 2020
@dwsutherland
Member

dwsutherland commented Jul 1, 2020

Two task/family proxies with the same name and cycle point but different (re)flow ID can exist at the same time.

Only if we need history of proxies? I think the latest flow will overwrite the old (with job submit incrementing sequentially across flows)

Also, if true, this would imply flow-specific edges and probably flow-specific data-stores... But I think we backed off of this, as people can already run multiple workflows.

@hjoliver
Member

hjoliver commented Jul 1, 2020

Two task/family proxies with the same name and cycle point but different (re)flow ID can exist at the same time.

@dwsutherland is right - this would result in a single proxy with a merged flow ID.

@kinow
Member

kinow commented Jul 1, 2020

That will be interesting for the UI. We will have an HTML element attached to the DOM representing the proxy and job with some data (progress/status/started time/etc). Then I think the UI will receive a delta either for added or updated with the new task proxy.

We will then have to merge the data, and display only one task proxy?

Will the user be aware that the one visible in the UI has been updated by a reflow?

@dwsutherland
Member

A flow badge(s) that appears on nodes for any non-default flow label?

@oliver-sanders
Member Author

When we have two flows, how does the status of a task change depending on the flow you are viewing?

E.g. for the flow a => b => c => d.

If we run from a to c then reflow from a, then:

  • b will appear in the n=1 window.
  • b will be succeeded in the first flow.
  • b does not yet exist in the second flow?

So long as the second flow doesn't result in the creation of waiting tasks this will still be comprehendable-ish from the UI. You will just see completed tasks being rerun which makes sense.

Otherwise we will need a full n=1 window for each flow?

@dwsutherland
Member

dwsutherland commented Jul 2, 2020

So long as the second flow doesn't result in the creation of waiting tasks this will still be comprehendable-ish from the UI

Not sure I understand the conflict. The second flow will create a waiting b that replaces the first flow b (that is no longer in the pool anyway).

@hjoliver
Member

hjoliver commented Jul 3, 2020

Yeah a reflow really just extends the concept of retrigger - so we already had the same problem (UI-wise) with multiple submits of a task. Now instead of just a new submit number, we get a new flow label and a new submit number (and unlike old-style retrigger, the flow continues downstream from the retriggered task).

So the n-distance window is kind of agnostic to flow. It just has tasks in it, and those tasks have whatever flow labels they have, just like submit numbers.

Presumably by default the UI should show, for a particular task, the latest submit/flow that occurred (luckily submit number increments linearly so that makes that easy) and it could highlight somehow (a flow badge?) what flow that task belongs to.
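A minimal sketch of that default, assuming the UI holds one record per job submission; `JobRecord`, its fields, and the `"original"` default flow label are hypothetical names used only for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRecord:
    task: str
    cycle: str
    submit_num: int  # increments linearly across flows
    flow_label: str


def latest_per_task(records):
    """Keep only the highest submit number for each (task, cycle) pair."""
    latest = {}
    for rec in records:
        key = (rec.task, rec.cycle)
        if key not in latest or rec.submit_num > latest[key].submit_num:
            latest[key] = rec
    return latest


def flow_badge(record, default_flow="original"):
    """Return the badge text to show, or None for the default flow."""
    return None if record.flow_label == default_flow else record.flow_label
```

Because the submit number increments across flows, picking the maximum is enough to find the flow that most recently ran the task; filtering by flow would instead select on `flow_label` before taking the maximum.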

If we want to be able to filter by flow, regardless of latest submit number, that's slightly more interesting but it should work fine I think.

@oliver-sanders
Member Author

oliver-sanders commented Jul 3, 2020

Yeah a reflow really just extends the concept of retrigger

Well yes, that is true, however trigger only affects one task, so it is much easier to understand its effect from the displayed information.

The difficulty is making it clear to the user that there are two flows and what that entails for their workflows.

So reflow is a much bigger problem than re-triggering individual tasks.

So long as the second flow doesn't result in the creation of waiting tasks this will still be comprehendable-ish from the UI

Not sure I understand the conflict. The second flow will create a waiting b that replaces the first flow b (that is no longer in the pool anyway).

It's a representation problem not a data problem, how the task pool and data store handle this is the domain of Cylc Flow and UI Server.

I'll try and explain the representation problem with a diagram. Here are three alternative representations of a reflow in graph form:

[Image: reflow]

Composite:

There are other options of course, and the "composite" option has three variants:

  1. Prioritise the earlier flow.
  2. Prioritise the later flow.
  3. Prioritise the most recent data.

Thanks to the task-job separation the composite representation is much easier to understand than it would be otherwise. However, we would require "visual filtering" (e.g. colour coding) to tell one flow from another.

In the diagram I have shown (1), if the Scheduler inserts waiting task proxies for the second flow then I think we would get (2). (3) would just be confusing as heck.
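The three composite variants differ only in which flow's state wins when the same task exists in more than one flow. A rough sketch, where `FlowState`, its fields and the policy names are all hypothetical and chosen only for illustration:

```python
from typing import NamedTuple


class FlowState(NamedTuple):
    flow_order: int    # 1 for the original flow, 2 for the first reflow, ...
    updated_at: float  # time of the last update to this state (hypothetical)
    state: str


def composite_state(candidates, policy):
    """Pick the state to display for one task from its per-flow states."""
    if not candidates:
        raise ValueError("no states to choose from")
    if policy == "earlier":   # variant (1): prioritise the earlier flow
        return min(candidates, key=lambda c: c.flow_order).state
    if policy == "later":     # variant (2): prioritise the later flow
        return max(candidates, key=lambda c: c.flow_order).state
    if policy == "recent":    # variant (3): prioritise the most recent data
        return max(candidates, key=lambda c: c.updated_at).state
    raise ValueError(f"unknown policy: {policy}")
```

With variant (3) the displayed flow can change from task to task depending on update times, which is one way to see why it would be the hardest to read.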

Branched:

The branched option "looks" like the best. However, for the main expected use cases reflow isn't really a branching problem, because the user intends to overwrite the results of the first flow with the second. The branched representation is therefore an unnecessary complication: these graphs could become very large, making it extremely difficult to associate a task in one flow with the same task in another.

@oliver-sanders
Member Author

I think the kind of questions users will want answers to are:

  • How many flows are currently active?
  • Have those flows merged yet?
  • What tasks is flow (a) going to rerun?
  • Are flows (a) and (b) converging?

I think some nifty visual filtering can provide answers to these questions, I'll try and sketch something up soon...

@hjoliver
Member

hjoliver commented Jul 5, 2020

I actually think your "composite" sketch is fine at least as first cut, perhaps with latest flow label attached to tasks. Then filtering should allow you to see particular flows (one or other) of your "parallel" sketch.

The composite view actually corresponds to my mental model, which is a single abstract graph that you can trigger real flows on in multiple places at once.

The trouble with representing different flows as entirely separate is that it might give the impression they are entirely independent rather than merging if they catch up with one another. Note that in less linear graphs merging happens gradually, and the merge point can't really be anticipated before it happens.

@oliver-sanders
Member Author

oliver-sanders commented Jul 6, 2020

I've not yet created any sketches for "visual filtering" so this is a little coarse, but as a rough guide, there would be a visual filtering dialogue which could be used to change the appearance of nodes based on different factors e.g. family, parameterisation, (re)flow, etc.

You would only be able to "visually filter" for one thing at a time (e.g. families OR (re)flows) and filtering can be toggled on or off independently for each tab.

When a new flow is created (providing the view doesn't have a pre-existing filter) we could automatically activate a "visual filter" for (re)flow, when they merge we can disable it.
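The per-tab behaviour described above could be sketched like this. `TabFilterState` and its method names are hypothetical, and the rule that only an automatically enabled filter is switched off on merge is an assumption added for the sketch, not something decided in this thread:

```python
class TabFilterState:
    """Visual filter state for a single tab: at most one filter at a time."""

    def __init__(self, active=None):
        self.active = active       # e.g. "family", "flow", or None
        self.auto_enabled = False  # True only if we enabled "flow" ourselves

    def on_flow_created(self):
        # Only activate the (re)flow filter if the tab has no existing filter.
        if self.active is None:
            self.active = "flow"
            self.auto_enabled = True

    def on_flows_merged(self):
        # Assumption: leave a filter the user chose themselves untouched.
        if self.active == "flow" and self.auto_enabled:
            self.active = None
            self.auto_enabled = False
```

A tab that already filters by family keeps that filter when a new flow appears, while an unfiltered tab switches to the (re)flow filter and back again once the flows merge.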

[Image: reflow-visual-filtering]

@oliver-sanders
Member Author

A "composite" view in combination with "visual filtering" should work for most use cases I can think of as when we create a new flow we are anticipating/intending it to catch up and merge with an earlier flow. Parallel flow use-cases are not intended or supported?

If so the last nasty question is this:

How do we convey that a task is held in one flow but not in another?

@hjoliver
Member

hjoliver commented Jul 6, 2020

I like the "visual filtering" idea, looks really good.

Would we still need (text) flow labels attached to tasks in the unfiltered case though? Especially if there are (god forbid) a large number of flows.

Parallel flow use-cases are not intended or supported?

What do you mean by that? Disjoint parallel graph streams that will never merge?

How do we convey that a task is held in one flow but not in another?

Do we really need to allow that? The visualization works fine so long as a held task is held regardless of flow.

@dwsutherland
Member

It's a representation problem not a data problem, how the task pool and data store handle this is the domain of Cylc Flow and UI Server.

So what? Not a data problem as in the UI will separate the identical nodes by flow?

How do we convey that a task is held in one flow but not in another?

Well, if the UI separates the node deltas by flow, then problem solved?

However, if it actually needs to be represented in the data-store.. Then we would need to have a data-structure per flow, i.e.:

data = { "owner|workflow1": {"flow_id_a": data-structure, "flow_id_b": data-structure}, "owner|workflow2":  {"flow_id":. . .}}
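As a rough illustration of that nesting (workflow, then flow, then nodes), here is a sketch using plain dictionaries. `apply_delta`, the node ID format and the field names are hypothetical and do not reflect the real data-store or its delta format:

```python
def apply_delta(store, workflow_id, flow_id, node_id, fields):
    """Merge updated fields for one node into the per-flow structure."""
    flows = store.setdefault(workflow_id, {})
    nodes = flows.setdefault(flow_id, {})
    nodes.setdefault(node_id, {}).update(fields)


store = {}
# The same task proxy tracked separately in two flows of one workflow.
apply_delta(store, "owner|workflow1", "flow_id_a", "b.1", {"state": "succeeded"})
apply_delta(store, "owner|workflow1", "flow_id_b", "b.1", {"state": "waiting"})
apply_delta(store, "owner|workflow1", "flow_id_b", "b.1", {"is_held": True})
```

The cost is visible even at this scale: every node that exists in more than one flow is stored once per flow, and the UI has to decide which copy to show.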

@oliver-sanders
Member Author

So what? Not a data problem as in the UI will separate the identical nodes by flow?

The idea is that this is a design issue focused exclusively on the representation of reflow in the UI, how we get the required data to the UI is a matter for future issues in other repositories.

Would we still need (text) flow labels attached to tasks in the un-filtered case though?

Attaching text labels to nodes would be pretty ugly so if we can avoid this entirely that would be better.

What do you mean by that? Disjoint parallel graph streams that will never merge?

"Disjoint parallel graph streams" are confusing when represented in a composite graph, it's hard to tell what will run next. This is a case where a "parallel" view would make much more sense.

How do we convey that a task is held in one flow but not in another?

Do we really need to allow that?

From VC:

  • We will be able to hold flows at the task pool level.
    • When do we regard a suite as being "held" - if one of its unmerged flows is held, or if all are held?
    • How do we represent a suite with multiple flows, some of which are held, some of which aren't - in the status bar?
  • We don't need to allow tasks to be held on a per-flow basis so we don't need to worry about the "task is held in one flow but not in another" issue.
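The two open questions above are left undecided here. Purely as one possible answer, a status bar could report a three-way summary instead of a single held/not-held flag; `workflow_hold_summary` and its wording are hypothetical:

```python
def workflow_hold_summary(flow_held):
    """Summarise hold status from a mapping of flow label -> is_held."""
    if not flow_held:
        raise ValueError("a workflow needs at least one flow")
    held = sum(1 for is_held in flow_held.values() if is_held)
    total = len(flow_held)
    if held == 0:
        return "running"
    if held == total:
        return "held"
    return f"{held} of {total} flows held"
```

This sidesteps choosing between "held if any flow is held" and "held only if all flows are held" by showing the partial case explicitly.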


4 participants