n-distance data-store graph window generation/pruning #3811
Cylc Tui is looking much better 👍 I noticed that waiting tasks that are ready to run are considered to be active, which confused me slightly at first but kinda makes sense as they are quite transitory.
Yeah, I call window creation on any task added to the runahead pool... but this could be done elsewhere. We can actually make n_max configurable with a mutation. It will mean the front end can affect the data-store size on the back-end workflow server, but perhaps we can set limits according to authorization (i.e. read-only n=0, ..., full control n=?).
I think n=0 (if that's what you're talking about) should include all tasks in the main scheduler pool, but not the runahead pool. That will include "active waiting" tasks, as I think I called them in the SoD proposal: tasks that have at least one, but not all, prerequisites satisfied (they are spawned by the first upstream output that they depend on). Note these aren't necessarily transitory, @oliver-sanders. As also discussed in the proposal, we could decide these waiting tasks are not "active" (i.e. not in n=0), but I don't think it matters too much right now.
Remember the scheduler only needs to store n=0, but we decided it can keep up to n=1 for the convenience of … So the front end should determine the n-size of the UIS data-store, but not the scheduler data-store. We could optionally allow the scheduler to store arbitrary n windows, up to some max, if you want to see loads of tasks in …
Done.
It might be a little difficult to explain the concept of "active waiting" tasks to users; it's easier to explain the n=x window as being x edges out from an active task, where active is {preparing, submitted, running}. However, it's not that big an issue.
Yeah, I agree. As argued in the SoD proposal, spawning these as waiting task proxies doesn't matter (users will see a "waiting" task out front whether or not it is backed by a task proxy in the scheduler), but they probably shouldn't appear in n=0. New issue: #3822
We can place the …
The hard challenge I have at the moment is working out an efficient pruning scheme (I have a couple of things in mind, bound to evolve as I attempt it).
We can have a brainstorming session about that, so it's not all on you.
I've added a pruning mechanism that satisfies the need to remove potential future workflow paths that are not taken. There is still some buggy behaviour, so I'll need to tidy things up somewhat.
The other issue is that I've had to revert to triggering n=0 window (node/edge) creation on tasks being added to the runahead pool. The reason is that tasks get removed before their children are released from the runahead pool, so creating nodes on release from runahead leaves a gap in the data-store (i.e. children get removed as a potential workflow path that was never followed, and then have to be recreated). So the only other options are:
I'm leaning towards option 2, as it's just a set operation/lookup, and nodes will soon know about their parents out to n-max anyway. Watch this space...
@dwsutherland - not sure if you got any further with this yet; perhaps we can have a chat tomorrow? My initial thought was that the first-order attempt might be something like this:
This would be super-easy and it might be enough, at least for low-ish n windows, because following edges is pretty easy. Then the challenge would be to beat this with an efficient incremental pruning scheme (i.e. if a task leaves n=0, can we decide which tasks to remove from the n=N window without recomputing from scratch again?). But in the meantime, we would have a working system 😁 - what do you think? (Maybe there's a flaw in my logic, or you already have a better scheme lined up.)
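A minimal sketch of what such a first-order, recompute-from-scratch approach could look like. It assumes the graph is held as a plain dict mapping each task name to its set of children (a hypothetical representation, not the data-store's actual structure) and treats parent and child edges alike. A breadth-first search from every active task collects all nodes within n edges, and anything else in the store is pruned.

```python
from collections import deque
from typing import Dict, Iterable, Set

# Hypothetical representation: task name -> set of child task names.
Graph = Dict[str, Set[str]]


def n_window(graph: Graph, active: Iterable[str], n: int) -> Set[str]:
    """Return every node within n edges (parents or children) of an active node."""
    # Build an undirected adjacency map so parents and children are both followed.
    adjacent: Dict[str, Set[str]] = {}
    for parent, children in graph.items():
        for child in children:
            adjacent.setdefault(parent, set()).add(child)
            adjacent.setdefault(child, set()).add(parent)
    window = set(active)
    # Multi-source BFS: every active node starts at distance 0.
    queue = deque((node, 0) for node in window)
    while queue:
        node, dist = queue.popleft()
        if dist == n:
            continue  # do not expand past the edge of the window
        for neighbour in adjacent.get(node, ()):
            if neighbour not in window:
                window.add(neighbour)
                queue.append((neighbour, dist + 1))
    return window


def recompute_and_prune(store: Set[str], graph: Graph,
                        active: Iterable[str], n: int) -> Set[str]:
    """Return the nodes held in the store that fall outside the fresh window."""
    return store - n_window(graph, active, n)


if __name__ == "__main__":
    chain = {"a": {"b"}, "b": {"c"}, "c": {"d"}}
    print(sorted(recompute_and_prune({"a", "b", "c", "d"}, chain, {"c"}, 1)))  # ['a']
```

Each call walks the whole window again, so the cost grows with the number of active tasks and with n on every pool change; avoiding that full recomputation is what an incremental scheme buys.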
@hjoliver - Yeah, I already have something more efficient in place, but I've got another idea in the works which avoids needing to know when something has left the pool (i.e. it knows when a child at n=N becomes n=0), and does a diff to prune (which should also hit paths not taken). Anyway, I'm just about to implement this.
I have played with various workflow graphs at several n values, and this seems to do the right thing - a nice improvement 👍
Noticed scheduler reload wipes out jobs from the datastore (probably not just on this branch though; fix with a follow-up?)
Some follow-ups already planned (e.g. waiting-but-held-back tasks in n=1 not n=0) and some other tweaks might be needed in due course, but I think this is a good basis for n-windows in the scheduler.
I did a pretty once-over-lightly code review, but it looked fine at that level ... so let's keep the ball rolling here.
I've not done an especially deep code review, I think I have a vague idea of what's going on, I know @dwsutherland and @hjoliver have debated the implementation to death so I'm happy to pass over that. I've had a play with a few graphs and it seems to be giving me the expected results. I've found two things which were unexpected to me, though I may just be demonstrating my ignorance of the discussions over recent weeks.
This capture shows (2) [task
If these things are known about, or if I'm just really out of date, please merge!
Huh, I was expecting that, and that we'd just have to make it work with the UIS. If I'm viewing a flow in the UI with the default n=1 window, and want to see more tasks (n=4 say), surely I shouldn't have to restart my UIS to achieve that?
I also noticed this but forgot to comment on it (definitely not intended from my perspective, but you didn't miss a discussion on it). Fix as a follow-up is fine.
@dwsutherland can confirm, but I think that's because you're holding task …
So, no merge-blockers there I think, but will wait for @dwsutherland to respond as well.
Actually, I don't think it will mess up the UIS, because the boundary pruning isn't affected by it (only new nodes register the new boundary, so in effect each node "knows" when it should be pruned). And even if they get out of sync, the UIS will reconcile (via the stamp), though in the future they may be independent.
Huh? tasks before the initial cycle point? I haven't seen this (maybe I'm not running the right flows)
Correct, if it's in the pool then it's …
Oh, it's the integer cycling, as the following doesn't have that issue:
Will have a look. |
Hmm... this works fine:
So it might be the trigger logic.
Yes, it seems to only happen with intercycle dependence. But the pre-initial task proxies aren't appearing in the scheduler task pool, right? So is it something to do with cycling sequence methods accessed by the datastore?
The cycle point needed to be checked to ensure it is on-sequence and valid; for ISO 8601 sequences the … Have switched to …
@oliver-sanders (BTW, changing the window size doesn't require a reload. Reloads will wipe and restart the whole data-store.)
Checked it works. Thanks @dwsutherland 🎉 💐
These changes close #3747
Closes cylc/cylc-ui#496
N-Distance node/edge generation:
Generation of the cycle point nodes/edges (i.e. task/family proxies) is triggered
individually on transition from the staging pool to the active task pool. Each
active task is generated along with its children and parents, recursively out
to a specified maximum graph distance (n_edge_distance), which can be altered
externally (via the API). Collectively this forms the N-Distance-Window on the
workflow graph.
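A minimal sketch of generating the window about one active task, assuming a hypothetical dict-of-children graph representation and walking parents and children level by level out to n_edge_distance. It also returns the boundary nodes at n_edge_distance + 1, which the pruning step relies on. The function and type names are illustrative, not the scheduler's own.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: task name -> set of child task names.
Graph = Dict[str, Set[str]]


def neighbours(graph: Graph, node: str) -> Set[str]:
    """Children and parents of node, i.e. edges followed in both directions."""
    children = set(graph.get(node, ()))
    parents = {parent for parent, kids in graph.items() if node in kids}
    return children | parents


def generate_window(graph: Graph, task: str,
                    n_edge_distance: int) -> Tuple[Set[str], Set[str]]:
    """Return (nodes within n_edge_distance of task, boundary nodes at n_edge_distance + 1)."""
    paths = {task}
    frontier = {task}  # nodes found at the current, outermost distance
    for _ in range(n_edge_distance):
        # Step one edge further out, ignoring nodes already in the window.
        frontier = {nb for node in frontier for nb in neighbours(graph, node)} - paths
        paths |= frontier
    # Unseen neighbours of the outermost ring lie exactly one edge beyond the window.
    boundary = {nb for node in frontier for nb in neighbours(graph, node)} - paths
    return paths, boundary


if __name__ == "__main__":
    chain = {"a": {"b"}, "b": {"c"}, "c": {"d"}}
    for name in "abc":
        paths, boundary = generate_window(chain, name, 1)
        print(name, sorted(paths), sorted(boundary))
    # a ['a', 'b'] ['c']
    # b ['a', 'b', 'c'] ['d']
    # c ['b', 'c', 'd'] ['a']
```

For a task with nothing beyond its window (an isolate, say), this sketch returns an empty boundary. The description of the examples indicates the real scheme handles that case too (apparently by letting the furthest node inside the window act as the prune trigger), which is not modelled here.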
N-Distance node/edge pruning:
Pruning of data-store elements uses both the set of nodes generated along the
graph paths of the active nodes and the tracking of the boundary nodes
(n_edge_distance + 1) of those active nodes. Once a boundary node becomes
active, the original node is flagged for pruning. Set operations are then used
to diff the nodes of active paths (paths whose node is in the active task pool)
against the nodes of flagged paths (paths whose boundary node(s) have become
active).
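A minimal sketch of the flag-and-diff step, assuming each generated window was recorded at generation time as a (paths, boundary) pair keyed by the task it was generated about. A window is flagged once any of its boundary nodes is active; the prune set is the flagged path nodes minus every node on an active path, and active tasks themselves are never pruned. All names are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical record: task name -> (path nodes, boundary nodes) captured when
# that task's window was generated.
Windows = Dict[str, Tuple[Set[str], Set[str]]]


def prune_candidates(windows: Windows, active: Set[str]) -> Set[str]:
    """Return the data-store nodes that are safe to prune."""
    active_paths: Set[str] = set()
    flagged_paths: Set[str] = set()
    for task, (paths, boundary) in windows.items():
        if task in active:
            # Nodes still inside the window of an active task.
            active_paths |= paths
        if boundary & active:
            # A boundary node has become active, so this window is flagged.
            flagged_paths |= paths
    # Keep anything an active path still needs, and never prune active tasks.
    return flagged_paths - active_paths - active


if __name__ == "__main__":
    recorded = {
        "a": ({"a", "b"}, {"c"}),
        "c": ({"b", "c", "d"}, {"a"}),
    }
    print(sorted(prune_candidates(recorded, {"c"})))  # ['a']
```

Because each window carries the boundary it was generated with, the diff needs no knowledge of when a task left the pool; in this sketch the caller would also drop the flagged records once their nodes are pruned.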
Example one (* marks active). With an n=1 window:
[image]
With a active we have:
[image]
In paths:
[image]
Boundary info at n+1:
[image]
When c becomes active we have:
[image]
(Yes, d is the prune trigger for itself and c, because it is the furthest distance before n+1 ... this works for isolates too.) Since c became active, we flag the now out-of-window a for pruning. The pruning mechanism then takes all the active path nodes, {b, c, d}, and does a diff with all the inactive path nodes, {a, b} + {a, b, c}, so we are cleared to prune {a}, resulting in:
[image]

Example two (* marks active). With an n=1 window:
[image]
When b and g turned active, e is pruned but not a, as a is in the path of active b (even though g flagged it); then, when c becomes active, a is pruned.

Example three:
[image]
Pruning works here because when c becomes active, the paths of a include f, so f is pruned (as long as it's not in the active task pool). Other convoluted graphs with conditionals and absolute triggers are also handled by this scheme.
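The set difference from example one (active path nodes {b, c, d} against inactive path nodes {a, b} + {a, b, c}) can be checked directly with Python sets; the sets are copied from that walkthrough, since the example graph itself was only shown as an image.

```python
# Node sets copied from example one (n=1 window, c has just become active).
active_path_nodes = {"b", "c", "d"}
# Path nodes of the flagged (inactive) windows, unioned together.
flagged_path_nodes = {"a", "b"} | {"a", "b", "c"}

# Anything flagged that no active path still needs is safe to prune.
to_prune = flagged_path_nodes - active_path_nodes
print(sorted(to_prune))  # ['a']
```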
So when we generate the window about an active task, we also find and associate the conditions for pruning it.
This ensures a change of window size won't orphan the task from being pruned, which would trip up alternative pruning schemes; e.g. reducing the window size under a scheme based on flagging the ancestors/parents of active tasks would miss nodes formerly in the window.
Animated examples:
Given the following graph with a simple chain and an isolate branch:
[image]
At n=0 we have:
[image]
And at n=1:
[image]
But we can also alter this with a mutation:
[image]
Need to:
n=1 is the default, but perhaps we can set n=0 dynamically when the active pool grows beyond some configurable limit (future PRs?).
Requirements check-list
I have read CONTRIBUTING.md and added my name as a Code Contributor.