Improve interaction between task state, output and prerequisite #2329

Open
matthewrmshin opened this issue Jun 22, 2017 · 17 comments
Labels
efficiency For notable efficiency improvements
Milestone

Comments

@matthewrmshin
Contributor

matthewrmshin commented Jun 22, 2017

Continuation of #1794 and #2157. See also #1392 and #2348.

@matthewrmshin matthewrmshin added this to the later milestone Jun 22, 2017
@matthewrmshin matthewrmshin self-assigned this Jun 22, 2017
@matthewrmshin matthewrmshin added the efficiency For notable efficiency improvements label Jun 22, 2017
@matthewrmshin
Contributor Author

Initial thoughts:

  • Task pool to hold a dict of available (and relevant) output objects in memory.
  • Prerequisites of each task proxy will simply be a list of output objects selected from the main dict.
  • On receiving task messages, the output objects will be updated, and task prerequisites will be satisfied automatically.
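The three points above can be sketched roughly as follows. This is only an illustration of the idea, not Cylc code: `Output`, `OutputPool`, `TaskProxy` and all their methods are hypothetical names invented for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Output:
    """One task output, e.g. foo at point 2017 reporting 'succeeded' (hypothetical)."""
    task: str
    point: str
    message: str
    completed: bool = False


class OutputPool:
    """Hypothetical pool holding exactly one shared Output object per key."""

    def __init__(self):
        self._outputs = {}

    def get(self, task, point, message):
        # Create the output on first request, then always return the same object.
        key = (task, point, message)
        if key not in self._outputs:
            self._outputs[key] = Output(task, point, message)
        return self._outputs[key]

    def on_message(self, task, point, message):
        # An incoming task message marks the shared output object as completed.
        self.get(task, point, message).completed = True


@dataclass
class TaskProxy:
    name: str
    point: str
    # Prerequisites are references to Output objects owned by the pool.
    prerequisites: list = field(default_factory=list)

    def is_ready(self):
        # Prerequisite state is read from the shared outputs, never copied.
        return all(output.completed for output in self.prerequisites)


pool = OutputPool()
bar = TaskProxy("bar", "2017", [pool.get("foo", "2017", "succeeded")])
ready_before = bar.is_ready()
pool.on_message("foo", "2017", "succeeded")
ready_after = bar.is_ready()
```

Because `bar` holds a reference to the same object that the pool updates, no separate dependency-matching step is needed once the message arrives.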

@hjoliver
Member

hjoliver commented Jul 9, 2017

See also #1689. The bulk of the content of a task proxy object is only needed at job submit time, to generate the job script. Aside from that, in memory we only need prerequisites/outputs (for scheduling) and submit number (and timers...). Your proposal seems to be going some way toward this?

@matthewrmshin
Contributor Author

Yes, I am hoping that the task proxy object will become as lightweight as possible.

@hjoliver
Member

hjoliver commented Mar 12, 2018

Note #2600 (comment) through #2600 (comment) - regarding persistence of prerequisite state and whether or not task and prerequisite state should be allowed to diverge.

@hjoliver
Member

hjoliver commented Mar 14, 2018

@matthewrmshin - I really like the simplicity of your proposed implementation above. But we need to consider the implications, e.g. for task state, in light of manual triggering requirements etc. I'll come back with some thoughts on this...

@hjoliver
Member

hjoliver commented Mar 14, 2018

This - by definition - makes task prerequisites and outputs absolutely consistent. (There's still a housekeeping problem: how to determine when each output can be forgotten and dropped from the dict, but presumably that's solvable).

But I think we still need task state and prerequisite/output state to be "divorceable", because we cannot assume that if a task is in a post-triggered state then its prerequisites are/were satisfied. E.g. on manually triggering a task, or resetting it to a post-triggered state, forcing its prerequisites to be artificially completed would (under this proposal) force the outputs of the tasks it depends on to be artificially completed, which might have unintended consequences for other tasks that also depend on those. [or, would that be OK?! ... or it could be optional: cylc trigger --complete-prereqs?]

So, it seems to me, if a task gets manually triggered or reset, we should automatically complete its outputs (which will complete the prerequisites of downstream tasks that depend on it - but that is what's needed) but not its prerequisites [unless optionally, as per prev paragraph].

@matthewrmshin
Contributor Author

On housekeep. My most naive assumption is that we can do what we do now with task proxies, i.e. we'll housekeep any outputs that can no longer be prerequisites of any downstream tasks.

On prerequisites. I think the key concept of the proposal here is the separation of task states and prerequisite-output objects. An action/event on a task can only affect its outputs, but should have no effect on its prerequisites.
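A minimal sketch of that separation, assuming a three-task chain foo => bar => baz. The class names, the `force_state` method and the single "succeeded" output are hypothetical and only serve to illustrate the rule that an action on a task touches its own outputs and nothing upstream.

```python
class Output:
    def __init__(self, label):
        self.label = label
        self.completed = False


class TaskProxy:
    def __init__(self, name, prerequisites=()):
        self.name = name
        self.state = "waiting"
        # Outputs are owned by this task.
        self.outputs = {"succeeded": Output(f"{name}:succeeded")}
        # Prerequisites are references to outputs owned by upstream tasks.
        self.prerequisites = list(prerequisites)

    def prerequisites_satisfied(self):
        return all(output.completed for output in self.prerequisites)

    def force_state(self, state):
        # A manual action changes the task state and, if appropriate, the
        # task's own outputs. It never writes to self.prerequisites.
        self.state = state
        if state == "succeeded":
            for output in self.outputs.values():
                output.completed = True


foo = TaskProxy("foo")
bar = TaskProxy("bar", [foo.outputs["succeeded"]])
baz = TaskProxy("baz", [bar.outputs["succeeded"]])

bar.force_state("succeeded")
```

After the forced reset, `baz` sees its prerequisite as satisfied, while foo's output (and therefore bar's own prerequisite) is left untouched, so task state and prerequisite state are free to diverge.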

See also #1314.

@hjoliver
Member

On housekeep: agreed, the same logic should work.

On prerequisites: also agreed - I just mentioned this in light of the related discussion on #2600 about whether or not prerequisite state can also be consistent with, or inferred from, task state. The conclusion would appear to be no, it can't.

@TomekTrzeciak
Contributor

TomekTrzeciak commented Mar 14, 2018

I might be missing something, but why would the prerequisites ever need to be artificially set to completed? If the task gets manually triggered, you could equally well have some flag to signify this fact rather than changing the state of its prerequisites. Isn't the very purpose of the manual trigger to run a task regardless of the state of its prerequisites?

Regarding outputs housekeep - to avoid writing your own garbage collector for that, you could keep the outputs in a WeakValueDictionary (from Python's weakref module). As soon as all task proxies that hold or reference (via one of the prerequisites) a given output are housekept, Python would be free to garbage collect that dictionary entry.
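A small sketch of the `WeakValueDictionary` idea, using the standard library `weakref` module. The `Output` and `TaskProxy` classes, the `get_output` helper and the key format are hypothetical. Immediate removal of the entry relies on CPython's reference counting; the explicit `gc.collect()` calls are there only to make the behaviour less dependent on the interpreter.

```python
import gc
import weakref


class Output:
    def __init__(self, key):
        self.key = key
        self.completed = False


class TaskProxy:
    def __init__(self, name, prerequisites):
        self.name = name
        # Strong references: these keep the outputs alive.
        self.prerequisites = prerequisites


# The dict itself holds only weak references to its values.
outputs = weakref.WeakValueDictionary()


def get_output(key):
    # Return the shared Output for this key, creating it if necessary.
    output = outputs.get(key)
    if output is None:
        output = Output(key)
        outputs[key] = output
    return output


bar = TaskProxy("bar", [get_output("foo.2017:succeeded")])
baz = TaskProxy("baz", [get_output("foo.2017:succeeded")])
same_object = bar.prerequisites[0] is baz.prerequisites[0]

del bar  # housekeep one downstream proxy
gc.collect()
still_present = "foo.2017:succeeded" in outputs

del baz  # housekeep the last proxy referencing the output
gc.collect()
dropped = "foo.2017:succeeded" not in outputs
```

The entry survives as long as at least one proxy still references the output, and disappears once the last one is gone, so no explicit housekeeping pass over the dict is needed in this sketch.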

@hjoliver
Member

hjoliver commented Mar 14, 2018

@TomekTrzeciak - well then, apologies - I guess I misunderstood your comments here: #2600 (comment). I thought you were suggesting that (a) task state and prerequisite states should not diverge, and (b) prerequisites could be reconstructed from outputs - of the same task (like what we do already from task state on restart). However, on (b) at least I now see you probably did not mean "of the same task"!

Interesting idea on the housekeeping, sounds promising.

@hjoliver
Member

hjoliver commented Mar 14, 2018

Just to note, we do currently have one important use-case for artificially setting prerequisites: reset to waiting sets the task state to waiting and its prerequisites to not-satisfied (to force them to get satisfied again, which might involve waiting on re-running upstream tasks) - under this proposal setting the prerequisites to not-satisfied would set upstream task outputs to not-completed. However, I think just resetting the task state to waiting will do. Then the task will either trigger again immediately (if its prerequisites are still satisfied) or wait (if the upstream tasks have been retriggered or reset, which will unset their outputs). In fact, that seems more sensible than what we're currently doing.
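The two outcomes described here (trigger again immediately, or wait on upstream) can be illustrated with a short sketch. All names are hypothetical, and the assumption that resetting a task to waiting also unsets its own outputs is taken from the comment above rather than from any actual Cylc code.

```python
class Output:
    def __init__(self):
        self.completed = False


class TaskProxy:
    def __init__(self, name, prerequisites=()):
        self.name = name
        self.state = "waiting"
        self.succeeded = Output()
        self.prerequisites = list(prerequisites)

    def succeed(self):
        self.state = "succeeded"
        self.succeeded.completed = True

    def reset_to_waiting(self):
        # Only this task's state and its own outputs change;
        # the upstream outputs it reads are left alone.
        self.state = "waiting"
        self.succeeded.completed = False

    def can_trigger(self):
        return self.state == "waiting" and all(
            output.completed for output in self.prerequisites
        )


foo = TaskProxy("foo")
bar = TaskProxy("bar", [foo.succeeded])
foo.succeed()
bar.succeed()

# Case 1: only bar is reset, so foo's output is still completed.
bar.reset_to_waiting()
retriggers_immediately = bar.can_trigger()

# Case 2: foo is reset as well, which unsets its output, so bar must wait.
foo.reset_to_waiting()
waits_for_upstream = not bar.can_trigger()
```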

@TomekTrzeciak
Contributor

@hjoliver, sorry for the confusion, but from your latest comments I reckon we are pretty much on the same page now. With prerequisites reading rather than holding the state of other tasks' outputs, I think that the mental model of task behaviour (and the code too) will become simpler.

@hjoliver
Member

Yes agreed - and I concede that my original implementation of prerequisites and outputs (which dates back a rather long time) lacked a certain purity of thought!

@hjoliver
Member

hjoliver commented May 8, 2018

@matthewrmshin - I think this essentially supersedes #1902, no? There would be no need to retain succeeded task proxies if their outputs are held in the new outputs dict (so long as we also solve #2143). (And this should also mean no need to even consider satisfying prerequisites from the DB, as per #1428).

@matthewrmshin
Contributor Author

Yes, I think this supersedes #1902 (and #1392?).

@hjoliver
Member

hjoliver commented May 8, 2018

Just to note: when this issue gets done, we need to check that #1392 is solved.

@hjoliver
Member

This issue needs to be re-evaluated post #3515 (spawn on demand): dependency matching is no longer relevant, but it might still be possible to handle prerequisites and outputs more cleanly.
