Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix prerequisite and output manipulation on state changes. #2600

Merged
merged 6 commits into from
Apr 13, 2018

Conversation

hjoliver
Copy link
Member

@hjoliver hjoliver commented Mar 12, 2018

Close #2599

Address a problem first revealed in #2561 (no one had tried to use task prerequisites for external purposes until now, I guess).

Principally, at certain task state changes prerequisites were being set to all-met or all-not-met. This was at best unnecessary and at worst wrong: in conditional triggers, or manual triggering, tasks can submit without all prerequisites being met.

On this branch:

  • prerequisites are only unset on task state reset to 'waiting'. Otherwise they're left alone (e.g. if manually triggered, any un-met prerequisites now remain un-met).
  • outputs on the other hand are manipulated to stay consistent with state changes (otherwise triggering of downstream tasks would not work).

Also:

  • disallows manual reset to ready (this state is really a detail of internal implementation - users should use cylc trigger) and held (users should use cylc hold, which properly handles swap states)
    See more detailed comments in TaskState.reset_state() and the bin/cylc-reset usage string.

Tests based on #2599 (comment)

@hjoliver hjoliver added this to the soon milestone Mar 12, 2018
@hjoliver hjoliver self-assigned this Mar 12, 2018
@hjoliver
Copy link
Member Author

A remaining problem in the same vein: on restart we now load the outputs of each task from the run DB, but we still infer the state of prerequisites from the task state (e.g. if a task with state >= "submitted" must have all prerequisites met... which is not true: it could have conditional prerequisites or have been manually triggered). So - @cylc/core - IMO we need a prerequisites table in the run DB - if agreed, I'll post a new issue for this.

@matthewrmshin
Copy link
Contributor

I agree - as long as it does not make it harder to implement #2329 in the future.

@hjoliver
Copy link
Member Author

Yeah, perhaps this should wait on #2329 in that case - I'll just add a note to that issue.

@TomekTrzeciak
Copy link
Contributor

Isn't it possible to somehow restore prerequisite state based on the task outputs loaded from the DB? I worry that adding prerequisite table in DB would just create more possibilities for prerequisites and actual task states to diverge (more places in the code to keep in sync) and will likely become superfluous once #2329 gets done. Perhaps it's better to focus energy on #2329, which addresses the problem at its source.

@sadielbartholomew
Copy link
Collaborator

I have a few thoughts (which may be misguided - I've tried to get my head around the matter & related issues but with limited experience I don't claim to fully understand the technicalities):

  1. Are manual & conditional triggers the only cases whereby tasks can be submitted without having their prerequisites satisfied? If they are, considering the general aim to keep things as 'light-weight' as possible, might it be best to only record & use prerequisites for tasks triggered as such instead of every task? We could treat tasks triggered in those ways as special cases in which a prerequisites table would need to be consulted, but otherwise know that the prerequisite state is safe to infer from the task state. Then if a user (as in Add information about prerequisites to task environment #2555) wanted to know about prerequisites that triggered specific tasks, we could add the data relevant to those tasks (only)? It seems unnecessary to record all prerequisite data when there are only distinct cases it is needed (unless it constitutes progression towards other broader/future aims).

  2. Have all of the possible complexities been considered for a prerequisites table in the run DB? From having skimmed through the user guide, there are some dependency features which are supported that may (?) introduce complications, e.g:

  • future triggering & triggering off tasks yet to exist & off tasks from other suites: how to account for such prerequisite entries;
  • suicide triggering: lost data from 'removal' of tasks?

If I've evidently completely misunderstood something please just say; at least I can get clarification :)

@hjoliver
Copy link
Member Author

hjoliver commented Mar 14, 2018

[deleted two of my own comments after some rethinking]

@hjoliver
Copy link
Member Author

@TomekTrzeciak -

Isn't it possible to somehow restore prerequisite state based on the task outputs loaded from the DB?

That should be possible in principle, but I'm not sure how easy it would be, and conditional prerequisites completed after a task triggered would muddy the waters somewhat (c.f. your stated goal of wanting to know which prerequisites actually caused the task to trigger). Best to revisit after #2329 I think.

@hjoliver
Copy link
Member Author

hjoliver commented Mar 14, 2018

@sadielbartholomew -

  1. that's quite a good idea, but it's not just triggering that's the problem. There's also manual state reset - e.g. you can force a task to "succeeded" even though it never ran at all. Again, we should rethink all this in light of Improve interaction between task state, output and prerequisite #2329
  2. and 3. these are probably not a problem, because all such prerequisites in the end reduce to a simple string with an associated boolean satisfied/not-satisfied status.

@hjoliver
Copy link
Member Author

The current conversation on persistence or restoration of prerequisite state after a restart is kind of off-topic for this PR - we should continue if necessary under #2329.

Under the current implementation it is still worth finishing this off this PR though, to make #2561 viable.

@hjoliver hjoliver force-pushed the task-state-reset-fix branch 2 times, most recently from 3311ea8 to a530b2c Compare March 15, 2018 08:58
@hjoliver hjoliver changed the title Minimal prerequisite and output manipulation on state reset. Fix prerequisite and output manipulation on state changs. Mar 15, 2018
@hjoliver
Copy link
Member Author

[updated PR description]

@hjoliver hjoliver force-pushed the task-state-reset-fix branch 3 times, most recently from f63362f to 3260103 Compare March 15, 2018 20:52
@hjoliver hjoliver changed the title Fix prerequisite and output manipulation on state changs. Fix prerequisite and output manipulation on state changes. Mar 15, 2018
@hjoliver hjoliver requested a review from matthewrmshin March 15, 2018 22:42
@hjoliver hjoliver added the bug Something is wrong :( label Mar 15, 2018
@hjoliver hjoliver modified the milestones: soon, next release Mar 15, 2018
@hjoliver
Copy link
Member Author

hjoliver commented Mar 15, 2018

@TomekTrzeciak - I tried to assign you as a second reviewer on this, since it's closely tied to your PR, however I think I need to invite you to be a member of the "cylc" group to do that (just done).

if output not in [TASK_OUTPUT_EXPIRED,
TASK_OUTPUT_SUBMIT_FAILED,
TASK_OUTPUT_FAILED]:
msg += "\n " + output
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note on this branch tasks retain these "alternate standard outputs" permanently (but normally in the non-completed state). They were being added and removed on the fly, which was messy, just to avoid this log message on normal successful task completion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably adds a tiny amount to the memory footprint, but it is the right thing to do IMO.

if status in [TASK_STATUS_FAILED,
TASK_STATUS_SUBMIT_FAILED]:
# TODO - HUH? SUBMIT_FAILED? WHAT ABOUT SUCCEEDED?
itask.set_event_time('finished',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks suspicious, indeed. Needs some more explanation if this is correct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably include all states in cylc.task_state.TASK_STATUSES_FINAL.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(oops, looks like I forgot to come back to my own reminder there!)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Turns out set_event_time merely updates the state summary times shown in the GUI - nothing to do with "events" as such. So I've changed the method name, and only update the "finished" time for succeeded and failed tasks.

@@ -398,7 +390,7 @@ def _set_state(self, status):
message += " (%s)" % self.hold_swap
LOG.debug(message, itask=self.identity)

def is_greater_than(self, status):
def is_gt(self, status):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why status_leq and status_geq are module functions, while this is a method?

Copy link
Member Author

@hjoliver hjoliver Mar 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the latter concerns where self.status (of a task state object) lies in the ordered list of status strings, whereas the former concerns the relative position of two bare status strings. Some minor refactoring could probably get rid of one or the other, but I don't think it particularly matters.

@hjoliver hjoliver force-pushed the task-state-reset-fix branch from af49813 to 35bdbe5 Compare March 22, 2018 01:11
@hjoliver
Copy link
Member Author

(branch rebased)

@matthewrmshin
Copy link
Contributor

@sadielbartholomew please sanity check this one.

@sadielbartholomew
Copy link
Collaborator

sadielbartholomew commented Apr 13, 2018

Sanity test in progress. Not related to the essence of this PR, but while conducting testing it emerged that I have a significant amount of 'bad' suites sitting in my 'cylc-run' directory (the Rose Bush migration PR being to blame) as per the comments in the two tests/registration .t files, so I have tried to string together some logic to bypass the issue. See branched PR.

Copy link
Collaborator

@sadielbartholomew sadielbartholomew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me: sensible approach to the issue & the implementation seems sound. Please consider my referenced but essentially off-topic PR (which could be taken separately to this one if more convenient).

…r-bad-dirs

Pre-empt local 'registration' test failure by 'Errno 2'
@hjoliver
Copy link
Member Author

Merging (two approvals, and @sadielbartholomew's side-PR only affects a couple of tests).

@hjoliver hjoliver merged commit 09ca7df into cylc:master Apr 13, 2018
@hjoliver hjoliver deleted the task-state-reset-fix branch April 13, 2018 23:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is wrong :(
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants