-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Distinguish never-spawned from never-submitted. #6067
Conversation
|
suicide trigger[scheduling]
[[graph]]
R1 = """
a? & b => c
a:failed? => !a & !c
"""
[runtime]
[[a]]
script = "sleep 3; false"
[[b,c]] now works as expected: ...
INFO - [1/c:waiting(runahead)] => waiting
INFO - [1/a/01:failed] removed from active task pool: suicide trigger
INFO - [1/c:waiting] removed from active task pool: suicide trigger
INFO - [1/b/01:running] => succeeded
INFO - Not spawning 1/c: already used in this flow
INFO - [1/b/01:succeeded] removed from active task pool: completed
INFO - Workflow shutting down - AUTOMATIC
INFO - DONE cylc removeSince #5643 is not done yet, we can't remove future tasks, so need an example that spawns a waiting task to be removed: [scheduling]
[[graph]]
# b spawns c before a removes c
R1 = "a & b => c"
[runtime]
[[a]]
script = "sleep 5; cylc remove $CYLC_WORKFLOW_ID//1/c"
[[b,c]] Result: ...
INFO - [1/c:waiting(runahead)] => waiting
INFO - [1/b/01:succeeded] removed from active task pool: completed
INFO - Command "remove_tasks" received. ID=f2834293-11dc-49cd-ae5c-e1d9f5cbdb42
remove_tasks(tasks=['1/c'])
INFO - [1/c:waiting] removed from active task pool: request
INFO - Command "remove_tasks" actioned. ID=f2834293-11dc-49cd-ae5c-e1d9f5cbdb42
INFO - [1/a/01:running] => succeeded
INFO - Not spawning 1/c: already used in this flow
INFO - [1/a/01:succeeded] removed from active task pool: completed
INFO - Workflow shutting down - AUTOMATIC
INFO - DONE I think this is what we want, subject to possible revision when #5643 is implemented. |
Can rebase this onto 8.2.x if necessary (it seems we may get another 8.2 release out before 8.3.0?) |
Yes, we've got a cylc-rose issue that we would like to fix in the 8.2 series so are planning another release. |
This actually conflicts quite badly with 8.2.x. Mind if we leave it on master 8.3.0? |
Shockingly, we've only had one report of this so far (failure cases are presumably less well tested in workflows?) so we can wait a little. |
* Ensure removed tasks are not automatically re-spawned by task pool logic. * Ensure removed tasks can be manually re-spawned by request.
Looks good to me, I wrote a couple of extra tests whilst reviewing - hjoliver#49 Can confirm that this fixes the bug where removed tasks could be automatically respawned at a later time by task pool logic. There's no associated issue for this, but might be worth a mention in the changelog entry. |
Thanks, I'll take a look tomorrow. I presume you saw my new integration tests on this branch (which seem to achieve 100% patch coverage) - do yours go beyond mine? |
Yes, I extended your test to ensure that it's possible to get the task back by triggering and added a new one to ensure that removed tasks remain removed. These behaviours follow naturally from the implementation, but it's worth locking them down in the tests to protect against future changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to this PR but while I've noticed it: the "added to active task pool" info messages introduced in #5658 seem a bit excessive. Consider demoting to debug?
cylc/flow/task_pool.py
Outdated
submit_num == 0 | ||
): | ||
# Previous instance removed before completing any outputs. | ||
LOG.info(f"Not spawning {point}/{name}: already used in this flow") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure this message is easy to understand
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, but how to describe what's happening concisely?
Basically, the flow (on this branch of the graph) came to a halt at this point because the task was removed earlier by suicide trigger or cylc remove
. That definitely warrants some kind of log message.
I'll change it to this (subject to further litigation, of course):
LOG.info(f"Not spawning {point}/{name}: already used in this flow") | |
LOG.info(f"Flow stopping at {point}/{name} - task previously removed") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, "flow blocked at" might be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like "flow blocked" I'm afraid! How about:
Skipping {point}/{name} {stringify_flow_nums(flow_nums)} - task previously removed
Or just
Not spawning {point}/{name} {stringify_flow_nums(flow_nums)} - task previously removed
as it was the "already used in this flow" that I think doesn't make much sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think "flow blocked" is better than your two suggestions - it correctly indicates that the flow will not continue here because the task was removed.
"Skipping" suggests the flow will skip over the removed task, which is not the case.
"Spawning" (as you'll have seen in the chat) is I think being expunged from user terminology.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I think options are as follows (or some close variant thereof):
- flow blocked here, the task was removed earlier
- (note addition of "here" to avoid implicating the entire flow)
- not spawning task, it was removed earlier
Personally I think in this instance "flow blocked" is more user-friendly, in that it is more or less self-explanatory, compared to "not spawning".
But given that @oliver-sanders agrees "not spawning" is correct, and (as I've just shown) spawning probably has to remain a user-facing term, perhaps we could just stick with that.
It needs to be logged at INFO level though, to explain why the flow does not continue here, at this time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note, the term "spawn" ... was never a user facing term in the first place.
Ah, really?!
That's Cylc 7!
I'm very confused by these comments. It was your own argument that users should not need to understand the management (insertion or removal) of task proxies or task pool logic in general which I got on board with. It was a primary goal of SoD to remove the need for us to communicate this internal implementation detail, and a necessary side effect as SoD would make it harder for users to monitor the task pool itself. That's why spawning was retired right, s/spawn/set-outputs
, users shouldn't need to understand spawning, because SoD will ensure the task proxy is always there when needed right? E.G. #2143. I agreed with you on this at the time, it was a good idea!
To quote your own proposal:
Users should not have to know about the "task pool" and the concepts of task "insert" and "remove" anymore.
I appreciate this is going back a few years now, maybe reality crept in and the thinking has changed. Perhaps it was a bit ambitions to presume we could hide the internal logic. Maybe we should write up the SoD model and communicate it to users as essential reading? We could do with writing this up in any case.
It needs to be logged at INFO level though, to explain why the flow does not continue here, at this time.
I'm not convinced it does. We are trying to communicate to users the graph model, not the internal SoD implementation (or at least I thought we were).
Take this graph for example:
a => b => c => d
a => x
b => x
c => x
d => x
If the user runs cylc remove //x
, they will get a message along the lines of "x removed".
They don't need to see four "not re-spawning x" messages in a row as the graph proceeds, there is no new information in these messages. Moreover, it isn't communicating anything to them about the graph. Of course x isn't be re-spawned, I removed it! On purpose!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My approval still stands of course, I'll leave you two to find something you can agree on.
I was trying to overt a terminology argument not to stoke one, I accepted the original phrasing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just think "flow blocked" sounds too negative and some users will see it and think something has gone wrong
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK @MetRonnie, let's try to agree. I don't see "flow blocked" as negative, but perhaps we could add more information to explain the reason for it.
We've established that "skipping" isn't right. Are you happy with either of these, or do you have another suggestion?:
- "flow blocked by previous removal of [task]"
- "not spawning [task] (previously removed)"
I like "flow blocked" because it automatically conveys the fact that the flow will not continue from this point. But having argued that "spawn" is still needed to explain how flows evolve, I don't mind if we use that here too.
remove: add integration tests
Co-authored-by: Ronnie Dutta <61982285+MetRonnie@users.noreply.github.com>
All comments addressed. Logging is nice and succinct now:
|
0f264f5
to
995734b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would like to see "flow blocked" changed, but happy enough
Sorry, I guess I took "never" a bit too literally!
Ah, OK @oliver-sanders, I do understand your point now. That statement was idealistic in retrospect, and maybe that's my fault. Early SoD discussions were very scheduler-focused and we didn't know exactly what would be in the task pool (e.g. whether partially satisfied tasks should be "hidden" or not). Now the GUI matters too because the n=0 window IS the task pool, so we have to explain what tasks are in it and how they get there. After going through all the options that we've thought of (to my knowledge) it still seems to me that the best terminology for this is "task pool" and "tasks get spawned (into the pool) by completion of upstream outputs". [I have actually commented on this several times over the past couple years, but not surprising if it got lost in the noise.]
Yeah, more or less what I meant by "idealistic". The basic "SoD for Users" story for users, which can serve as the basis for documenting flows, optional outputs, and interventions, is really quite simple though:
|
Yes, but you've given a simple example where the removal and its effects are contiguous and obvious. In more complex workflows, and especially once we can remove future (non-pool) tasks as planned, that may not be the case. This is not exposing SoD implementation, it just explains why the flow does not continue at this point the graph: because at some point in the past someone or something "removed" the task there. Whether this is INFO or DEBUG is a pretty small thing, not worth dying in a ditch over, but I can't see how hurts to explain to users why the flow halts now on this branch when the "remove" operation may have happened some time ago. |
Just noticed you approved this @MetRonnie - I could have saved myself some writing above 🤦♂️ OK, I'll merge this after tweaking the log message to give a bit more information as described above, so we can push on to 8.3.0.
BTW I (unsuccessfully!) made the exact same argument for manual expiry, which has the exact same effect on the workflow: x doesn't spawn, which blocks the flow at that point in the graph, which may result in premature shutdown. I'll concede and demote the message to DEBUG though, because removal is pretty niche, and if the user did it they don't have too much of a right to complain when the inevitable downstream effects occur. It's definitely a candidate for our proposed intermediate log level though #3647 (comment) |
Close #6066
Check List
CONTRIBUTING.md
and added my name as a Code Contributor.setup.cfg
(andconda-environment.yml
if present).CHANGES.md
entry included if this is a change that can affect users?.?.x
branch.