Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Insert a task that has met its suicide prerequisites #3282

Closed
matthewrmshin opened this issue Aug 9, 2019 · 15 comments
Closed

Insert a task that has met its suicide prerequisites #3282

matthewrmshin opened this issue Aug 9, 2019 · 15 comments
Labels
Milestone

Comments

@matthewrmshin
Copy link
Contributor

Describe exactly what you would like to see in an upcoming release

On insertion of task that has met its suicide prerequisites, it currently does:

  • Stay inserted if the upstream task proxies are no longer in the task pool.
  • Suicide immediately if the upstream task proxies are still in the task pool.

Ideally, the inserted task should remain regardless.

Additional context

I would have called this a bug if I can claim to have no knowledge of the internal working of the task pool, but I have lost the right to that claim now. 👓
Most likely solved by spawn-on-demand? See also #2143, #2643, #3226.

Pull requests welcome!

Not as easy as one would hope, or I would have done it by now. E.g. what should be the behaviour of the tasks that are then spawned by the inserted task?

@matthewrmshin matthewrmshin added this to the later milestone Aug 9, 2019
@hjoliver
Copy link
Member

hjoliver commented Aug 12, 2019

Ideally, the inserted task should remain regardless.

I think I disagree with that. Or at least it is arguable.

  • Stay inserted if the upstream task proxies are no longer in the task pool.
  • Suicide immediately if the upstream task proxies are still in the task pool.

Both of these should be expected, it seems to me. If upstream dependencies are no longer in the task pool, the inserted task's prerequisites can't be satisfied (short of manual manipulation or re-insertion of upstream tasks); and if they are still in the pool (and they satisfy the suicide prerequisite) why shouldn't the inserted task suicide immediately?

@dwsutherland
Copy link
Member

dwsutherland commented Aug 12, 2019

Will be solved by spawn-on-demand, where there is no task pool.
The whole concept of "task insertion" and "suicide" would be different (i.e. !bar would mean "don't run bar" instead of "remove from pool"), and viewing a suicided task is probably more about showing a branch of ghosts (which is why I like the idea of #3226 )..

And if it's available in the data-store, then you only need to trigger the task.. Insertion then trigger would probably be combined I suppose (not sure what insertion represents without a pool otherwise)

@hjoliver
Copy link
Member

I wouldn't say "solved by spawn-on-demand" so much as "won't be an issue" then.

But under the current regime, I don't agree that "Ideally, the inserted task should remain regardless." - because then you've got a task that's not removing itself even though its suicide prerequisites are satisfied.

@hjoliver
Copy link
Member

(But maybe @matthewrmshin will convince me 👍 )

@dwsutherland
Copy link
Member

dwsutherland commented Aug 12, 2019

And if it's available in the data-store, then you only need to trigger the task.. Insertion then trigger would probably be combined I suppose (not sure what insertion represents without a pool otherwise)

I guess "insertion" would be more like queuing handler(s) for the inserted task(s)?
(prerequisite information retrieved from a store instead of a pool)

I wouldn't say "solved by spawn-on-demand" so much as "won't be an issue" then.

Fair point

@hjoliver
Copy link
Member

I guess "insertion" would be more like queuing handler(s) for the inserted task(s)?

There might still be something like "insertion" under the hood (if there are still objects that represent individual active tasks) but it won't be exposed to users like now.

@matthewrmshin
Copy link
Contributor Author

This was an operational problem on our site last Thursday. Long story short, they discovered something wrong with the results of the succeeded branch, and so were trying to insert/re-run the family on the recovery branch. They were unable to do so due to this issue, and had to stop the suite and warm start the whole cycle.

The main purpose of insert (as noted in #2643) is to re-run some tasks. For the tasks to suicide immediately (even when the suite is held) is unhelpful.

@hjoliver
Copy link
Member

hjoliver commented Aug 12, 2019

The main purpose of insert (as noted in #2643) is to re-run some tasks. For the tasks to suicide immediately (even when the suite is held) is unhelpful.

Yes, but: what happens to an inserted task always depends on the state of its prerequisites. I don't see how this is fundamentally any different from an inserted task running immediately (because its normal prerequisites are satisfied) and the operator saying s/he doesn't want it to do that. We can't have tasks in the suite that are not doing what they are supposed to do when their prerequisites are satisfied. (Well, we could, but that would open a whole other can of worms). (We could delay suicide when the suite is held, but the suicide would still be actioned immediately once the suite is released).

The right thing to do is to manipulate the state of upstream tasks appropriately before insertion - i.e. make the conditions right for the recovery branch to run. E.g. for this graph:

model => post
model:fail => recover
model | recover => post
model => ! recover

Your scenario is (I think): model succeeds but the operator decides we actually need to go back and run the recovery branch (which really implies that model incorrectly reported success). But model still exists in the task pool in the succeeded state, so recover immediately suicides on insertion (as the graph says it should do, if model succeeded).

The right way to handle this is:

  1. reset model to failed (or alternatively remove it from the suite)
    • (which actually reflects the reality of the situation)
  2. insert recover - now it will run instead of suiciding
  3. (and either re-insert post or reset it to waiting, so it runs again after recover)

@dwsutherland
Copy link
Member

The right way to handle this is:

  1. reset model to failed (or alternatively remove it from the suite)

    • (which actually reflects the reality of the situation)
  2. insert recover - now it will run instead of suiciding

  3. (and either re-insert post or reset it to waiting, so it runs again after recover)

That's usually how I do it...

@matthewrmshin
Copy link
Contributor Author

That was what I recommended them to do, until I realised that they had a family A to family B graph.

@hjoliver
Copy link
Member

That was what I recommended them to do, until I realised that they had a family A to family B graph.

So was resetting the upstream family state not sufficient? (I might have to see an example to understand the problem)... if it isn't sufficient, what would you suggest is a good solution? Perhaps we need the option to insert tasks with additional prerequisites (normal and suicide) that can't be satisfied automatically by the suite, so that they wait for manual triggering no matter what (and regardless of upstream task status)??

@matthewrmshin
Copy link
Contributor Author

The instruction to reset upstream is OK if there is a clear and obvious path. However, it may not be obvious under circumstances, e.g. if upstream consists of or/any logic.

And the user really did not expect the inserted task to suicide while the suite was held.

@hjoliver
Copy link
Member

hjoliver commented Aug 14, 2019

And the user really did not expect the inserted task to suicide while the suite was held.

Yeah we should certainly change that. And also check that the held task doesn't just suicide immediately when released even if triggered while held.

However, it may not be obvious under circumstances, e.g. if upstream consists of or/any logic.

Not sure what we should do in that case.

@matthewrmshin matthewrmshin modified the milestones: later, cylc-9 Aug 28, 2019
@dpmatthews
Copy link
Contributor

@hjoliver I think this issue is fixed now we have re-flow?

@hjoliver
Copy link
Member

Yes, insertion is no longer a thing.

@hjoliver hjoliver modified the milestones: cylc-9, cylc-8.0.0 Aug 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants