Correct stall diagnosis #3892

hjoliver · 2020-10-27T02:18:13Z

Better workflow completion handling (SoD Proposal)

Long story short:

Pre-SoD, "stall" was grounded pragmatically rather than conceptually: the scheduler had literally got stuck and didn't know what to do about it. There were no more tasks to run, and one or more failed or unsatisfied waiting tasks remained in the pool.

The "unsatisfied waiting tasks" criterion could cause normal workflow completion to be misidentified as a stall, because wholly-unsatisfied waiting tasks were spawned ahead even when they might never be needed.
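To make the false-stall problem concrete, here is a minimal sketch of the pre-SoD heuristic described above. The `Task` record, its fields, and the `pre_sod_stalled` function are hypothetical simplifications for illustration, not Cylc's actual classes. The example pool assumes a hypothetical recovery task `recover` (triggered only if `foo` fails) that was spawned ahead and is still waiting after `foo` succeeded.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical, simplified task record (not Cylc's real task proxy)."""
    name: str
    status: str  # e.g. "waiting", "submitted", "running", "succeeded", "failed"
    prereqs_satisfied: bool = True


def pre_sod_stalled(pool):
    """Sketch of the pre-SoD rule: nothing left to run, but failed or
    unsatisfied waiting tasks remain in the pool."""
    # Anything already active means the scheduler is not stuck.
    if any(t.status in ("submitted", "running") for t in pool):
        return False
    # A waiting task with satisfied prerequisites can still be run.
    if any(t.status == "waiting" and t.prereqs_satisfied for t in pool):
        return False
    # Otherwise, any failed or unsatisfied waiting task counts as a stall.
    return any(
        t.status == "failed"
        or (t.status == "waiting" and not t.prereqs_satisfied)
        for t in pool
    )


# The workflow has actually finished, but the never-needed recovery task
# spawned ahead makes the old rule report a stall.
pool = [
    Task("foo", "succeeded"),
    Task("recover", "waiting", prereqs_satisfied=False),
]
print(pre_sod_stalled(pool))  # prints True
```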

Post-SoD, there are no wholly-unsatisfied waiting tasks in the pool, and soon (#3822) there will be no partially-satisfied ones either. There will only be partially-satisfied prerequisites held in a hidden pool, and, as the example in the doc section linked to above shows, these can't be used to reliably identify a stall.

What stall should mean: the scheduler can't do anything more, but it knows that the flow is not finished.

The only valid way to make that determination now is if there are unhandled failed tasks in the pool. These are, by definition, task outcomes that were not meant to happen.

So:

if the active pool is empty:
- completed
else if the active pool contains only unhandled failed tasks:
- stalled
else:
- still running
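The three-way rule above can be sketched as a small classification function. This is an illustrative sketch only: `FlowState`, `Task`, `failure_handled`, and `diagnose` are hypothetical names, and "handled" is assumed to mean that the graph defines a path for the failure (e.g. a `foo:fail => bar` style trigger).

```python
from dataclasses import dataclass
from enum import Enum


class FlowState(Enum):
    COMPLETED = "completed"
    STALLED = "stalled"
    RUNNING = "running"


@dataclass
class Task:
    """Hypothetical, simplified task record."""
    name: str
    status: str  # e.g. "waiting", "running", "failed"
    failure_handled: bool = False  # assumption: graph handles this failure


def diagnose(active_pool):
    """Classify the flow from the contents of the active pool."""
    if not active_pool:
        # Nothing left in the active pool: the flow is done.
        return FlowState.COMPLETED
    if all(t.status == "failed" and not t.failure_handled for t in active_pool):
        # Only unhandled failures remain: nothing more can happen.
        return FlowState.STALLED
    # Anything else can still make progress.
    return FlowState.RUNNING


print(diagnose([Task("foo", "failed")]))  # prints FlowState.STALLED
```

Note that partially-satisfied prerequisites play no part in the decision; only the active pool is consulted.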

At normal shutdown or stall, log any partially-satisfied prerequisites in case they point to a flow design error; in general, though, we can't assume they were "meant" to be completed.
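A minimal sketch of that reporting step follows, assuming the hidden pool can be viewed as a mapping from task ID to `{prerequisite: satisfied}`. That data layout, the `log_partial_prereqs` name, and the task IDs are hypothetical. It logs at warning level as a design choice, since the prerequisites are reported only as possible hints rather than as errors.

```python
import logging

LOG = logging.getLogger("scheduler")  # hypothetical logger name


def log_partial_prereqs(hidden_pool):
    """Log tasks whose prerequisites are only partially satisfied.

    hidden_pool: hypothetical mapping of task ID -> {prerequisite: bool}.
    Returns the IDs that were reported.
    """
    reported = []
    for task_id, prereqs in hidden_pool.items():
        unsatisfied = sorted(p for p, done in prereqs.items() if not done)
        # "Partially satisfied": at least one done and at least one not.
        if unsatisfied and len(unsatisfied) < len(prereqs):
            LOG.warning(
                "Partially satisfied prerequisites for %s: waiting on %s",
                task_id,
                ", ".join(unsatisfied),
            )
            reported.append(task_id)
    return reported


hidden = {
    "bar.1": {"foo.1:succeeded": True, "baz.1:succeeded": False},
    "qux.1": {"foo.1:succeeded": True},
}
print(log_partial_prereqs(hidden))  # prints ['bar.1']
```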

(Note: special treatment of unhandled failed tasks is still under discussion; if that special treatment is revoked, there will be no stall concept at all.)

hjoliver added the sod-follow-up label Oct 27, 2020

hjoliver self-assigned this Oct 27, 2020

hjoliver mentioned this issue Oct 27, 2020 in non-urgent SoD follow-up issues #3753 (Open, 5 tasks)

hjoliver changed the title from "Workflow stall vs completion" to "Correct stall diagnosis" Oct 27, 2020

hjoliver added this to the cylc-8.0.0 milestone Oct 27, 2020

This was referenced Oct 27, 2020:
- Hide waiting tasks from n=0. #3823 (Merged)
- document meaning of scheduler stall cylc/cylc-doc#169 (Open)

hjoliver closed this as completed in #3823 Nov 2, 2020

oliver-sanders modified the milestones: cylc-8.0.0, cylc-8.0a3 Nov 10, 2020

hjoliver modified the milestones: cylc-8.0a3, cylc-8.0b0 Feb 25, 2021
