Proposed new task polling logic. #1792

hjoliver · 2016-04-14T02:40:16Z

This arises from the need to allow polling of failed tasks, as discussed in #1762, partly in order to make tests that detect failure of a remote poll operation (given that batch schedulers typically list tasks for some minutes after they've exited, during which time they will poll as running if the batch queue has to be interrogated).

Allow all tasks with a Job ID (i.e. 'submitted' or later) to be polled
- but only active ones ('submitted' or 'running') by default in the poll-all case, to avoid unnecessary mass polling of succeeded tasks.
Allow all tasks to be resurrect-able (see #1514, Poll tasks when "allow resurrection" is True).
- i.e. any 'failed' task can be returned to 'submitted' or 'running' as a result of polling.
- ditch the current "enable resurrection" config item.
Always believe a poll result ~~if it takes the task state forward~~
- e.g. 'running' => 'succeeded' or 'failed'.
If a poll result would take the state backwards, e.g. 'succeeded' => 'running', it could mean the poll result was late (the task was sending "succeeded" while it was being polled as running), in which case ignore the poll and immediately issue a second poll.
- Always believe a second poll.
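
The rules above can be sketched roughly as follows. This is only an illustration of the proposal, not the cylc implementation: the names `ORDER`, `pollable`, `apply_poll` and the `repoll` callback are hypothetical, and ranking 'succeeded' and 'failed' equally is an assumption.

```python
# Hypothetical ordering of job states; higher means "further forward".
# Treating 'succeeded' and 'failed' as equally far forward is an assumption.
ORDER = {"submitted": 0, "running": 1, "succeeded": 2, "failed": 2}
ACTIVE = {"submitted", "running"}


def pollable(state, poll_all=False):
    """Any task with a job ID ('submitted' or later) can be polled.

    In the poll-all case only active tasks are polled, to avoid
    mass polling of finished tasks.
    """
    if state not in ORDER:
        return False  # e.g. 'waiting': no job ID yet
    if poll_all:
        return state in ACTIVE
    return True


def apply_poll(current, polled, repoll):
    """Return the task state to adopt after a poll result.

    A result that keeps or advances the state is believed at once.
    A result that moves the state backwards may simply be late, so it
    is ignored and a second poll is issued; the second poll is
    always believed.
    """
    if ORDER[polled] >= ORDER[current]:
        return polled
    return repoll()
```

For example, `apply_poll("failed", "running", repoll)` only returns the task to 'running' if the second poll also reports 'running'.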

[UPDATE] the last (more difficult) bit is not needed, because the job status file (reliably) records job success or failure - we only interrogate the batch queue if this information is not in the status file yet [I think that misses the point - removing the strike-through over the last bullet point]

[UPDATE 2] - if batch scheduler preempts by kill and re-queue, we might want a poll to take the job state backwards ('failed' => 'submitted') - but would need to interrogate the batch scheduler rather than the job status file.

hjoliver · 2016-04-14T04:14:34Z

(TBD - polling of 'retrying' tasks, or not)

matthewrmshin · 2016-04-14T05:52:21Z

Good idea. Note that job poll results should contain time information from the job status file, so we should know what to trust or not.

matthewrmshin · 2016-04-14T08:08:51Z

I have just realised that my argument above will fall apart if multiple entries are written to the job status file, e.g. pipe issue #1783, pre-emption/resurrection #1514, etc. In normal circumstances, however, the time information from the job status file should be trustworthy.
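
As a rough sketch of the time-based check for the normal case (one entry per event in the job status file): the function name `is_stale` and the timestamp format are assumptions for illustration, not taken from cylc.

```python
from datetime import datetime

# Assumed timestamp format, e.g. "2016-04-14T02:40:16Z".
TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def is_stale(poll_time, last_update_time):
    """Return True if the poll result predates the last recorded state change.

    A stale poll result should not be allowed to override the newer
    state. This does not handle multiple entries for the same event
    in the job status file.
    """
    polled = datetime.strptime(poll_time, TIME_FORMAT)
    updated = datetime.strptime(last_update_time, TIME_FORMAT)
    return polled < updated
```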

hjoliver · 2016-04-15T01:50:49Z

Regarding polling and pre-emption (/resurrection) see #1514

hjoliver · 2016-04-19T03:14:00Z

I'll return to this once #1762 and #1775 are merged. [update: these are DONE]

hjoliver · 2016-06-24T14:20:26Z

@matthewrmshin says:

poll results are now recorded in the job status file - if a status line exists, it can be trusted (if it records that the job was polled as still queued or running, we interrogate the batch queue again, of course).
signalled kills may be untrustworthy - need to poll to confirm?
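
A minimal sketch of that decision, assuming the recorded status has already been read from the job status file. The names `resolve_job_state` and `query_batch_queue` are hypothetical, and the open question about signalled kills is not handled here.

```python
FINAL = {"succeeded", "failed"}


def resolve_job_state(recorded, query_batch_queue):
    """Decide the job state from the status file, else the batch queue.

    recorded: the state in the job status file, or None if there is
        no status line yet.
    query_batch_queue: callable that interrogates the batch scheduler
        and returns the state it reports.
    """
    if recorded in FINAL:
        # A recorded final outcome is trusted as-is.
        return recorded
    # Nothing recorded, or last recorded as still queued or running:
    # interrogate the batch queue again.
    return query_batch_queue()
```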

hjoliver self-assigned this Apr 14, 2016

hjoliver added this to the soon milestone Apr 14, 2016

hjoliver mentioned this issue Apr 14, 2016

Fix polling of jobs submitted to Loadleveler. #1762

Merged

matthewrmshin mentioned this issue Jun 22, 2017

Poll tasks when "allow resurrection" is True. #1514

Closed

hjoliver mentioned this issue Aug 12, 2017

1792 - task polling logic and state reset. #2396

Merged

matthewrmshin modified the milestones: next release, soon Sep 11, 2017

oliver-sanders closed this as completed in #2396 Sep 13, 2017

dpmatthews mentioned this issue Nov 15, 2021

Polling can incorrectly return a failed task to the running state #4513

Open

oliver-sanders mentioned this issue Nov 16, 2021

job polling broken for failed jobs after restart #4516

Closed

hjoliver mentioned this issue Feb 10, 2022

Only poll non-waiting tasks #4658

Merged

7 tasks
