
Use (hopefully) fairer job processing order #617

Merged Jul 14, 2023 · 2 commits merged into main from evansd/job-scheduling
Conversation

@evansd (Contributor) commented Jul 13, 2023

Instead of randomising, we now order jobs by how many jobs are already running in that job's workspace, and then by age.

There are many things other than workspace we could partition on (e.g. user, project, organisation) but workspace is directly available on the job object, is easy to explain, and doesn't have the effect of penalising users who have to work on several different things.

Note that this doesn't prevent a large job request from grabbing all available slots if nothing else is running when it is submitted. But it does mean that the next slot that becomes free will go to a job from a different workspace.

Because the number of running jobs will change as we work our way through the set of active jobs we have to update the counts and re-sort the jobs on each iteration. This is not the most beautiful code, but I think it does the job.

In order for this to work correctly, we need job processing to happen in two phases. First, check up on all the running jobs, see if any have finished, and establish how much capacity we have. Then distribute this capacity fairly among the pending jobs. We achieve this by sorting first on status == RUNNING. This ensures that we handle all running jobs before we handle any pending jobs.
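The ordering described above can be sketched as below. This is a minimal, hypothetical illustration, not the project's actual code: the `Job` and `State` classes, the `created_at` field, and the `job_order` function are all illustrative names.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    RUNNING = "running"
    PENDING = "pending"


@dataclass
class Job:
    workspace: str
    state: State
    created_at: int


def job_order(jobs):
    """Yield active jobs in processing order, re-sorting after each one so
    that the per-workspace running counts stay current."""
    running_for_workspace = defaultdict(int)
    remaining = list(jobs)
    while remaining:
        remaining.sort(
            key=lambda job: (
                job.state != State.RUNNING,            # running jobs first
                running_for_workspace[job.workspace],  # fewest running first
                job.created_at,                        # then oldest first
            )
        )
        job = remaining.pop(0)
        # Build up the counts as we go, so pending jobs from busy
        # workspaces sink down the order on later iterations
        if job.state == State.RUNNING:
            running_for_workspace[job.workspace] += 1
        yield job
```

With a running job in workspace `w1`, an older pending job also in `w1`, and a newer pending job in `w2`, the `w2` job is handled before the older `w1` job, which is the fairness property the PR is after.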

@evansd evansd marked this pull request as draft July 13, 2023 12:11
@evansd (Contributor, Author) commented Jul 13, 2023

Update: I've realised that we don't need two separate loops to deal with this; we just need to sort the jobs so that we handle all the running ones before the pending ones. This actually simplifies the code overall.

Original comment below.


Converting back to draft because I don't think this fixes the issue. The problem is that slots become available as we work our way through the list and finalise completed jobs. So, whenever a job completes, the slot will go to the next ready-to-run job that comes after it – which will almost certainly not be the highest priority job.

I think the only way to deal with this is to do two separate loops, one after the other: in the first we check up on all the running jobs and finalise them if necessary; and in the second we check up on all the pending jobs and start them if necessary. So slots become available during the first loop and are consumed in the second loop. It's only that second loop that needs to be in priority order.
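For comparison, the two-loop approach considered here could be sketched as follows. This is an illustrative sketch only: `Job`, `State`, `MAX_SLOTS`, and `handle_all_jobs` are assumed names, not the project's real API, and the "finished" check stands in for polling the container runtime.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    RUNNING = "running"
    PENDING = "pending"
    SUCCEEDED = "succeeded"


@dataclass
class Job:
    workspace: str
    state: State
    finished: bool = False  # would be reported by the container runtime


MAX_SLOTS = 2  # illustrative concurrency limit


def handle_all_jobs(active_jobs):
    running = [j for j in active_jobs if j.state == State.RUNNING]
    pending = [j for j in active_jobs if j.state == State.PENDING]

    # Loop 1: finalise running jobs that have finished, freeing their slots
    for job in running:
        if job.finished:
            job.state = State.SUCCEEDED

    # Loop 2: consume the freed capacity, starting pending jobs in
    # priority order (here simply list order)
    used = sum(1 for j in active_jobs if j.state == State.RUNNING)
    for job in pending:
        if used >= MAX_SLOTS:
            break
        job.state = State.RUNNING
        used += 1
```

The final version of the PR collapses these two phases into a single loop by sorting on `status == RUNNING` first, as the update above explains.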

Instead of randomising, we now order jobs by how many jobs are already
running in that job's workspace, and then by age.
@evansd evansd marked this pull request as ready for review July 14, 2023 12:09
@bloodearnest (Member)
Nice. It's kinda like hacking the two separate loops idea into the sort function.

Extra +1.

Comment on lines +97 to +98
if job.state == State.RUNNING:
running_for_workspace[job.workspace] += 1
Contributor:
should this be if job.state == State.RUNNING and job_previous_state != State.RUNNING?

@evansd (Contributor, Author):

Good question, but I think the answer's no – and this is one of the simplifications over the previous version. We don't count running jobs at all at the start; we just build up the count as we go along. That means the count isn't accurate during the initial phase while we're processing running jobs, but that's fine. By the time we get to processing the pending jobs it will be accurate.

And because we only process each job once we never risk double-counting. So it makes the logic here much simpler.

Contributor:

yep yep, I still had the previous version of this in mind, this is correct.

@evansd evansd merged commit 9acc49a into main Jul 14, 2023
@evansd evansd deleted the evansd/job-scheduling branch July 14, 2023 13:10