
client: avoid overscheduling CPUs in presence of MT jobs #5257

Merged
davidpanderson merged 2 commits into master from dpa_mt_sched on Jun 15, 2023

Conversation

davidpanderson
Contributor

In 20ff585 we changed the sched policy so that, for example, if there are two 4-CPU jobs on a 6-CPU host, it runs them both: overscheduling the CPUs is better than starving them.

This commit refines that a bit: if, in addition to the MT jobs, there are some 1-CPU jobs, it fills the remaining CPUs with 1-CPU jobs instead of overcommitting - in the example above, it runs one MT job and two of the 1-CPU jobs.

Also: show resource usage in cpu_sched_debug messages

Also: if CPUs are starved, trigger a work request. This logic was mistakenly hidden inside an if (log_flags.cpu_sched_debug) block.

Also: don't ignore log flags in the simulator

Fixes #5254
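
To make the policy concrete, here is a minimal, self-contained C++ sketch of the behaviour described above. It is not the client's actual scheduler code: the Job struct, the have_single_cpu_jobs flag, and the main() driver are hypothetical, and only the decision rule and the message wording mirror the description and the cpu_sched_debug log.

// Minimal sketch of the refined policy -- not the actual client scheduler.
#include <cstdio>
#include <vector>

struct Job {
    const char* name;
    double ncpus;   // CPUs used by the job; MT jobs use more than 1
};

int main() {
    const double host_ncpus = 6;
    // The example from the description: two 4-CPU MT jobs plus 1-CPU jobs.
    std::vector<Job> runnable = {
        {"mt_a", 4}, {"mt_b", 4}, {"cpu_1", 1}, {"cpu_2", 1}
    };

    bool have_single_cpu_jobs = false;
    for (const Job& j : runnable) {
        if (j.ncpus <= 1) have_single_cpu_jobs = true;
    }

    double used = 0;
    for (const Job& j : runnable) {
        if (j.ncpus > 1 && used + j.ncpus > host_ncpus && have_single_cpu_jobs) {
            // The refinement: don't overcommit for another MT job when
            // 1-CPU jobs are available to fill the remaining cores.
            printf("avoid MT overcommit: skipping %s\n", j.name);
            continue;
        }
        if (used >= host_ncpus) {
            printf("all CPUs used (%.2f >= %.0f), skipping %s\n",
                   used, host_ncpus, j.name);
            continue;
        }
        printf("scheduling %s\n", j.name);
        used += j.ncpus;
    }

    if (used < host_ncpus) {
        // The other fix in this PR, modeled here: starved CPUs trigger a
        // work request even when cpu_sched_debug logging is off.
        printf("CPUs starved (%.2f < %.0f): triggering work request\n",
               used, host_ncpus);
    }
    return 0;
}

With the 6-CPU example this schedules mt_a and both 1-CPU jobs and skips mt_b, which is the "one MT job and two of the 1-CPU jobs" outcome described above.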

@RichardHaselgrove
Contributor

Thanks David. I'll wait until the CI checks have completed and the artifacts have been built, then test it.

@codecov

codecov bot commented May 29, 2023

Codecov Report

Merging #5257 (8f4aa5a) into master (3ca95a1) will decrease coverage by 0.01%.
The diff coverage is n/a.

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #5257      +/-   ##
============================================
- Coverage     10.86%   10.86%   -0.01%     
  Complexity     1064     1064              
============================================
  Files           279      279              
  Lines         35968    35969       +1     
  Branches       8275     8275              
============================================
  Hits           3909     3909              
- Misses        31667    31668       +1     
  Partials        392      392              

see 1 file with indirect coverage changes

@RichardHaselgrove
Contributor

I've loaded the artifacts from this PR (f0685c4) and re-created the scenario from yesterday. I don't think we've quite hit the mark - in fact, I think we've rather overshot it on the other side. Here's a full, unedited log cycle with cpu_sched_debug enabled:

cpu_sched_debug_log1.txt

I'm currently running two GPU tasks specified to use one full CPU core each, and four separate single-core CPU tasks. One of the single-core tasks has just finished - yesterday, in this situation, the first available 3-core Amicable Numbers MT task was started, and (one minute later) two further single-core tasks were suspended.

Today, the problem is at the very end:

Mon 29 May 2023 13:57:03 BST | Einstein@Home | [cpu_sched_debug] scheduling LATeah4021L08_1148.0_0_0.0_13497267_0
Mon 29 May 2023 13:57:03 BST | Einstein@Home | [cpu_sched_debug] scheduling LATeah4021L08_1148.0_0_0.0_13496490_0
Mon 29 May 2023 13:57:03 BST | NumberFields@home | [cpu_sched_debug] scheduling wu_sf3_DS-16x271-23_Grp38766of1000000_0
Mon 29 May 2023 13:57:03 BST | NumberFields@home | [cpu_sched_debug] scheduling wu_sf3_DS-16x271-23_Grp38356of1000000_0
Mon 29 May 2023 13:57:03 BST | NumberFields@home | [cpu_sched_debug] scheduling wu_sf3_DS-16x271-23_Grp39555of1000000_0
Mon 29 May 2023 13:57:03 BST | Amicable Numbers | [cpu_sched_debug] avoid MT overcommit: skipping amicable_10_21_8916_1685355602.249242_44_1
Mon 29 May 2023 13:57:03 BST | Amicable Numbers | [cpu_sched_debug] avoid MT overcommit: skipping amicable_10_21_8916_1685355602.249242_152_0
Mon 29 May 2023 13:57:03 BST | NumberFields@home | [cpu_sched_debug] scheduling wu_sf3_DS-16x271-23_Grp38357of1000000_0
Mon 29 May 2023 13:57:03 BST | NumberFields@home | [cpu_sched_debug] all CPUs used (6.00 >= 6), skipping wu_sf3_DS-16x271-23_Grp54421of1000000_0
Mon 29 May 2023 13:57:03 BST | NumberFields@home | [cpu_sched_debug] all CPUs used (6.00 >= 6), skipping wu_sf3_DS-16x271-23_Grp53192of1000000_0
Mon 29 May 2023 13:57:03 BST | NumberFields@home | Starting task wu_sf3_DS-16x271-23_Grp38357of1000000_0
Mon 29 May 2023 13:57:03 BST | NumberFields@home | [cpu_sched] Starting task wu_sf3_DS-16x271-23_Grp38357of1000000_0 using GetDecics version 400 (default) in slot 3

The overcommit protection has kicked in too soon, and no MT task can run at all - a new single-core task has been started instead. In this situation, if the single-core tasks keep finishing one at a time, no MT task will run until forced by deadline pressure.
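
Reduced to a few lines, that log looks like this (a hypothetical model of the new rule, not client code; the numbers are taken from the log above):

// Why the MT task starves: it is considered after the 1-CPU jobs, and a
// fresh 1-CPU job can always take the free core instead.
#include <cstdio>

int main() {
    const double host_ncpus = 6;
    double used = 2 /* CPU share of the 2 GPU tasks */
                + 3 /* 1-CPU tasks still running */;   // one task just finished
    const double mt_job_ncpus = 3;   // the Amicable Numbers MT task

    if (used + mt_job_ncpus > host_ncpus) {
        printf("avoid MT overcommit: skipping the MT job\n");
        used += 1;   // a queued 1-CPU job takes the free core instead
    }
    printf("CPUs used: %.2f of %.0f; the MT job keeps waiting\n",
           used, host_ncpus);
    return 0;
}

As long as single-core tasks keep arriving one at a time, this branch is taken every time and the MT job only runs under deadline pressure.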

@RichardHaselgrove
Contributor

But then again, nearly an hour later, this happened:

29/05/2023 14:43:01 | Einstein@Home | Computation for task LATeah4021L08_1148.0_0_0.0_12491829_0 finished
29/05/2023 14:43:01 | Einstein@Home | Starting task LATeah4021L08_1148.0_0_0.0_12497268_1
29/05/2023 14:43:01 | Einstein@Home | [cpu_sched] Starting task LATeah4021L08_1148.0_0_0.0_12497268_1 using hsgamma_FGRPB1G version 128 (FGRPopencl2Pup-nvidia) in slot 1
29/05/2023 14:43:03 | Einstein@Home | Started upload of LATeah4021L08_1148.0_0_0.0_12491829_0_0
29/05/2023 14:43:03 | Einstein@Home | Started upload of LATeah4021L08_1148.0_0_0.0_12491829_0_1
29/05/2023 14:43:04 | Einstein@Home | Finished upload of LATeah4021L08_1148.0_0_0.0_12491829_0_0 (441 bytes)
29/05/2023 14:43:04 | Einstein@Home | Finished upload of LATeah4021L08_1148.0_0_0.0_12491829_0_1 (400 bytes)
29/05/2023 14:43:11 | NumberFields@home | Computation for task wu_sf3_DS-16x271-23_Grp38357of1000000_0 finished
29/05/2023 14:43:11 | Amicable Numbers | Starting task amicable_10_21_8916_1685355602.249242_44_1
29/05/2023 14:43:11 | Amicable Numbers | [cpu_sched] Starting task amicable_10_21_8916_1685355602.249242_44_1 using amicable_10_21 version 300 (mt) in slot 3
29/05/2023 14:43:13 | NumberFields@home | Started upload of wu_sf3_DS-16x271-23_Grp38357of1000000_0_r637935920_0
29/05/2023 14:43:15 | NumberFields@home | Finished upload of wu_sf3_DS-16x271-23_Grp38357of1000000_0_r637935920_0 (247 bytes)
29/05/2023 14:44:11 | NumberFields@home | [cpu_sched] Preempting wu_sf3_DS-16x271-23_Grp53192of1000000_0 (left in memory)
29/05/2023 14:44:11 | NumberFields@home | [cpu_sched] Preempting wu_sf3_DS-16x271-23_Grp54429of1000000_0 (left in memory)

(I'd turned off cpu_sched_debug, and this log is taken from remote monitoring on a Windows machine.) I'd also suspended all remaining unstarted single-core tasks, so there wasn't one available to replace the one that finished - not a "normal running" situation. Notice the one-minute preempt delay has occurred again.

Second commit: run MT jobs even if they overcommit CPUs. The problem with running 1-CPU jobs instead is that the MT job may never run until it's in deadline pressure.
@davidpanderson
Contributor Author

Ah. If we let 1-CPU jobs cut in front of the MT job, this may go on indefinitely. Let's stick with the existing policy: it (temporarily) overcommits CPUs in some cases, but it doesn't starve CPUs or MT jobs.

This PR contains other improvements so I'm keeping it.
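
For contrast, a sketch of the policy kept by the second commit (again a hypothetical model, not the client code): an MT job is started whenever any CPU is still free, even if that temporarily overcommits the host.

#include <cstdio>

int main() {
    const double host_ncpus = 6;
    double used = 5;                 // GPU CPU share plus running 1-CPU jobs
    const double mt_job_ncpus = 3;   // the waiting MT job

    if (used < host_ncpus) {
        // The kept policy: run the MT job as soon as a core frees up,
        // accepting a temporary overcommit rather than starving it.
        used += mt_job_ncpus;
        printf("starting MT job; CPUs committed: %.2f of %.0f\n",
               used, host_ncpus);
    }
    return 0;
}

That matches the 14:43 log above: the MT task starts as soon as a 1-CPU task finishes (bringing committed CPUs to 8), and a minute later two 1-CPU tasks are preempted, bringing the count back to 6.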

@RichardHaselgrove
Contributor

We also need to consider the other boundary condition - work fetch. In this test, the single-core tasks (NumberFields) come from my regular backfill project; the MT tasks (Amicable) are new for this test. Both projects have the same resource share. Amicable has already fetched a new task; NumberFields has not - that's to be expected from fetch priorities.

The machine is likely to have run dry of single-core tasks in around 3 hours - but after I've gone to bed. It will be interesting to see in the morning what it's done overnight.

@RichardHaselgrove
Contributor

Yes, as I half expected, work fetch failed to take account of the need for single-core tasks.

30/05/2023 00:10:59 | NumberFields@home | Computation for task wu_sf3_DS-16x271-23_Grp74528of1000000_0 finished
30/05/2023 00:10:59 | Amicable Numbers | Starting task amicable_10_21_19016_1685381702.061332_987_1
30/05/2023 00:10:59 | Amicable Numbers | [cpu_sched] Starting task amicable_10_21_19016_1685381702.061332_987_1 using amicable_10_21 version 300 (mt) in slot 2
30/05/2023 00:19:15 | NumberFields@home | Sending scheduler request: To report completed tasks.
30/05/2023 00:19:15 | NumberFields@home | Reporting 2 completed tasks
30/05/2023 00:19:15 | NumberFields@home | Not requesting tasks: don't need (CPU: not highest priority project; NVIDIA GPU: )

That new Amicable task was the second to run concurrently, so we're back to 8-core overcommitment.

But manually forcing a work fetch brought the core count back to 6:

30/05/2023 05:54:49 | Amicable Numbers | work fetch suspended by user
30/05/2023 05:54:56 | Amicable Numbers | task amicable_10_21_29370_1685407802.177515_549_1 suspended by user
30/05/2023 05:54:58 | NumberFields@home | Sending scheduler request: To fetch work.
30/05/2023 05:54:58 | NumberFields@home | [sched_op] CPU work request: 15832.25 seconds; 0.00 devices
30/05/2023 05:55:00 | NumberFields@home | Scheduler request completed: got 6 new tasks
30/05/2023 05:55:04 | Amicable Numbers | [cpu_sched] Preempting amicable_10_21_24483_1685396702.041673_965_0 (left in memory)
30/05/2023 05:55:04 | NumberFields@home | Starting task wu_sf3_DS-16x271-23_Grp180623of1000000_0

(the task I suspended was cached, unstarted)
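
A rough model of the work-fetch behaviour in that log (an assumption about what the "not highest priority project" message implies, not the real work-fetch code; the Project struct and the driver are hypothetical):

// CPU work is only requested from the project work fetch ranks highest, so
// the project supplying the 1-CPU "fill-in" tasks is skipped even when those
// tasks are what keeps the CPU count at 6 alongside the MT jobs.
#include <cstdio>
#include <string>
#include <vector>

struct Project {
    std::string name;
    bool highest_priority;
};

int main() {
    std::vector<Project> projects = {
        {"Amicable Numbers", true},      // already has MT work queued
        {"NumberFields@home", false},    // supplies the 1-CPU tasks
    };
    for (const Project& p : projects) {
        if (p.highest_priority) {
            printf("%s: would request CPU work\n", p.name.c_str());
        } else {
            // The NumberFields@home case in the log: no request is sent
            // even though its 1-CPU tasks are about to run out.
            printf("%s: Not requesting tasks: don't need "
                   "(CPU: not highest priority project)\n", p.name.c_str());
        }
    }
    return 0;
}

Suspending Amicable's work fetch, as in the 05:54 log above, makes NumberFields the project that gets asked, and the supply of 1-CPU tasks is refilled.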

@davidpanderson davidpanderson merged commit a0d0a9d into master Jun 15, 2023
@AenBleidd AenBleidd deleted the dpa_mt_sched branch August 15, 2023 09:21
Development

Successfully merging this pull request may close these issues.

Single core task will not run alongside a multicore task