scx_layered: More optimal core allocation #1109

Draft

wants to merge 1 commit into main
Conversation

kkdwivedi
Contributor

This PR documents my initial attempt at more optimal layer core order generation, and invites others to provide ideas.

I am not going to keep fixing the current attempt (there are a few odd order generations with more LLCs), but the idea is to space layers out as much as possible to minimize overlap. For this we use a greedy approach: find the segment with the maximum run of unallocated LLCs, place the new layer there, then grow it in either direction until we run out of cores. (A sketch follows the examples below.)

For a machine with two LLCs covering cores 0-19 and 20-39:

Old:

layer: kkd algo: Sticky core order: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39]
layer: kkd 2 algo: Sticky core order: [28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27]
layer: kkd 3 algo: Sticky core order: [32, 33, 34, 35, 36, 37, 38, 39, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]

New:

layer: kkd algo: StickyTopo core order: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39]
layer: kkd 2 algo: StickyTopo core order: [39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
layer: kkd 3 algo: StickyTopo core order: [19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39]
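
For reference, a minimal sketch of the greedy segment search described above (illustrative and standalone; `longest_free_run` and the boolean allocation map are not the actual scx_layered data structures):

```rust
// Illustrative sketch: find the longest contiguous run of unallocated LLCs,
// which is where a new layer would be placed before growing outwards.
fn longest_free_run(allocated: &[bool]) -> Option<(usize, usize)> {
    let mut best: Option<(usize, usize)> = None; // (start, len)
    let (mut start, mut len) = (0, 0);
    for (i, &used) in allocated.iter().enumerate() {
        if used {
            len = 0;
            continue;
        }
        if len == 0 {
            start = i;
        }
        len += 1;
        if best.map_or(true, |(_, best_len)| len > best_len) {
            best = Some((start, len));
        }
    }
    best
}
```

A new layer starts inside that run and then grows outward, which is what produces the reversed and centered orders shown above.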

We can be more intelligent here, e.g. growing into our own NUMA domain's LLCs first before spreading outside it, but all of this is ultimately futile because this algorithm won't be globally optimal. It assumes all layers have equal size and load, which is not true. Using a "weight" to push layers left or right when allocating LLCs works, but it cannot adapt to layer sizes changing at runtime, which is the prevalent case. Thus, even with this algorithm there will eventually be overrun.

Instead, after talking to Tejun, I will update this to estimate layer sizes at runtime and regenerate the core order by picking free LLCs for each layer, starting with the largest layers (which thus take precedence), and attempt to pack the rest if necessary.
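
Roughly, something like the following (illustrative names and fallback, not the final implementation):

```rust
// Rough sketch: allocate free LLCs to layers in order of estimated size,
// largest first, then pack any leftover layers onto the last LLC.
fn regen_llc_assignment(est_sizes: &[usize], nr_llcs: usize) -> Vec<Vec<usize>> {
    debug_assert!(nr_llcs > 0);
    let mut order: Vec<usize> = (0..est_sizes.len()).collect();
    // Largest estimated size first, so bigger layers take precedence.
    order.sort_by(|&a, &b| est_sizes[b].cmp(&est_sizes[a]));

    let mut free: Vec<usize> = (0..nr_llcs).collect();
    let mut assignment = vec![Vec::new(); est_sizes.len()];
    for &layer in &order {
        let take = est_sizes[layer].min(free.len());
        assignment[layer] = free.drain(..take).collect();
        if assignment[layer].is_empty() {
            // Out of free LLCs: pack onto the last LLC as a fallback.
            assignment[layer].push(nr_llcs - 1);
        }
    }
    assignment
}
```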

@likewhatevs
Contributor

> invites others to provide ideas.

this is cool. it might be cool if this were P-core/E-core aware, but idk how that could work well without knowing whether the user prefers energy efficiency or performance (maybe via a flag, idk)?

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
@kkdwivedi
Contributor Author

kkdwivedi commented Jan 14, 2025

Here's an updated approach; some implementation details may keep changing until I've tested this properly with a real workload. I will remove the draft status once I've tested with a real workload and verified the edge cases properly, and then it should be good to go.

Approach

The idea is to assign entire LLCs to layers at once, so allocation happens at LLC granularity.

There is a notion of heavy "sticky" layers and light "low" layers, based on utilization. Sticky layers can forcibly reclaim/reassign LLCs from low layers. LLCs used by sticky layers are also not visible in the "free" LLC pool for allocation.

For now I was hacking the code to mark layers sticky by matching on their names, but it should either be done automatically (above some threshold layer size) or through the config, by indicating the main workload vs. misc layers there.
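
An automatic variant could be as simple as the following (threshold and names purely illustrative):

```rust
// Illustrative automatic classification: a layer is "sticky" if it accounts
// for a large enough share of total utilization. The threshold is made up.
const STICKY_UTIL_FRAC: f64 = 0.25;

fn is_sticky(layer_util: f64, total_util: f64) -> bool {
    total_util > 0.0 && layer_util / total_util >= STICKY_UTIL_FRAC
}
```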

Compaction

Compaction is driven by layer utilization. Low/light layers are merged onto the same or fewer LLCs based on a utilization target (hardcoded to 20% for now, but this will become configurable).
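
One reading of the per-step math, as a sketch (my interpretation of the target, with illustrative names):

```rust
// Sketch: low layers are packed onto the minimum number of LLCs such that
// the combined utilization per LLC stays at or under the target.
const UTIL_TARGET: f64 = 0.20; // hardcoded for now, to become configurable

/// `low_layer_utils` holds each low layer's utilization as a fraction of
/// one LLC's capacity; returns how many LLCs the merged layers need.
fn llcs_for_low_layers(low_layer_utils: &[f64]) -> usize {
    let total: f64 = low_layer_utils.iter().sum();
    ((total / UTIL_TARGET).ceil() as usize).max(1)
}
```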

Hysteresis

Only if a layer stays above or below its utilization range for 2 consecutive intervals (arbitrarily chosen for now) of the step function do we grow or shrink it. This avoids flip-flopping the core reallocation algorithm on sudden spikes or boundary conditions. The utilization range is hardcoded for testing for now, but it can be made configurable.
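
The counter logic looks roughly like this (a sketch; struct and constant names are illustrative):

```rust
// Sketch of the hysteresis counter: only act after utilization stays out of
// range for HYST_INTERVALS consecutive steps.
const HYST_INTERVALS: u32 = 2;

#[derive(Default)]
struct Hysteresis {
    above: u32,
    below: u32,
}

enum Action { Grow, Shrink, Hold }

impl Hysteresis {
    fn step(&mut self, util: f64, lo: f64, hi: f64) -> Action {
        if util > hi {
            self.above += 1;
            self.below = 0;
        } else if util < lo {
            self.below += 1;
            self.above = 0;
        } else {
            // Back in range: a single in-range interval resets both counters.
            self.above = 0;
            self.below = 0;
        }
        if self.above >= HYST_INTERVALS {
            self.above = 0;
            Action::Grow
        } else if self.below >= HYST_INTERVALS {
            self.below = 0;
            Action::Shrink
        } else {
            Action::Hold
        }
    }
}
```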

TODOs:

  • Still debugging an issue where the LLC picking logic behaves incorrectly with multiple loaded layers. WIP.
  • More testing; so far only lightly tested with stress-ng on my own server with 8 LLCs.
  • Refactor the code into a new StickyDynamic algorithm if the approach looks sane. Currently, I directly modified the step() function for testing.
  • Split into smaller commits and write commit logs.
  • Remove hardcoded utilization constants and set them from the config and at runtime.

@kkdwivedi kkdwivedi changed the title scx_layered: More optimal layer core order generation scx_layered: More optimal core allocation Jan 14, 2025
for (i, layer) in self.layers.iter().enumerate() {
    // A layer's total utilization is its owned plus open consumption.
    let owned = self.sched_stats.layer_utils[i][LAYER_USAGE_OWNED];
    let open = self.sched_stats.layer_utils[i][LAYER_USAGE_OPEN];
    let total_util = owned + open;
Contributor
When determining the target number of CPUs, open consumption is considered iff the layer has no CPUs, because otherwise grouped layers end up overallocating. Also, I wonder whether it'd make more sense to determine the number of LLCs to allocate from the result of the target CPU calculations.

Contributor
Note that the allocation needs to be fair within each LLC too, for Intel and other CPUs with one or few LLCs.

    wants
}

fn weighted_target_llcs(&self, raw_wants: &[usize]) -> Vec<usize> {
Contributor
Ditto, wouldn't it make more sense to determine this according to the number of CPUs allocated to each layer?

let assigned_count = layer.llcs_assigned.len();
if layer.is_sticky && assigned_count > 0 {
    // remove from free_llcs
    for &llc_id in layer.llcs_assigned.keys() {
Contributor
These operations may be easier with HashSet or BTreeSet.
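
For example (field types illustrative), with sets the sticky-layer reclaim above becomes direct membership operations:

```rust
use std::collections::BTreeSet;

// Illustrative: with free/assigned LLC ids kept in sets, removing a sticky
// layer's LLCs from the free pool is a direct set operation.
fn remove_assigned(free_llcs: &mut BTreeSet<usize>, assigned: &BTreeSet<usize>) {
    for llc_id in assigned {
        free_llcs.remove(llc_id);
    }
}
```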
