Devices: CPU pinning fix (allow explicit pinning on isolated CPUs for VM instances) #14817

Merged
merged 4 commits into from
Jan 22, 2025
Changes from 1 commit
71 changes: 40 additions & 31 deletions lxd/devices.go
@@ -366,53 +366,28 @@ func fillFixedInstances(fixedInstances map[int64][]instance.Instance, inst insta
}
}

// deviceTaskBalance is used to balance the CPU load across instances running on a host.
// It first checks if CGroup support is available and returns if it isn't.
// It then retrieves the effective CPU list (the CPUs that are guaranteed to be online) and excludes any isolated CPUs from it.
// After that, it loads all instances running on the node and iterates through them.
//
// For each instance, it checks its CPU limits and determines whether it is pinned to specific CPUs or can use the load-balancing mechanism.
// If it is pinned, the function adds it to the fixedInstances map with the CPU numbers it is pinned to.
// If not, the instance will be included in the load-balancing calculation,
// and the number of CPUs it can use is determined by taking the minimum of its assigned CPUs and the available CPUs. Note that if
// NUMA placement is enabled (`limits.cpu.nodes` is not empty), we apply a similar load-balancing logic to the `fixedInstances` map
// with a constraint being the number of vCPUs and the CPU pool being the CPUs pinned to a set of NUMA nodes.
//
// Next, the function balances the CPU usage by iterating over all the CPUs and dividing the instances into those that
// are pinned to a specific CPU and those that are load-balanced. For the pinned instances,
// it adds them to the pinning map with the CPU number it's pinned to.
// For the load-balanced instances, it sorts the available CPUs based on their usage count and assigns them to instances
// in ascending order until the required number of CPUs have been assigned.
// Finally, the pinning map is used to set the new CPU pinning for each instance, updating it to the new balanced state.
//
// Overall, this function ensures that the CPU resources of the host are utilized effectively amongst all the instances running on it.
func deviceTaskBalance(s *state.State) {
// Don't bother running when CGroup support isn't there
if !s.OS.CGInfo.Supports(cgroup.CPUSet, nil) {
return
}

func getCPULists() (effectiveCpus string, cpus []int64, err error) {
// Get effective cpus list - those are all guaranteed to be online
cg, err := cgroup.NewFileReadWriter(1, true)
if err != nil {
logger.Error("Unable to load cgroup writer", logger.Ctx{"err": err})
return
return "", nil, err
}

effectiveCpus, err := cg.GetEffectiveCpuset()
effectiveCpus, err = cg.GetEffectiveCpuset()
if err != nil {
// Older kernel - use cpuset.cpus
effectiveCpus, err = cg.GetCpuset()
if err != nil {
logger.Error("Error reading host's cpuset.cpus", logger.Ctx{"err": err, "cpuset.cpus": effectiveCpus})
return
return "", nil, err
}
}

effectiveCpusInt, err := resources.ParseCpuset(effectiveCpus)
if err != nil {
logger.Error("Error parsing effective CPU set", logger.Ctx{"err": err, "cpuset.cpus": effectiveCpus})
return
return "", nil, err
}

isolatedCpusInt := resources.GetCPUIsolated()
@@ -426,9 +401,43 @@ func deviceTaskBalance(s *state.State) {
}

effectiveCpus = strings.Join(effectiveCpusSlice, ",")
cpus, err := resources.ParseCpuset(effectiveCpus)
cpus, err = resources.ParseCpuset(effectiveCpus)
if err != nil {
logger.Error("Error parsing host's cpu set", logger.Ctx{"cpuset": effectiveCpus, "err": err})
return "", nil, err
}

return effectiveCpus, cpus, nil
}

// deviceTaskBalance is used to balance the CPU load across instances running on a host.
// It first checks if CGroup support is available and returns if it isn't.
// It then retrieves the effective CPU list (the CPUs that are guaranteed to be online) and excludes any isolated CPUs from it.
// After that, it loads all instances running on the node and iterates through them.
//
// For each instance, it checks its CPU limits and determines whether it is pinned to specific CPUs or can use the load-balancing mechanism.
// If it is pinned, the function adds it to the fixedInstances map with the CPU numbers it is pinned to.
// If not, the instance will be included in the load-balancing calculation,
// and the number of CPUs it can use is determined by taking the minimum of its assigned CPUs and the available CPUs. Note that if
// NUMA placement is enabled (`limits.cpu.nodes` is not empty), we apply a similar load-balancing logic to the `fixedInstances` map
// with a constraint being the number of vCPUs and the CPU pool being the CPUs pinned to a set of NUMA nodes.
//
// Next, the function balances the CPU usage by iterating over all the CPUs and dividing the instances into those that
// are pinned to a specific CPU and those that are load-balanced. For the pinned instances,
// it adds them to the pinning map with the CPU number it's pinned to.
// For the load-balanced instances, it sorts the available CPUs based on their usage count and assigns them to instances
// in ascending order until the required number of CPUs have been assigned.
// Finally, the pinning map is used to set the new CPU pinning for each instance, updating it to the new balanced state.
//
// Overall, this function ensures that the CPU resources of the host are utilized effectively amongst all the instances running on it.
func deviceTaskBalance(s *state.State) {
// Don't bother running when CGroup support isn't there
if !s.OS.CGInfo.Supports(cgroup.CPUSet, nil) {
return
}

effectiveCpus, cpus, err := getCPULists()
if err != nil {
return
}
