Add CPU hotplug support #1119

Closed
ddingel opened this issue Oct 18, 2016 · 10 comments
ddingel commented Oct 18, 2016

Description

Allow new and already started containers to use hotplugged CPUs. For details, please take a look at this Docker issue.
In the discussion of the mentioned issue it came up that this should be targeted at runc, and that this rejected PR might help us.

Steps to reproduce the issue:

Take a look at the original Docker issue.

Describe the results you expected:

Already running and newly started containers should be able to use additional online CPUs.

Initial proposal

Containers which are not started with a --cpuset-cpus or --cpuset-mems option are called unrestricted in the following; the others are called restricted.

For each unrestricted container we skip the current creation of a cpuset cgroup sub-directory under the Docker cgroup directory. All tasks within that container will then automatically be associated with the root cpuset cgroup. Because the root cpuset cgroup is managed by the kernel, its cpuset.cpus will always contain all online CPUs. Therefore unrestricted containers will use all online CPUs.
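
A rough illustration of why this works (a sketch; paths assume a standard cgroup v1 cpuset mount and the default docker cgroup directory):

# The kernel keeps the root cpuset in sync with CPU hotplug events:
cat /sys/devices/system/cpu/online             # e.g. 0-3
cat /sys/fs/cgroup/cpuset/cpuset.cpus          # same mask, kernel-maintained
# A sub-cgroup only keeps the snapshot taken at its creation:
cat /sys/fs/cgroup/cpuset/docker/cpuset.cpus   # may be stale after a CPU add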

Additional comments

After taking a look at the rejected PR, we came to the conclusion that our proposed solution has two significant advantages:

  • Already started unrestricted containers will also be able to use hotplugged CPUs
  • As our proposal will not change the Docker cgroup, we will not interfere with external changes

It might be worth noting that runc update --cpuset-cpus | --cpuset-mems will need some more logic to be able to create new cgroup hierarchies on the fly.
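
For context, re-creating a cpuset hierarchy on the fly amounts to roughly the following (a sketch; $ID and $PID are placeholders, and the kernel requires cpuset.cpus and cpuset.mems to be populated before any task can join):

mkdir -p /sys/fs/cgroup/cpuset/docker/$ID
# cpuset.cpus and cpuset.mems must be written before attaching tasks:
cat /sys/fs/cgroup/cpuset/cpuset.cpus > /sys/fs/cgroup/cpuset/docker/$ID/cpuset.cpus
cat /sys/fs/cgroup/cpuset/cpuset.mems > /sys/fs/cgroup/cpuset/docker/$ID/cpuset.mems
echo $PID > /sys/fs/cgroup/cpuset/docker/$ID/tasks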

cyphar commented Oct 18, 2016

I personally don't really like not joining cgroups, because it means that doing a runc update after the fact is more painful than it should be (which you mentioned). Now, I understand your concerns, but can you explain why the conclusion should be different from the previously rejected PR? From my perspective, this is still a kernel issue (the cpuset is not being updated properly by the kernel). What's more, it was mentioned that this was solved in the unified hierarchy. Now, runc currently doesn't support the unified hierarchy, but that would be a nice thing to add to #654.

My main concern with this is that it doesn't really solve the problem; it works around it. Now, the workaround isn't too bad, but I do wonder if just fixing #654 wouldn't be a better solution overall (as it would also mean we get unified hierarchy support for free).

ddingel commented Oct 19, 2016

@cyphar thank you for your feedback, and thanks to @michael-holzheu for the internal discussion about it. There are two points which might make our change more favorable:

  • There are currently no commercial Linux distributions supporting cgroup_v2. So if this gets fixed in Docker by using cgroup_v2, customers will still need to wait for a kernel upgrade from their distributors.
  • I played with the current upstream kernel, and from what I understand there is currently no cpuset support in the unified hierarchy, right? If that is not correct, do you have any pointers to documentation or talks on how hotplugging will be fixed? I grepped through the kernel sources and was left with a lot of confusion ;).

ddingel commented Oct 24, 2016

Ping @lizf-os @mrunalp @hqhq @HuKeping. As you worked on the first PR, it would be good to know what you think about our idea and whether it is worth continuing. It would also be good to know how the unified hierarchy will fix the missing CPUs in the subsequent cpusets after a CPU hotplug.

hqhq commented Oct 30, 2016

@ddingel Sorry for the late reply. Honestly, I don't quite like your proposal: it's too aggressive and might cause backward-compatibility issues. Also, with your proposal, updating containers will cause cgroup migration, which means CPU/memory migration, all of which makes things complicated.

For this issue, I'd rather take another possible route: add a new flag which can force-update all parent configurations. But I don't know what the other maintainers think.

ddingel commented Nov 7, 2016

@hqhq Thank you for your feedback. The proposal was not meant to be aggressive in any way; sorry for that.

So @michael-holzheu and I took the time to understand how your proposal would work with respect to this issue, and I would like to know whether we have the same understanding, and also what the other maintainers think about it.

The Docker daemon would be started with an optional --cpu-autoscale flag. When that flag is given, the daemon would listen via udev or netlink for CPU hotplug events:

  • In the CPU remove case, the Docker daemon does not need to take any action.
  • In the CPU add case, the Docker daemon would trigger containerd -> runc to refresh cgroup/cpuset/docker/cpuset.cpus

This would allow newly started containers to scale to all CPUs.
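
In cgroupfs terms that refresh would boil down to something like the following (a sketch, assuming a cgroup v1 mount and the default docker cgroup directory; in practice it would go through containerd -> runc rather than writing to cgroupfs directly):

# Propagate the kernel-maintained root mask to the docker parent cgroup:
cat /sys/fs/cgroup/cpuset/cpuset.cpus > /sys/fs/cgroup/cpuset/docker/cpuset.cpus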

Additionally, the Docker daemon could also enable this CPU via containerd -> runc for every container that either:

  • does not specify a cpuset, or
  • already includes this CPU in its cpuset

This second step might become a scaling problem if the number of containers or the number of hotplug events is large (a per-container sketch follows the list below).

On the other hand we would get:

  • every running unrestricted container scales to all CPUs
  • every restricted container scales to all currently available CPUs it specified, even if they were disabled and re-enabled
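
A minimal sketch of that per-container refresh (hypothetical; for brevity it treats every container as unrestricted, while restricted containers would instead receive the cpuset the daemon recorded for them, and a real implementation would go through containerd -> runc rather than writing to cgroupfs):

for f in /sys/fs/cgroup/cpuset/docker/*/cpuset.cpus; do
    cat /sys/fs/cgroup/cpuset/cpuset.cpus > "$f"
done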

Maybe this could be an opt-out option like --no-cpu-scale instead of an opt-in one?

hqhq commented Nov 9, 2016

@ddingel This looks like a better approach, but there is still a major concern, as I had in #254: say you have CPUs 0-3 on your host. When CPU 1 is removed, cpuset.cpus of your containers is changed to 0,2-3 by the kernel, and then CPU 1 is added back; before you try to refresh the cpuset, how can you know whether containers with cpuset.cpus set to 0,2-3 were changed because of a CPU going offline, or because they were started with cpuset-cpus="0,2-3"? So if cpuset.cpus was modified by the kernel, it would only be reasonable for it to be refreshed by the kernel too.

Besides that, all this monitoring and refresh work would have to be done by Docker; I don't know how they would like that.

For runc, I think the only compromise we can make is mostly what #254 tried to do, but with an option like --force-cpuset: if people try to start a container with cpuset-cpus=0-2 on a host with cpuset.cpus=0-3 but the container's parent is restricted to cpuset-cpus=0-1, we can force-update the parent's cpuset and start the container successfully. Though apparently we can't call that CPU hotplug support.
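
The parent/child conflict described above can be reproduced directly against cgroupfs (a sketch with assumed paths; $ID is a placeholder for a container's cgroup directory):

# Restrict the parent to CPUs 0-1:
echo 0-1 > /sys/fs/cgroup/cpuset/docker/cpuset.cpus
# Asking a child for 0-2 now fails, because a child cpuset
# must be a subset of its parent's cpuset:
echo 0-2 > /sys/fs/cgroup/cpuset/docker/$ID/cpuset.cpus   # expected to fail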

ddingel commented Nov 9, 2016

@hqhq I think I am missing your point; here is how I think this would play out:

say you have CPUs 0-3 on your host. When CPU 1 is removed,
cpuset.cpus of your containers is changed to 0,2-3 by the kernel,
and then CPU 1 is added back; before you try to refresh the cpuset

In my understanding there is no need to refresh or forward the change from online to offline.

The change from offline to online would trigger the Docker daemon to refresh. So we have the following states:

  • The Docker daemon knows that the container has CPUs 0-3 in its cpuset, or is unrestricted
  • /sys/fs/cgroup has 0,2-3

how can you know whether containers with cpuset.cpus set to 0,2-3 were changed because of a CPU going offline, or because they were started with cpuset-cpus="0,2-3"?

The state is kept within the Docker daemon (in the process as well as on disk, e.g. hostconfig.json). The Docker daemon already tracks the user-specified cpuset-cpus. So if a CPU comes online for a container which does not specify a cpuset, or whose cpuset includes this CPU, the Docker daemon can trigger a refresh.

So for runc I would think it is more like a runc update --force --cpuset-cpus $DockerState.

In case the --force option is given, runc will refresh all necessary cgroups. As Docker will only do this if started with --cpu-autoscale, this should be non-breaking.
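
A concrete example of the proposed invocation (hypothetical, since the --force flag does not exist yet; the container name and cpuset value are made up):

# CPU 1 came back online; re-apply the cpuset the daemon has on record:
runc update --force --cpuset-cpus 0-3 mycontainer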

So if cpuset.cpus was modified by the kernel, it would only be reasonable for it to be refreshed by the kernel too.

But from what I understand, it will not be fixed for v1. For v2 I am still searching for how and whether that will be fixed; @lizf-os, maybe you could point me to some documentation/code, so that I can see whether the proposed solution for v1 is applicable to v2?

For runc, I think the only compromise we can make is mostly what #254 tried to do, but with an option like --force-cpuset: if people try to start a container with cpuset-cpus=0-2 on a host with cpuset.cpus=0-3 but the container's parent is restricted to cpuset-cpus=0-1, we can force-update the parent's cpuset and start the container successfully. Though apparently we can't call that CPU hotplug support.

Fair enough, I am fine with --force-cpuset or --force, and that should be enough to allow CPU hotplug support to be built at higher levels. @justincormack @estesp does this sound good to you?

hqhq commented Nov 10, 2016

@ddingel Fair enough, that looks practicable, thanks for explaining. I'm fine with adding a force update for cpuset in runc whether or not Docker takes your idea; feel free to send a PR :).

alicefr commented Apr 18, 2017

Hi!
Unfortunately, Dominik left and now I am in charge of developing the CPU hotplug feature. I took a look at Dominik's proposal and noticed a drawback in the approach of adding a flag to the runc update command.

Previously, we did not consider the case where a CPU event occurs while no container is present. In this case runc update cannot be used, and we may lose some CPU events. My proposal is to develop a plugin for Docker that can be enabled by adding the flag --authorization-plugin. It creates a process that listens for CPU events and updates the file /sys/fs/cgroup/cpuset/docker/cpuset.cpus as well as every container, using the command runc update. So far I have tried enabling the plugin. My solution avoids adding an additional flag and takes the previous problem into account.

Do you find this solution acceptable, or do you see some drawbacks? Any help or feedback would be much appreciated :)

cristiklein commented

Hi all!
I needed a quick fix for this issue and thought I would share it here.

Create a file /etc/udev/rules.d/cpu-online.rules, which contains:

SUBSYSTEM=="cpu",ACTION=="add",RUN+="/bin/sh -c '[ ! -e /sys$devpath/online ] || echo 1 > /sys$devpath/online'"
SUBSYSTEM=="cpu",ACTION=="add",RUN+="/bin/sh -c 'cat /sys/fs/cgroup/cpuset/cpuset.cpus > /sys/fs/cgroup/cpuset/docker/cpuset.cpus; for c in /sys/fs/cgroup/cpuset/docker/*/cpuset.cpus; do cat /sys/fs/cgroup/cpuset/cpuset.cpus > $$c; done'"
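
Note that udev treats $ specially in RUN strings, so the shell loop variable above is written as $$c to produce a literal $. To apply the rules without rebooting, something like this should work (a sketch using standard udevadm flags):

udevadm control --reload-rules
# Replay "add" events for already-present CPUs so the rules fire once now:
udevadm trigger --subsystem-match=cpu --action=add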

alicefr pushed a commit to alicefr/moby that referenced this issue Aug 21, 2017
When a CPU comes online, the docker daemon updates the cgroups of
its containers.

It resolves the issues:
	- moby#27453
	- opencontainers/runc#1119

The extension can be used with the flags --cgroup-parent and
--cpuset-cpus

Signed-off-by: Alice Frosi <alice@linux.vnet.ibm.com>
ddingel closed this as completed Oct 12, 2020