-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
libct/cg/sd: use systemd version when generating device properties #3842
Conversation
@cyphar if the above is true, maybe systemd cgroup v2 driver should not add any eBPF rules at all, as they are not used anyway, instead relying on systemd to set those. Since in case of cgroup v2 we'll surely use sufficiently new systemd, there is no problem of inability to add a rule to systemd because of a non-existing device file. |
This should also fix #3708 (when a systemd >= v240 is used, that is). |
CI failure is a network glitch; CI restarted
|
Commit 343951a added a call to os.Stat for the device path when generating systemd device properties, to avoid systemd warning for non-existing devices. The idea was, since systemd uses stat(2) to look up device properties for a given path, it will fail anyway. In addition, this allowed to suppress a warning like this from systemd: > Couldn't stat device /dev/char/10:200 NOTE that this was done because: - systemd could not add the rule anyway; - runs puts its own set of rules on top of what systemd does. Apparently, the above change broke some setups, resulting in inability to use e.g. /dev/null inside a container. My guess is this is because in cgroup v2 we add a second eBPF program, which is not used if the first one (added by systemd) returns "access denied". Next, commit 3b95828 fixed that by adding a call to os.Stat for "/sys/"+path (meaning, if "/dev/char/10:200" does not exist, we retry with "/sys/dev/char/10:200", and if it exists, proceed with adding a device rule with the original (non-"/sys") path). How that second fix ever worked was a mystery, because the path we gave to systemd still doesn't exist. Well, I think now I know. Since systemd v240 (commit 74c48bf5a8005f20) device access rules specified as /dev/{block|char}/MM:mm are no longer looked up on the filesystem, instead, if possible, those are parsed from the string. So, we need to do different things, depending on systemd version: - for systemd >= v240, use the /dev/{char,block}/MM:mm as is, without doing stat() -- since systemd doesn't do stat() either; - for older version, check if the path exists, and skip passing it on to systemd otherwise. - the check for /sys/dev/{block,char}/MM:mm is not needed in either case. Pass the systemd version to the function that generates the rules, and fix it accordingly. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
LGTM, fwiw. |
Where is the second eBPF program added? I met an issue related to the second eBPF program, the details please refer to NVIDIA/nvidia-container-toolkit#192 |
This is a continuation of a saga that started in PR #3498 and continued in PR #3623.
Commit 343951a added a call to os.Stat for the device path when generating systemd device properties, to avoid systemd warning for non-existing devices. The idea was, since systemd uses stat(2) to look up device properties for a given path, it will fail anyway. In addition, this allowed to suppress a warning like this from systemd:
NOTE that this was done because:
Apparently, the above change broke some setups, resulting in inability to use e.g. /dev/null inside a container. My guess is this is because in cgroup v2 we add a second eBPF program, which is not used if the first one (added by systemd) returns "access denied".
Next, commit 3b95828 fixed that by adding a call to os.Stat for "/sys/"+path (meaning, if "/dev/char/10:200" does not exist, we retry with "/sys/dev/char/10:200", and if it exists, proceed with adding a device rule with the original (non-"/sys") path).
How that second fix ever worked was a mystery, because the path we gave to systemd still doesn't exist.
Well, I think now I know.
Since systemd v240 (systemd commit 74c48bf5a8005f20) device access rules specified as /dev/{block|char}/MM:mm are no longer looked up on the filesystem, instead, if possible, those are parsed from the string.
So, we need to do different things, depending on systemd version:
Pass the systemd version to the function that generates the rules, and fix it accordingly.