dropping of cgroup v1 #1642

Open
skshandilya opened this issue Feb 11, 2025 · 8 comments
Labels
area/kernel, kind/bug

Comments

@skshandilya

Description

This is neither a feature request nor a bug report; it is a request for suggestions.
Flatcar is dropping support for cgroup v1 starting with the March release.

Impact

We run media workloads that rely on real-time scheduling and priorities.
They run under SCHED_RR and SCHED_FIFO, and we change priorities dynamically at runtime.

We followed https://docs.docker.com/engine/containers/resource_constraints/#configure-the-host-machines-kernel
to configure our setup, which runs on cgroup v1.

When we move to cgroup v2, we cannot switch to any of the real-time scheduling classes.
Trying to allocate real-time runtime on v2 fails with the error shown below.
As a result, our media workloads run under SCHED_OTHER and performance is not what we expect.

Environment and steps to reproduce

On cgroup v2, we get this error when we try to run a container with a real-time runtime allocation:

mfusion@citg-plat39 ~ $ docker run -it --rm --cpu-rt-runtime 700000 f13d bash
docker: Error response from daemon: Your kernel does not support CPU real-time scheduler.
See 'docker run --help'.
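For anyone triaging the same error, here is a minimal sketch of how to tell which cgroup hierarchy a host uses and whether the cgroup v1 real-time knob (cpu.rt_runtime_us) is present. It assumes the conventional /sys/fs/cgroup mount point, and the function names are hypothetical. The exact check Docker performs may differ.

```python
import os

def cgroup_version(root="/sys/fs/cgroup"):
    """Return 2 if `root` looks like a unified (v2) hierarchy, else assume 1.

    A cgroup v2 mount exposes a `cgroup.controllers` file at its root;
    a v1 layout has per-controller subdirectories instead.
    """
    return 2 if os.path.isfile(os.path.join(root, "cgroup.controllers")) else 1

def has_rt_group_knob(root="/sys/fs/cgroup"):
    """Report whether `cpu.rt_runtime_us` exists under a v1 cpu controller.

    The two directory names below are common v1 layouts (an assumption,
    not an exhaustive list).
    """
    candidates = [
        os.path.join(root, "cpu", "cpu.rt_runtime_us"),
        os.path.join(root, "cpu,cpuacct", "cpu.rt_runtime_us"),
    ]
    return any(os.path.isfile(p) for p in candidates)
```

On a host where cgroup_version() returns 2, has_rt_group_knob() should be False, which matches the situation described above.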

Expected behavior

Real-time scheduling in containers should continue to work after the move to cgroup v2.

Additional information

We also read that the kernel does not support real-time group scheduling yet. When is this expected?
A workaround or alternative arrangement would help.

@chewi
Contributor

chewi commented Feb 12, 2025

I'm not an expert on scheduling, but do you know if sched_ext would help, possibly with the scx_flash scheduler? It says it is good for real-time workloads. I happened to set up sched_ext on my desktop yesterday. It requires kernel 6.12, so we can't have it right now, but we could have it soon.
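A rough sketch of how one might check whether a host could run a sched_ext scheduler at all. It assumes the kernel version floor of 6.12 mentioned above and the /sys/kernel/sched_ext/state file described in the upstream sched_ext documentation. The helper names are hypothetical.

```python
import os
import re

def kernel_at_least(release, major, minor):
    """Compare a `uname -r` style string against a (major, minor) floor."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return (int(m.group(1)), int(m.group(2))) >= (major, minor)

def sched_ext_state(sysfs_dir="/sys/kernel/sched_ext"):
    """Return the contents of the sched_ext `state` file, or None if absent."""
    try:
        with open(os.path.join(sysfs_dir, "state")) as f:
            return f.read().strip()
    except OSError:
        return None

# Example: kernel_at_least(os.uname().release, 6, 12)
```

A None result from sched_ext_state() only says the sysfs entry is missing; it does not say why (old kernel, or the option not being built in).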

@jepio
Member

jepio commented Feb 13, 2025

> we also read that the kernel does not support realtime group scheduling yet, when is this expected?

I haven't heard of anyone working on this upstream.

> some kind of work around or alternate arrangements could help

You can run a realtime workload on cgroupv2 if the process is in the cgroup root. You can do that with docker if you really want, but it's not pretty:

$ docker run -it --privileged --cgroupns host debian:12
# # inside container
# cd /sys/fs/cgroup
# echo $$ >cgroup.procs
# chrt -f 99 sleep 1
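The same two steps can be sketched from inside a program rather than a shell. This is only an illustration under the same assumptions as the commands above (privileged container, host cgroup namespace, cgroup v2 mounted at /sys/fs/cgroup). The function names are hypothetical.

```python
import os

def move_to_cgroup(pid, cgroup_dir="/sys/fs/cgroup"):
    """Move `pid` into the cgroup at `cgroup_dir` by writing to cgroup.procs."""
    with open(os.path.join(cgroup_dir, "cgroup.procs"), "w") as f:
        f.write(f"{pid}\n")

def set_fifo(pid, priority):
    """Switch `pid` (0 means the caller) to SCHED_FIFO at `priority`.

    Raises OSError (typically PermissionError) if the kernel refuses.
    """
    os.sched_setscheduler(pid, os.SCHED_FIFO, os.sched_param(priority))

# Intended usage, mirroring the shell commands:
#   move_to_cgroup(os.getpid())   # join the root cgroup
#   set_fifo(0, 99)               # equivalent of `chrt -f 99`
```

Moving a container's process into the root cgroup removes it from whatever resource limits its own cgroup imposed, which is part of why this is "not pretty".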

@skshandilya
Author

skshandilya commented Feb 13, 2025 via email

@skshandilya
Author

skshandilya commented Feb 13, 2025 via email

@chewi
Contributor

chewi commented Feb 13, 2025

> This is an interesting option, Is 6.12 going to long term support?

I'm not the best person to ask, but given that our current LTS is on 6.6 and the next upstream LTS will be 6.12, I expect we'd take 6.12 for our next LTS.

I'd forgotten that the BPF-based sched_ext schedulers unfortunately require some debug symbols. This can massively increase the size of the kernel and modules, although I'm not yet sure whether you need this on all the modules or just the kernel itself. This might prevent us from supporting this in the main image, but perhaps we could put the bigger modules in a sysext.

@skshandilya
Author

skshandilya commented Feb 14, 2025

I tested this on Fedora 41 and it seems to work without any hiccups.
My colleague also tested this on an older version of Arch Linux with a 6.1 kernel, and it works there too.

sandeep@fedora:~$ cat /etc/os-release
NAME="Fedora Linux"
VERSION="41 (Workstation Edition)"
RELEASE_TYPE=stable
ID=fedora
VERSION_ID=41
VERSION_CODENAME=""
PLATFORM_ID="platform:f41"
PRETTY_NAME="Fedora Linux 41 (Workstation Edition)"
....

sandeep@fedora:~$ docker system info | grep -i Cgroup
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Cgroup Driver: systemd
Cgroup Version: 2
....

sandeep@fedora:~$ docker run --rm -it --cap-add=sys_nice ubuntu:jammy bash
root@f07ef860498e:/# ps -cf
UID PID PPID CLS PRI STIME TTY TIME CMD
root 1 0 TS 19 19:42 pts/0 00:00:00 bash
root 9 1 TS 19 19:42 pts/0 00:00:00 ps -cf
root@f07ef860498e:/# chrt -rp 50 1
root@f07ef860498e:/# ps -cf
UID PID PPID CLS PRI STIME TTY TIME CMD
root 1 0 RR 90 19:42 pts/0 00:00:00 bash
root 11 1 RR 90 19:42 pts/0 00:00:00 ps -cf
root@7ef5a1cc8428:/# ps -cf
UID PID PPID CLS PRI STIME TTY TIME CMD
root 1 0 RR 70 19:59 pts/0 00:00:00 bash
root 14 1 RR 70 20:01 pts/0 00:00:00 ps -cf
root@7ef5a1cc8428:/# chrt -rp 50 1
root@7ef5a1cc8428:/# ps -cf
UID PID PPID CLS PRI STIME TTY TIME CMD
root 1 0 RR 90 19:59 pts/0 00:00:00 bash
root 16 1 RR 90 20:01 pts/0 00:00:00 ps -cf
root@7ef5a1cc8428:/# chrt -rp 20 1
root@7ef5a1cc8428:/# ps -cf
UID PID PPID CLS PRI STIME TTY TIME CMD
root 1 0 RR 60 19:59 pts/0 00:00:00 bash
root 18 1 RR 60 20:01 pts/0 00:00:00 ps -cf

I can see that the priority is not changing as expected. That may be because the kernel is lowering the priority for some other reason. However, the scheduling class does change to RR, and the values in the ps -cf output do change.

What are we missing here?
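One way to take ps out of the picture is to ask the kernel for the policy and real-time priority directly. The sketch below does that with Python's os module on Linux. The numbers in the transcript (PRI 90 after chrt -rp 50, PRI 60 after chrt -rp 20) appear consistent with ps printing 40 plus the real-time priority, so the second helper encodes that offset. Treat it as an assumption inferred from this output, not a documented guarantee.

```python
import os

# Policy constants the os module exposes on Linux, mapped to readable names.
_POLICY_NAMES = {
    getattr(os, name): name
    for name in ("SCHED_OTHER", "SCHED_FIFO", "SCHED_RR", "SCHED_BATCH", "SCHED_IDLE")
    if hasattr(os, name)
}

def describe_sched(pid=0):
    """Return (policy_name, rt_priority) for `pid` (0 means the caller)."""
    policy = os.sched_getscheduler(pid)
    # Mask off SCHED_RESET_ON_FORK, which may be OR-ed into the policy value.
    policy &= ~getattr(os, "SCHED_RESET_ON_FORK", 0)
    prio = os.sched_getparam(pid).sched_priority
    return _POLICY_NAMES.get(policy, f"unknown({policy})"), prio

def ps_pri_to_rtprio(ps_pri):
    """ASSUMPTION: `ps -c` shows 40 + rtprio for real-time tasks."""
    return ps_pri - 40
```

If describe_sched(1) inside the container reports the priority that was passed to chrt, the setting took effect and only the ps column is scaled differently.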

@chewi
Contributor

chewi commented Feb 19, 2025

I don't know about the above, but I can at least report that sched_ext only requires the debug symbols at build time. You can strip them before installing. We may therefore still be able to offer that.

@skshandilya
Author

skshandilya commented Feb 21, 2025

Another observation: the kernel option CONFIG_RT_GROUP_SCHED is not needed once we move to cgroup v2, yet we see it enabled in the kernel of Flatcar version 4230.0.1. The corresponding options in Fedora 39 (kernel version 6.11.9) are as follows:

CONFIG_SCHED_CORE=y
CONFIG_HAVE_SCHED_AVG_IRQ=y
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
# CONFIG_RT_GROUP_SCHED is not set
CONFIG_SCHED_MM_CID=y
CONFIG_SCHED_AUTOGROUP=y
CONFIG_SCHED_OMIT_FRAME_POINTER=y
CONFIG_SCHED_CLUSTER=y
CONFIG_SCHED_SMT=y
CONFIG_SCHED_MC=y
CONFIG_SCHED_MC_PRIO=y
CONFIG_SCHED_HRTICK=y
CONFIG_CPU_FREQ_DEFAULT_GOV_SCHEDUTIL=y
CONFIG_CPU_FREQ_GOV_SCHEDUTIL=y
CONFIG_NET_SCHED=y
CONFIG_DRM_SCHED=m
CONFIG_DRM_XE_ENABLE_SCHEDTIMEOUT_LIMIT=y
CONFIG_SCHED_STACK_END_CHECK=y
CONFIG_SCHED_DEBUG=y
CONFIG_SCHED_INFO=y
CONFIG_SCHEDSTATS=y
CONFIG_SCHED_TRACER=y
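To compare two kernels on a single option without eyeballing the whole list, a small parser for kernel config text can help. This is a sketch with a hypothetical function name. Where the text comes from (for example /proc/config.gz or a /boot/config-* file) depends on the distribution and is left to the caller.

```python
import re

def kconfig_value(config_text, option):
    """Look up `option` in kernel .config text.

    Returns the value after "=" (for example "y" or "m") if the option is set,
    "n" if the text says "<option> is not set", and None if it is absent.
    """
    set_re = re.compile(rf"^{re.escape(option)}=(.*)$")
    unset_re = re.compile(rf"^#?\s*{re.escape(option)} is not set$")
    for line in config_text.splitlines():
        line = line.strip()
        m = set_re.match(line)
        if m:
            return m.group(1)
        if unset_re.match(line):
            return "n"
    return None
```

For the listing above, kconfig_value(text, "CONFIG_RT_GROUP_SCHED") would return "n", while on the Flatcar kernel described in this comment it would be expected to return "y".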

Since we are moving to cgroup v2, it would make sense to adopt these settings. We understand that real-time scheduling can lock up the machine if unsuitable priority values are set, but the kernel provides a safeguard:
as long as /proc/sys/kernel/sched_rt_runtime_us (default 950000) < /proc/sys/kernel/sched_rt_period_us (default 1000000), we are safe?
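The arithmetic behind that safeguard can be sketched as follows. The two sysctl files are the ones named above; the helper names are hypothetical. A runtime of -1 disables the throttle entirely, in which case the check in the question no longer protects anything.

```python
def rt_bandwidth_fraction(runtime_us, period_us):
    """Fraction of each period that real-time tasks may consume system-wide.

    A runtime of -1 disables throttling, so RT tasks may use 100%.
    """
    if period_us <= 0:
        raise ValueError("period must be positive")
    if runtime_us < 0:
        return 1.0
    return min(runtime_us / period_us, 1.0)

def read_rt_limits(proc_dir="/proc/sys/kernel"):
    """Read (runtime_us, period_us) from the sched_rt_* sysctl files."""
    values = []
    for name in ("sched_rt_runtime_us", "sched_rt_period_us"):
        with open(f"{proc_dir}/{name}") as f:
            values.append(int(f.read().strip()))
    return tuple(values)
```

With the defaults, rt_bandwidth_fraction(950000, 1000000) is 0.95, which leaves 5% of each one-second period for non-real-time tasks.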

Projects
Status: 📝 Needs Triage
Development

No branches or pull requests

4 participants