-
Notifications
You must be signed in to change notification settings - Fork 5.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cgroup: Use kernel command line to disable memory cgroup #6439
cgroup: Use kernel command line to disable memory cgroup #6439
Conversation
Hi Maira, Thanks for this. Your patches missed out a few platforms because of the non-trivial include hierarchy. You may find it easier to build on top of #6442, which is effectively a cosmetic change that reduces the number of files to change to 6:
|
#6442 is merged, so over to you. |
Commit 94a23e9 ("cgroup: Disable cgroup "memory" by default") disabled the memory cgroup by default when initing the cgroups. However, it's possible to disable the memory cgroup by a kernel command line. Hard-coding such a feature can be problematic as some memory management features depend on the order that things are set. For example, it is possible to see a NULL pointer dereference caused by commit 94a23e9. The NULL pointer dereference is triggered by the memory shrinker and ends up in a kernel crash. [ 50.028629] ================================================================== [ 50.028645] BUG: KASAN: null-ptr-deref in do_shrink_slab+0x1fc/0x978 [ 50.028663] Write of size 8 at addr 0000000000000000 by task gfxrecon-replay/1965 [ 50.028676] CPU: 3 UID: 1000 PID: 1965 Comm: gfxrecon-replay Tainted: G C 6.12.0-rc4-v8-thp-kasan+ raspberrypi#85 [ 50.028685] Tainted: [C]=CRAP [ 50.028689] Hardware name: Raspberry Pi 5 Model B Rev 1.0 (DT) [ 50.028694] Call trace: [ 50.028697] dump_backtrace+0xfc/0x120 [ 50.028706] show_stack+0x24/0x38 [ 50.028711] dump_stack_lvl+0x40/0x88 [ 50.028720] print_report+0xe4/0x708 [ 50.028728] kasan_report+0xcc/0x130 [ 50.028733] kasan_check_range+0x254/0x298 [ 50.028738] __kasan_check_write+0x20/0x30 [ 50.028745] do_shrink_slab+0x1fc/0x978 [ 50.028751] shrink_slab+0x318/0xc38 [ 50.028756] shrink_one+0x254/0x6d8 [ 50.028762] shrink_node+0x26b4/0x2848 [ 50.028767] do_try_to_free_pages+0x3e4/0x1190 [ 50.028773] try_to_free_pages+0x5a4/0xb40 [ 50.028778] __alloc_pages_direct_reclaim+0x144/0x298 [ 50.028787] __alloc_pages_slowpath+0x5c4/0xc70 [ 50.028793] __alloc_pages_noprof+0x4a8/0x6a8 [ 50.028800] __folio_alloc_noprof+0x24/0xa8 [ 50.028806] shmem_alloc_and_add_folio+0x2ec/0xce0 [ 50.028812] shmem_get_folio_gfp+0x380/0xc20 [ 50.028818] shmem_read_folio_gfp+0xe0/0x160 [ 50.028824] drm_gem_get_pages+0x238/0x620 [drm] [ 50.029039] drm_gem_shmem_get_pages_sgt+0xd8/0x4b8 [drm_shmem_helper] [ 50.029053] v3d_bo_create_finish+0x58/0x1e0 [v3d] [ 50.029083] v3d_create_bo_ioctl+0xac/0x210 [v3d] [ 50.029105] drm_ioctl_kernel+0x1d8/0x2b8 [drm] [ 50.029220] drm_ioctl+0x4b4/0x920 [drm] [ 50.029330] __arm64_sys_ioctl+0x11c/0x160 [ 50.029337] invoke_syscall+0x88/0x268 [ 50.029345] el0_svc_common+0x160/0x1d8 [ 50.029351] do_el0_svc+0x50/0x68 [ 50.029358] el0_svc+0x34/0x80 [ 50.029364] el0t_64_sync_handler+0x84/0x100 [ 50.029371] el0t_64_sync+0x190/0x198 [ 50.029376] ================================================================== This happens because the memory shrinker is unaware that we are artificially disabling the memory cgroups and therefore it doesn't allocate `nr_deferred` (as it would if we used the kernel command line). To avoid such an issue, revert the artificial disablement and disable it through the command line. If a user wants to enable the feature, it can use the `cgroup_enable=` command line. Signed-off-by: Maíra Canal <mcanal@igalia.com>
0974ecb
to
a870565
Compare
I rebased and updated the DTS files. |
Thanks! |
kernel: drivers: media: bcm2835_isp: Cache LS table dmabuf See: raspberrypi/linux#6429 kernel: arm64: dts: Sort out CM5 and I/O board I2C ports See: raspberrypi/linux#6441 kernel: cgroup: Use kernel command line to disable memory cgroup See: raspberrypi/linux#6439
kernel: drivers: media: bcm2835_isp: Cache LS table dmabuf See: raspberrypi/linux#6429 kernel: arm64: dts: Sort out CM5 and I/O board I2C ports See: raspberrypi/linux#6441 kernel: cgroup: Use kernel command line to disable memory cgroup See: raspberrypi/linux#6439
As far as I understand this change correctly, I should be able to reenable the cgroups memory controller by adding cgroup_enable=memory to the cmdline.txt. However, although I have set that, the cgroups memory controller is still not enabled:
System is a RPi 5 with Arch Linux Arm and the most recent version of the linux-rpi kernel: |
@dkadioglu Please note that Also, note that commit 94a23e9 ("cgroup: Disable cgroup "memory" by default") doesn't exist in the rpi-6.12.y tree. This means that |
I think that's a bit of a waste of effort - we'll sort it out. |
Do you really need the V1 memory cgroup support? I have a patch that re-enables the memory cgroup (see #6524), but if you look in /proc/cgroups you'll find that it is absent. However, /sys/fs/cgroup/ includes If you were an RPiOS user I'd suggest you wait about 35 minutes for the auto-builds to complete, then run If you don't know for certain whether the legacy V1 memory cgroup support is needed, perhaps you could say what your use case is so we can test it here. |
Actually, I am not sure, if I need V1 memory cgroup support. My use case is running docker containers with resource limitations, for example:
However, with the kernels 6.6.x I needed cgroup_enable=memory to run a container with such resource limits without error. Now, with 6.12 I see again 'Your kernel does not support memory soft limit capabilities or the cgroup is not mounted. Limitation discarded.', although I have cgroup_enable=memory in the cmdline. Is maybe Docker the culprit here, that they still use V1 cgroups? Or do I miss anything else here? When I run
The only memory related entry seems to be memory.pressure. Why do you see more here? |
I'm pretty sure that you'll need my patch - the question is whether you also need |
At least on my current kernel it seems to be enabled, zgrep 'CONFIG_MEMCG' /proc/config.gz says:
|
OK - in that case I'll just merge the PR, and eventually it will appear in an Arch kernel build. |
Thank you very much! |
There's no evidence to support that. In order to find out, you'd need a kernel without CONFIG_MEMCG_V1 and with my patch.
Yes, but with my patch you'll be able to override that with cgroup_enable=memory in cmdline.txt. |
I can confirm that the patch fixed it for me, thank you very much for your immediate support!
|
Commit 94a23e9 ("cgroup: Disable cgroup "memory" by default") disabled the memory cgroup by default when initing the cgroups. However, it's possible to disable the memory cgroup by a kernel command line. Hard-coding such a feature can be problematic as some memory management features depend on the order that things are set.
For example, it is possible to see a NULL pointer dereference caused by commit 94a23e9. The NULL pointer dereference is triggered by the memory shrinker and ends up in a kernel crash.
[ 50.028629] ==================================================================
[ 50.028645] BUG: KASAN: null-ptr-deref in do_shrink_slab+0x1fc/0x978
[ 50.028663] Write of size 8 at addr 0000000000000000 by task gfxrecon-replay/1965
[ 50.028676] CPU: 3 UID: 1000 PID: 1965 Comm: gfxrecon-replay Tainted: G C 6.12.0-rc4-v8-thp-kasan+ #85
[ 50.028685] Tainted: [C]=CRAP
[ 50.028689] Hardware name: Raspberry Pi 5 Model B Rev 1.0 (DT)
[ 50.028694] Call trace:
[ 50.028697] dump_backtrace+0xfc/0x120
[ 50.028706] show_stack+0x24/0x38
[ 50.028711] dump_stack_lvl+0x40/0x88
[ 50.028720] print_report+0xe4/0x708
[ 50.028728] kasan_report+0xcc/0x130
[ 50.028733] kasan_check_range+0x254/0x298
[ 50.028738] __kasan_check_write+0x20/0x30
[ 50.028745] do_shrink_slab+0x1fc/0x978
[ 50.028751] shrink_slab+0x318/0xc38
[ 50.028756] shrink_one+0x254/0x6d8
[ 50.028762] shrink_node+0x26b4/0x2848
[ 50.028767] do_try_to_free_pages+0x3e4/0x1190
[ 50.028773] try_to_free_pages+0x5a4/0xb40
[ 50.028778] __alloc_pages_direct_reclaim+0x144/0x298
[ 50.028787] __alloc_pages_slowpath+0x5c4/0xc70
[ 50.028793] __alloc_pages_noprof+0x4a8/0x6a8
[ 50.028800] __folio_alloc_noprof+0x24/0xa8
[ 50.028806] shmem_alloc_and_add_folio+0x2ec/0xce0
[ 50.028812] shmem_get_folio_gfp+0x380/0xc20
[ 50.028818] shmem_read_folio_gfp+0xe0/0x160
[ 50.028824] drm_gem_get_pages+0x238/0x620 [drm]
[ 50.029039] drm_gem_shmem_get_pages_sgt+0xd8/0x4b8 [drm_shmem_helper]
[ 50.029053] v3d_bo_create_finish+0x58/0x1e0 [v3d]
[ 50.029083] v3d_create_bo_ioctl+0xac/0x210 [v3d]
[ 50.029105] drm_ioctl_kernel+0x1d8/0x2b8 [drm]
[ 50.029220] drm_ioctl+0x4b4/0x920 [drm]
[ 50.029330] __arm64_sys_ioctl+0x11c/0x160
[ 50.029337] invoke_syscall+0x88/0x268
[ 50.029345] el0_svc_common+0x160/0x1d8
[ 50.029351] do_el0_svc+0x50/0x68
[ 50.029358] el0_svc+0x34/0x80
[ 50.029364] el0t_64_sync_handler+0x84/0x100
[ 50.029371] el0t_64_sync+0x190/0x198
[ 50.029376] ==================================================================
This happens because the memory shrinker is unaware that we are artificially disabling the memory cgroups and therefore it doesn't allocate
nr_deferred
(as it would if we used the kernel command line).To avoid such an issue, revert the artificial disablement and disable it through the command line. If a user wants to enable the feature, it can use the
cgroup_enable=
command line.