Crash in scx_cgroup_can_attach() when testing scx_lavd #164
This is where the warning is being triggered: https://github.com/sched-ext/sched_ext/blob/sched_ext/kernel/sched/ext.c#L2714-L2726. So for some reason we seem to have already started moving that task between cgroups. @htejun, there's something I don't quite understand from reading the code. In scx_cgroup_can_attach() we clean up and drop scx_cgroup_rwsem on the error path here: https://github.com/sched-ext/sched_ext/blob/sched_ext/kernel/sched/ext.c#L2730-L2741, but then we clean up and drop the lock again in scx_cgroup_cancel_attach(): https://github.com/sched-ext/sched_ext/blob/sched_ext/kernel/sched/ext.c#L2784-L2793. From looking at cgroup_migrate_execute(), it looks like we'd hit the cancel path and drop the lock twice. What am I missing?
Ah, never mind: when unwinding, we bail out as soon as we reach the subsystem where the failure happened, so its cancel callback is never called.
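To make the rollback rule above concrete, here is a minimal userspace C model. It is a sketch under stated assumptions, not kernel code: the counter rwsem_readers stands in for the read side of scx_cgroup_rwsem, and every model_* name, the *_should_fail flags, and the two-entry subsystem array are hypothetical. It assumes, as the comments above describe, that scx_cgroup_can_attach() drops the lock itself when it fails, and that cgroup_migrate_execute() calls cancel_attach only for subsystems that come before the one that failed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the read side of scx_cgroup_rwsem. */
static int rwsem_readers;

static void model_down_read(void) { rwsem_readers++; }

static void model_up_read(void)
{
	/* A double release would drive the count negative. */
	assert(rwsem_readers > 0);
	rwsem_readers--;
}

/* One controller taking part in the migration; can_attach returns 0 on success. */
struct model_subsys {
	const char *name;
	int (*can_attach)(void);
	void (*cancel_attach)(void);
};

/* Hypothetical knobs to force failures in the scenarios below. */
static bool scx_should_fail;
static bool other_should_fail;

/* Models the sched_ext hook: takes the lock, and drops it itself on failure. */
static int model_scx_can_attach(void)
{
	model_down_read();
	if (scx_should_fail) {
		model_up_read();	/* error path cleans up locally */
		return -1;
	}
	return 0;			/* success: lock stays held */
}

/* Only valid when can_attach succeeded, since it releases the lock again. */
static void model_scx_cancel_attach(void)
{
	model_up_read();
}

/* Hypothetical second controller that holds no lock. */
static int model_other_can_attach(void)
{
	return other_should_fail ? -1 : 0;
}

/*
 * Mirrors the shape of the rollback in cgroup_migrate_execute(): if
 * can_attach fails at index `failed`, cancel_attach runs only for the
 * subsystems before it, never for the one that failed.
 */
static int model_migrate(struct model_subsys *ss, int n)
{
	int i, failed = -1;

	for (i = 0; i < n; i++) {
		if (ss[i].can_attach && ss[i].can_attach() != 0) {
			failed = i;
			break;
		}
	}
	if (failed < 0)
		return 0;

	for (i = 0; i < n; i++) {
		if (i == failed)
			break;		/* the failing subsystem already cleaned up */
		if (ss[i].cancel_attach)
			ss[i].cancel_attach();
	}
	return -1;
}
```

Whether the sched_ext hook itself fails or a later controller fails, the counter ends at zero, and the assert() in model_up_read() would fire if the lock were released twice. On success the lock stays held for the later attach step, which this model leaves out.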
👋 After trying to find a way to repro this issue consistently, I found:
I tried to reproduce the issue locally but haven't been successful. I'm traveling this week so my testing is rather limited. @SoulHarsh007, any chance you can apply the following patch to the kernel and see whether either of the printks trigger? Please note that the dump is a
👋 Log after this patch: https://paste.soulharsh007.dev/p/6a2065a.log

Edit: after exiting the container (systemd-nspawn), these messages are printed: https://paste.soulharsh007.dev/p/b580443.log
Ah, thanks, yeah, that makes sense. I'll study the code further to understand why that's happening, but the warning is triggering spuriously and the fix is most likely just to remove it.
Can you please test whether this patch resolves the issue? Thanks.
👋 I applied #165 (thanks to @sirlucjan for helping me with the patches) and the error no longer appears. I have tested it multiple times and can confirm the error message is gone!
Fix: sched-ext/sched_ext#164
Signed-off-by: Piotr Gorski <lucjan.lucjanov@gmail.com>
Thank you so much for confirming the fix. Also, thanks @sirlucjan for the help.
The fix is now available in https://github.com/sched-ext/scx-kernel-releases/releases/tag/v6.8.0-scx2.
As described in sched-ext/scx#192 (comment), @SoulHarsh007 managed to trigger a warning in the kernel when testing scx_lavd. In his words: "👋 I am not sure if this is the right place to report this, but I find the following kernel oops when using scx_lavd: https://paste.soulharsh007.dev/p/43a4476.log"
Kernel Version: 6.8.1-2-cachyos
Let's track fixing it here.