panic: kernel BUG at net/core/skbuff.c:109! #1125
Likely related, but I've recently encountered other strange memory corruptions on Clang 12 (and no other compiler). Clang 12 @ 43d239d0fadb1f8ea297580ca39dfbee96c913c1 Sample trace:
Config: clang12-corruption.config.txt |
Is this reproducible? Sounds like yes, based on ssh? If you have a ramdisk, we can probably boot test this quickly in QEMU. Are you able to bisect LLVM commits to pinpoint the first bad commit? |
I'd suggest reproducing the second, newer report: no user interaction is required, and just booting the kernel to user space is sufficient.
Am bisecting, but this will take a while. Maybe I have something by tomorrow. |
This is what I get:
Also sadly it seems this found its way into Clang 11:
|
ok, next thing is to isolate the configs. |
with the above config on mainline, I observe the following warnings from objtool:
but the kernel boots just fine for me. I see now you mention linux-next. I get the same warnings from objtool when testing -next. Boots fine. It looks like your panic is in There were also two critical fixes to 4b0aa5724feaa89a9538dcab97e018110b0e4bc3 which will complicate a bisection. If you test after that landed but before the fixes, then it will seem bad, when it may be a separate issue already fixed. The fixes were:
I also have a pending fix that should be tested:
Another thing to test: enable assertions in LLVM, then rebuild and see if anything trips. I'll try to isolate those objtool warnings, since they may be our smoking gun. |
(the objtool warnings I observe exist regardless of 4b0aa5724feaa89a9538dcab97e018110b0e4bc3. Forked #1169 to track those). |
I built an ASAN kernel and ran it through qemu. It had to sit for a bit, but it eventually panicked. |
Fairly certain it's in |
Sorry about the warnings, they might be harmless for what we're trying to debug (I just habitually enable CONFIG_DEBUG_ENTRY these days).
I'm using a syzkaller image, but not syzkaller itself.
Tested that on LLVM master branch and we still crash.
I've built with assertions, and nothing fires. The objtool warnings might be red herrings, but we shouldn't rule anything out. I'll try to do some config bisection, but the config I gave is already quite vanilla, just with a bunch of debug options enabled, specifically:
|
I'm going to be OOO until Saturday. Here are the LLVM IR files of Anyway, to reproduce:
|
Okay, one last thing. I did an experiment compiling |
This is very good information, thank you! Let's see if I can make something of that. |
Was that with @melver's supplied config? I still haven't been able to reproduce.
Neat, how did you bisect them so quickly? It would be generally useful for us to be able to repeat the process of object file bisection.
$ opt -O2 slub.new.no-opt.ll.txt -o - -S > slub.new.opt.ll
😒 (That's too many instances to analyze; we need to get more specific about which function is problematic.) Based on the suspected change, it should be something going wrong during |
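On repeating object file bisection: how it was done here isn't shown above, but one way it could work is a plain binary search over the object list. This is only a sketch under loud assumptions: exactly one object is responsible, the boot test is deterministic, and objects can be mixed freely between the good and the suspect compiler. `boots_fn`, `find_bad_object` and `sim_boots` are hypothetical names, and the oracle is simulated rather than actually building and booting a kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical oracle: n_new is how many objects (indices 0..n_new-1) are
 * built with the suspect compiler; the rest come from the known-good one.
 * Returns true if the resulting kernel boots cleanly.
 */
typedef bool (*boots_fn)(size_t n_new);

/*
 * Binary search for the single object whose suspect-compiler build breaks
 * the boot. Assumes boots(0) is true and boots(n_objs) is false.
 */
static size_t find_bad_object(size_t n_objs, boots_fn boots)
{
    size_t good = 0;      /* largest prefix size known to boot */
    size_t bad = n_objs;  /* smallest prefix size known to fail */

    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (boots(mid))
            good = mid;
        else
            bad = mid;
    }
    return bad - 1;       /* index of the culprit object */
}

/* Simulated oracle for illustration only: pretend object 5 is the culprit. */
static bool sim_boots(size_t n_new)
{
    return n_new <= 5;
}

/* Small self-check against the simulated oracle. */
static void check_find_bad_object(void)
{
    assert(find_bad_object(8, sim_boots) == 5);
}
```

With N objects this needs roughly log2(N) build-and-boot cycles, which would be cheap if the culprit really is a single file like slub.c. If the failure needs two objects to interact, the assumption breaks and the search can land on the wrong file.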
@melver sent me a userspace image; I can now repro with:
$ qemu-system-x86_64 -kernel /android0/linux-next/arch/x86/boot/bzImage \
    -append "console=ttyS0 root=/dev/sda debug earlyprintk=serial slub_debug=UZ" \
    -nographic -smp 8 -m 32G -enable-kvm -cpu host -device virtio-scsi-pci,id=scsi \
    -drive discard=unmap,file=debian-stretch.qcow2,if=none,id=hd0 -device scsi-hd,drive=hd0
The panic I observe with @melver's STR, i.e.
is reproducible at LLVM commit c430c21202c377cfb9fce0e7272f7208d1e8a531, i.e., before the big asm goto related changes. |
Some more experiments (llvm master branch): I found that changing inlining also affects whether the corruptions happen. -O1 seems to use only the always-inliner (see clang/lib/CodeGen/BackendUtil.cpp:622), and switching to it, even with -O2, makes the corruptions disappear. I tried to play with it a bit more, and with this patch (without which the below didn't work, as some other inlining heuristics seem to mess with it?)
, compiling slub.c with Not sure if that gets us closer, since inlining will also affect later optimizations; it might help minimize the IR diff. |
I think it's related to https://reviews.llvm.org/D86260. I'm going to check further. |
What I have so far: I tracked down the issue to The thing about |
Found it! This transformation in
|
Candidate fix here: https://reviews.llvm.org/D88823. Needs a testcase though. |
Configs
defconfig+SCSI_LOWLEVEL+VIRTIO_PCI+VIRTIO+SCSI_VIRTIO are the minimum set of configs needed to boot the provided userspace image. (Identifying those helps us bisect kernel configs, since those are a minimum.) On top of those, to repro the failure:
All of those change based on (defconfig+
Fixes
Patching in https://reviews.llvm.org/D88823 does resolve the issue for me.
Regression
There was a comment earlier from @melver suggesting this was a regression, though I was not able to confirm. I see now the SHA I tested was incorrect.
78c69a00a4cff786e0ef13c895d0db309d6b3f42 is the first commit before the suspected
Indeed, if I sync back to 78c69a00a4cf:
78c69a00a4cf last known good
(So the earlier comment about me not being able to confirm the bisection result was an error on my part.)
I've filed https://llvm.org/pr47735 to block the clang-11 release (cc @zmodem) |
Tail duplication of a block with an INLINEASM_BR may result in a PHI node on the indirect branch. This is okay, but it also introduces a copy for that PHI node *after* the INLINEASM_BR, which is not okay. See: ClangBuiltLinux/linux#1125 Differential Revision: https://reviews.llvm.org/D88823
Tail duplication of a block with an INLINEASM_BR may result in a PHI node on the indirect branch. This is okay, but it also introduces a copy for that PHI node *after* the INLINEASM_BR, which is not okay. See: ClangBuiltLinux/linux#1125 Differential Revision: https://reviews.llvm.org/D88823 (cherry picked from commit d2c61d2)
Tail duplication of a block with an INLINEASM_BR may result in a PHI node on the indirect branch. This is okay, but it also introduces a copy for that PHI node *after* the INLINEASM_BR, which is not okay. See: ClangBuiltLinux/linux#1125 Differential Revision: https://reviews.llvm.org/D88823 (cherry picked from commit 36b3bf7)
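For anyone landing here without the background: INLINEASM_BR is the machine-level form of a C `asm goto` statement, which can either fall through or jump straight to a C label. Below is a minimal sketch of that shape, not kernel code. `branch_taken` is a hypothetical helper, the real asm is x86 only, and the non-x86 branch is just a stand-in with the same control flow.

```c
#include <assert.h>

/*
 * Illustrative only: returns 1 when the asm jumps to the `taken` label
 * and 0 when it falls through.
 */
static int branch_taken(int take)
{
#if defined(__x86_64__) || defined(__i386__)
    /* asm goto: the asm statement itself may transfer control to `taken`. */
    __asm__ goto("test %0, %0\n\t"
                 "jnz %l[taken]"
                 : /* no outputs */
                 : "r"(take)
                 : "cc"
                 : taken);
#else
    /* Stand-in for other architectures (assumption: same control flow). */
    if (take)
        goto taken;
#endif
    return 0; /* fallthrough path */
taken:
    return 1; /* indirect-target path */
}

/* Small self-check exercising both exits of the asm goto. */
static void check_branch_taken(void)
{
    assert(branch_taken(0) == 0);
    assert(branch_taken(1) == 1);
}
```

As I read the commit message above, the problem follows from this shape: anything the compiler places after the asm in the same block is skipped when the jump is taken, so a PHI copy inserted after the INLINEASM_BR never runs on the indirect path and the target block sees a stale value.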
Using a recent syzkaller config, we get the below panic on very recent Clang (0b90a08f7722980f6074c6eada8022242408cdb4). This issue does not exist in Clang 11 (no bisection attempted yet).
.config: bad.config.txt
steps to reproduce: 1) boot kernel; 2) try to ssh into VM or any other network-related activity.