oom-killer gets triggered when there is plenty of memory still available #3872
Comments
I have this issue on ArchlinuxARM on a RPi 4GB. Happened with 5.4.x and still happens with 6.1.35-4-rpi-ARCH (armv7l) today. The issue is unrelated to #5395. Installing packages via pacman triggers this very reliably during the download phase. This manifests as downloads first proceeding at high speed, then suddenly getting stuck. oom-killer then goes on a rampage and unless the package install is canceled quickly enough, takes the whole system down:
Commit 7675076 ("arch_numa: switch over to numa_memblks") significantly cleaned up the NUMA registration code, but also dropped a significant check that was refusing to accept to configure a memblock with an invalid nid.

On "quality hardware" such as my ThunderX machine, this results in a kernel that dies immediately:

[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x431f0a10]
[ 0.000000] Linux version 6.12.0-00013-g8920d74cf8db (maz@valley-girl) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #3872 SMP PREEMPT Wed Nov 27 15:25:49 GMT 2024
[ 0.000000] KASLR disabled due to lack of seed
[ 0.000000] Machine model: Cavium ThunderX CN88XX board
[ 0.000000] efi: EFI v2.4 by American Megatrends
[ 0.000000] efi: ESRT=0xffce0ff18 SMBIOS 3.0=0xfffb0000 ACPI 2.0=0xffec60000 MEMRESERVE=0xffc905d98
[ 0.000000] esrt: Reserving ESRT space from 0x0000000ffce0ff18 to 0x0000000ffce0ff50.
[ 0.000000] earlycon: pl11 at MMIO 0x000087e024000000 (options '115200n8')
[ 0.000000] printk: legacy bootconsole [pl11] enabled
[ 0.000000] NODE_DATA(0) allocated [mem 0xff6754580-0xff67566bf]
[ 0.000000] Unable to handle kernel paging request at virtual address 0000000000001d40
[ 0.000000] Mem abort info:
[ 0.000000] ESR = 0x0000000096000004
[ 0.000000] EC = 0x25: DABT (current EL), IL = 32 bits
[ 0.000000] SET = 0, FnV = 0
[ 0.000000] EA = 0, S1PTW = 0
[ 0.000000] FSC = 0x04: level 0 translation fault
[ 0.000000] Data abort info:
[ 0.000000] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[ 0.000000] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 0.000000] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 0.000000] [0000000000001d40] user address but active_mm is swapper
[ 0.000000] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[ 0.000000] Modules linked in:
[ 0.000000] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.12.0-00013-g8920d74cf8db #3872
[ 0.000000] Hardware name: Cavium ThunderX CN88XX board (DT)
[ 0.000000] pstate: a00000c5 (NzCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 0.000000] pc : sparse_init_nid+0x54/0x428
[ 0.000000] lr : sparse_init+0x118/0x240
[ 0.000000] sp : ffff800081da3cb0
[ 0.000000] x29: ffff800081da3cb0 x28: 0000000fedbab10c x27: 0000000000000001
[ 0.000000] x26: 0000000ffee250f8 x25: 0000000000000001 x24: ffff800082102cd0
[ 0.000000] x23: 0000000000000001 x22: 0000000000000000 x21: 00000000001fffff
[ 0.000000] x20: 0000000000000001 x19: 0000000000000000 x18: ffffffffffffffff
[ 0.000000] x17: 0000000001b00000 x16: 0000000ffd130000 x15: 0000000000000000
[ 0.000000] x14: 00000000003e0000 x13: 00000000000001c8 x12: 0000000000000014
[ 0.000000] x11: ffff800081e82860 x10: ffff8000820fb2c8 x9 : ffff8000820fb490
[ 0.000000] x8 : 0000000000ffed20 x7 : 0000000000000014 x6 : 00000000001fffff
[ 0.000000] x5 : 00000000ffffffff x4 : 0000000000000000 x3 : 0000000000000000
[ 0.000000] x2 : 0000000000000000 x1 : 0000000000000040 x0 : 0000000000000007
[ 0.000000] Call trace:
[ 0.000000] sparse_init_nid+0x54/0x428
[ 0.000000] sparse_init+0x118/0x240
[ 0.000000] bootmem_init+0x70/0x1c8
[ 0.000000] setup_arch+0x184/0x270
[ 0.000000] start_kernel+0x74/0x670
[ 0.000000] __primary_switched+0x80/0x90
[ 0.000000] Code: f865d804 d37df060 cb030000 d2800003 (b95d4084)
[ 0.000000] ---[ end trace 0000000000000000 ]---
[ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---

while previous kernel versions were able to recognise how brain-damaged the machine is, and only build a fake node.

Use the memblock_validate_numa_coverage() helper to restore some sanity and a "working" system.

Fixes: 7675076 ("arch_numa: switch over to numa_memblks")
Suggested-by: Mike Rapoport <rppt@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20241201092702.3792845-1-maz@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
I use this machine as a torrent downloader running Transmission, Sonarr, Radarr, Jackett, etc. I know these programs are heavy on RAM, but oom-killer gets triggered even when memory usage is below 1.5 GB, or even below 1 GB. I tried adding a 2 GB swapfile (which is barely ever used), but that did not solve the issue. Every 8-12 hours the OOM reaper kills all the download-managing programs, even Transmission.
To reproduce
Not sure; running some memory-hungry programs probably helps reproduce it.
Expected behaviour
oom-killer should only trigger when memory and swap are really scarce.
Actual behaviour
oom-killer gets triggered even when memory usage is low, even < 1GB.
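To put numbers on "memory usage is low" at the moment the kills happen, it may help to log a few /proc/meminfo fields periodically. The following is only a minimal sketch: the function names (`parse_meminfo`, `summarize`, `sample`) are made up for illustration, and the choice of fields is an assumption. `LowFree` and `HighFree` only appear on kernels built with highmem support, which may or may not apply to the kernel in this report.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer value.

    Values are in kB for most fields; a few (e.g. HugePages_Total) are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def summarize(fields):
    """Keep only the fields that seem relevant here (an assumed selection)."""
    keys = ("MemFree", "MemAvailable", "SwapFree", "LowFree", "HighFree")
    return {k: fields[k] for k in keys if k in fields}


def sample(path="/proc/meminfo"):
    """Read the live file once and return the summarized snapshot."""
    with open(path) as f:
        return summarize(parse_meminfo(f.read()))
```

Calling `sample()` from a cron job or a simple loop and appending the result to a file would give a timeline to compare against the timestamps of the OOM kills.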
System
cat /etc/rpi-issue: Raspbian GNU/Linux 10
vcgencmd version: version 4439d2aaa6c376a2d1ef4402f142e1cf4de37c43 (clean) (release) (start)
uname -a: 5.4.51-v7l+
Logs
dmesg.txt
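When reading a log like the attached one, the "invoked oom-killer" lines are usually the first thing to look at, since they record which process triggered the kill, the allocation flags (`gfp_mask`) and the allocation `order`. A non-zero order can hint at fragmentation rather than outright exhaustion, though that is only one possible explanation. Below is a rough sketch for extracting those fields; the helper name `find_oom_events` is hypothetical, and the regular expression assumes the usual `<comm> invoked oom-killer: gfp_mask=0x...(...), order=N, ...` line layout, which has not been checked against the attached dmesg.txt.

```python
import re

# Assumed layout: "<comm> invoked oom-killer: gfp_mask=0x...(...), order=N, ..."
OOM_RE = re.compile(
    r"(?P<comm>\S+) invoked oom-killer: gfp_mask=(?P<gfp>0x[0-9a-fA-F]+)"
    r".*?order=(?P<order>-?\d+)"
)


def find_oom_events(log_text):
    """Return one dict per 'invoked oom-killer' line found in log_text."""
    events = []
    for line in log_text.splitlines():
        m = OOM_RE.search(line)
        if m:
            events.append({
                "comm": m["comm"],               # process that triggered the OOM killer
                "gfp_mask": int(m["gfp"], 16),   # allocation flags as an integer
                "order": int(m["order"]),        # allocation size is 2**order pages
            })
    return events
```

For example, `find_oom_events(open("dmesg.txt").read())` would list every OOM invocation in the saved log, which makes it easy to see whether the same allocation order or flags show up each time.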