BUG: kernel NULL pointer dereference, address: 0000000000000008 #10642
I believe I have the same bug -- also triggered by Plex Media Server (consistently, every few weeks; the server runs a lot more than just Plex but the crash is always triggered by Plex).
Apart from running the same Ubuntu version and kernel (I've experienced the crash on several different 5.4.0-generic Ubuntu builds), my environment is not particularly similar to the original reporter's: not a VM, just Ubuntu 20.04 on bare metal; Xeon E5-2407 CPU; LSI SAS2308 HBA.
@mas90 is that also zfs 0.8.3?
Yes - 0.8.3-1ubuntu12.4.
I have the same crash running Linux 5.4.0-58-generic #64-Ubuntu SMP, zfs 0.8.3-1ubuntu12.5, and a Plex workload. Although for me it causes the server to stop responding to network traffic.
We had the same issue:
I am seeing the same thing here. Sporadic misbehaviour on one of my LXD containers, backed by a very trivial ZFS pool, that runs Plex. Every couple of weeks, load increases, though no specific process can be blamed for the load. Unable to manipulate the container in any way. Reboots take forever due to issues shutting down the ZFS pool I have. From
ZFS pool info:
Version info:
Related thread in the LXD forums - https://discuss.linuxcontainers.org/t/single-lxd-container-gets-in-a-bad-state-need-a-host-restart-to-resolve/11751
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
I also use ZFS and a Plex server. Looking at my email alerts from a few days ago: the server automatically restarted. I didn't notice the downtime, but I got the alert.

# journalctl --since "2024-07-03 02:20:24"
jul 03 02:20:24 bserver kernel: BUG: kernel NULL pointer dereference, address: 0000000000000008
jul 03 02:20:24 bserver kernel: #PF: supervisor write access in kernel mode
jul 03 02:20:24 bserver kernel: #PF: error_code(0x0002) - not-present page
jul 03 02:20:24 bserver kernel: PGD 0 P4D 0
jul 03 02:20:24 bserver kernel: Oops: 0002 [#1] PREEMPT SMP NOPTI
jul 03 02:20:24 bserver kernel: CPU: 0 PID: 2559 Comm: CPU 0/KVM Tainted: P O 6.5.0-41-generic #41~22.04.2-Ubuntu
jul 03 02:20:24 bserver kernel: Hardware name: BIOSTAR Group B450MH/B450MH, BIOS 5.17 03/04/2024
jul 03 02:20:24 bserver kernel: RIP: 0010:_raw_spin_lock+0x13/0x60
jul 03 02:20:24 bserver kernel: Code: 31 db e9 00 18 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 65 ff 05 8c 12 ef 51 31 c0 ba 01 00 00 00 <3e> 0f b1 17 75 1b 31 c0 31 d2 31 c9 31>
jul 03 02:20:24 bserver kernel: RSP: 0018:ffffadf986603c90 EFLAGS: 00010046
jul 03 02:20:24 bserver kernel: RAX: 0000000000000000 RBX: 0000000000000020 RCX: 0000000000000000
jul 03 02:20:24 bserver kernel: RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000008
jul 03 02:20:24 bserver kernel: RBP: ffffadf986603cc8 R08: 0000000000000000 R09: 0000000000000000
jul 03 02:20:24 bserver kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
jul 03 02:20:24 bserver kernel: R13: 0000000000000092 R14: ffff9730f101fb50 R15: 0226800000000000
jul 03 02:20:24 bserver kernel: FS: 00007e293e071640(0000) GS:ffff9730f1000000(0000) knlGS:0000000000000000
jul 03 02:20:24 bserver kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
jul 03 02:20:24 bserver kernel: CR2: 0000000000000008 CR3: 000000013e782000 CR4: 00000000003506f0
jul 03 02:20:24 bserver kernel: Call Trace:
jul 03 02:20:24 bserver kernel: <TASK>
jul 03 02:20:24 bserver kernel: ? show_regs+0x6d/0x80
jul 03 02:20:24 bserver kernel: ? __die+0x24/0x80
jul 03 02:20:24 bserver kernel: ? page_fault_oops+0x99/0x1b0
jul 03 02:20:24 bserver kernel: ? do_user_addr_fault+0x31d/0x6b0
jul 03 02:20:24 bserver kernel: ? srso_return_thunk+0x5/0x10
jul 03 02:20:24 bserver kernel: ? kvm_set_lapic_tscdeadline_msr+0x50/0xa0 [kvm]
jul 03 02:20:24 bserver kernel: ? exc_page_fault+0x83/0x1b0
jul 03 02:20:24 bserver kernel: ? asm_exc_page_fault+0x27/0x30
jul 03 02:20:24 bserver kernel: ? _raw_spin_lock+0x13/0x60
jul 03 02:20:24 bserver kernel: ? speculation_ctrl_update+0xda/0x1e0
jul 03 02:20:24 bserver kernel: x86_virt_spec_ctrl+0x61/0x70
jul 03 02:20:24 bserver kernel: svm_vcpu_run+0x59a/0x860 [kvm_amd]
jul 03 02:20:24 bserver kernel: vcpu_enter_guest+0x456/0xf00 [kvm]
jul 03 02:20:24 bserver kernel: ? srso_return_thunk+0x5/0x10
jul 03 02:20:24 bserver kernel: ? kvm_apic_local_deliver+0xa1/0xd0 [kvm]
jul 03 02:20:24 bserver kernel: vcpu_run+0x46/0x290 [kvm]
jul 03 02:20:24 bserver kernel: kvm_arch_vcpu_ioctl_run+0x1d4/0x590 [kvm]
jul 03 02:20:24 bserver kernel: ? srso_return_thunk+0x5/0x10
jul 03 02:20:24 bserver kernel: ? rseq_ip_fixup+0x90/0x1f0
jul 03 02:20:24 bserver kernel: kvm_vcpu_ioctl+0x297/0x800 [kvm]
jul 03 02:20:24 bserver kernel: ? srso_return_thunk+0x5/0x10
jul 03 02:20:24 bserver kernel: ? __seccomp_filter+0x37b/0x560
jul 03 02:20:24 bserver kernel: ? srso_return_thunk+0x5/0x10
jul 03 02:20:24 bserver kernel: ? __fget_light+0xa5/0x120
jul 03 02:20:24 bserver kernel: __x64_sys_ioctl+0xa3/0xf0
jul 03 02:20:24 bserver kernel: x64_sys_call+0x1198/0x20b0
jul 03 02:20:24 bserver kernel: do_syscall_64+0x55/0x90
jul 03 02:20:24 bserver kernel: ? srso_return_thunk+0x5/0x10
jul 03 02:20:24 bserver kernel: ? do_syscall_64+0x61/0x90
jul 03 02:20:24 bserver kernel: ? common_interrupt+0x54/0xb0
jul 03 02:20:24 bserver kernel: entry_SYSCALL_64_after_hwframe+0x73/0xdd
jul 03 02:20:24 bserver kernel: RIP: 0033:0x7e2945af194f
jul 03 02:20:24 bserver kernel: Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44>
jul 03 02:20:24 bserver kernel: RSP: 002b:00007e293e070460 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
jul 03 02:20:24 bserver kernel: RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007e2945af194f
jul 03 02:20:24 bserver kernel: RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000015
jul 03 02:20:24 bserver kernel: RBP: 00005dd2075f2b30 R08: 00005dd204e43f10 R09: 0000000000000000
jul 03 02:20:24 bserver kernel: R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
jul 03 02:20:24 bserver kernel: R13: 0000000000000001 R14: 000000000000c040 R15: 0000000000000000
jul 03 02:20:24 bserver kernel: </TASK>
jul 03 02:20:24 bserver kernel: Modules linked in: macvtap macvlan vhost_net vhost vhost_iotlb tap rpcsec_gss_krb5 nfsv4 nfs fscache netfs xt_CHECKSUM xt_conntrack ipt_REJECT nf_reject_ipv4 bridge stp llc>
jul 03 02:20:24 bserver kernel: libcrc32c raid1 raid0 multipath linear hid_generic crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic usbhid ghash_clmulni_intel sha256_ssse3 hid sha1_ssse3 aes>
jul 03 02:20:24 bserver kernel: CR2: 0000000000000008
jul 03 02:20:24 bserver kernel: ---[ end trace 0000000000000000 ]---
jul 03 02:20:24 bserver kernel: pstore: backend (efi_pstore) writing error (-5)
jul 03 02:20:24 bserver kernel: RIP: 0010:_raw_spin_lock+0x13/0x60
jul 03 02:20:24 bserver kernel: Code: 31 db e9 00 18 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 65 ff 05 8c 12 ef 51 31 c0 ba 01 00 00 00 <3e> 0f b1 17 75 1b 31 c0 31 d2 31 c9 31>
jul 03 02:20:24 bserver kernel: RSP: 0018:ffffadf986603c90 EFLAGS: 00010046
jul 03 02:20:24 bserver kernel: RAX: 0000000000000000 RBX: 0000000000000020 RCX: 0000000000000000
jul 03 02:20:24 bserver kernel: RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000008
jul 03 02:20:24 bserver kernel: RBP: ffffadf986603cc8 R08: 0000000000000000 R09: 0000000000000000
jul 03 02:20:24 bserver kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
jul 03 02:20:24 bserver kernel: R13: 0000000000000092 R14: ffff9730f101fb50 R15: 0226800000000000
jul 03 02:20:24 bserver kernel: FS: 00007e293e071640(0000) GS:ffff9730f1000000(0000) knlGS:0000000000000000
jul 03 02:20:24 bserver kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
jul 03 02:20:24 bserver kernel: CR2: 0000000000000008 CR3: 000000013e782000 CR4: 00000000003506f0
jul 03 02:20:24 bserver kernel: note: CPU 0/KVM[2559] exited with irqs disabled
jul 03 02:20:24 bserver kernel: note: CPU 0/KVM[2559] exited with preempt_count 2

Pool is working as expected:

# zpool status apool
pool: apool
state: ONLINE
scan: scrub repaired 0B in 00:02:52 with 0 errors on Sat Jul 6 23:36:38 2024
config:
NAME STATE READ WRITE CKSUM
apool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
sdb ONLINE 0 0 0
sdc ONLINE 0 0 0
errors: No known data errors

Version info:
System information
Describe the problem you're observing
After a few days (sometimes longer) there is strange behavior that is noticeable in the apps served to the network which are backed by ZFS datasets. Looking at dmesg shows the crash. This never causes the OS to completely crash, and after it happens ZFS seems to work, but certain workloads against my datasets seem to hang.

Describe how to reproduce the problem
Not quite sure what triggers the scenario. It can happen after running my server anywhere from a few days to maybe 1.5 weeks.
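Since it can take a week or more to recur, a small log check can at least flag when it happens again instead of waiting for workloads to hang. A minimal sketch (the function name and invocation are my own assumptions, not anything from this thread; only the grep pattern is taken from the trace above):

```shell
#!/bin/sh
# Hedged sketch: scan a saved kernel log for the oops signature from this issue.
PATTERN='BUG: kernel NULL pointer dereference'

check_oops() {
    # Print matching lines with line numbers; exit status 0 only if found.
    grep -n "$PATTERN" "$1"
}
```

Pointing it at captured kernel messages (e.g. `journalctl -k > kern.txt` then `check_oops kern.txt`, or a cron job that mails on a hit) would surface recurrences early.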
Include any warning/errors/backtraces from the system logs
As you can see I'm inside a VM here. Some details about my setup:
25f9:00:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS3008 PCI-Express Fusion-MPT SAS-3 (rev 02)
Guest CPU details:
Guest memory ranges: