PANIC at zio.c:315:zio_data_buf_alloc() #16527

Closed
micsuka opened this issue Sep 10, 2024 · 3 comments
Labels
Component: Encryption ("native encryption" feature) · Type: Defect (incorrect behavior, e.g. crash, hang)

Comments

micsuka commented Sep 10, 2024

System information

Type                  Version/Name
Distribution Name     Debian Linux
Distribution Version  11
Kernel Version        5.10.0-28
Architecture          amd64
OpenZFS Version       2.0.3-9+deb11u1

Describe the problem you're observing

About the setup:
We run several mariadb databases on zfs on Debian 11, the servers contain the same data through replication.
All servers contain a zpool with mirrored SSDs and the dataset is compressed. I've attached the parameters of the datasets/pools.

I decided to move the databases onto an encrypted dataset, so a few days ago I ran zfs send ... | zfs receive -o keyformat=raw -o keylocation=prompt ... on 4 servers (a rough sketch of the migration is below).
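Roughly, the migration looked like the following; the pool/dataset names and key path are placeholders rather than the exact commands, and a key file is shown here instead of keylocation=prompt (keyformat=raw expects exactly 32 bytes of key material):

# generate a 32-byte raw key
dd if=/dev/urandom of=/root/db.key bs=32 count=1

# snapshot the source dataset and receive it into a new, encrypted dataset
zfs snapshot tank/mariadb@migrate
zfs send tank/mariadb@migrate | zfs receive \
    -o encryption=aes-256-gcm \
    -o keyformat=raw \
    -o keylocation=file:///root/db.key \
    tank/mariadb-enc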

After the datasets had been encrypted, this panic occurred on 2 of the 4 servers, after around 3 days.

Here is the kernel message on server 1:

Sep 09 19:22:56 malta0 kernel: VERIFY3(c < SPA_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT) failed (36028797018963967 < 32768)
Sep 09 19:22:56 malta0 kernel: PANIC at zio.c:315:zio_data_buf_alloc()
Sep 09 19:22:56 malta0 kernel: Showing stack for process 3392903
Sep 09 19:22:56 malta0 kernel: CPU: 23 PID: 3392903 Comm: mariadbd Tainted: P          IOE     5.10.0-28-amd64 #1 Debian 5.10.209-2
Sep 09 19:22:56 malta0 kernel: Hardware name: Thomas-Krenn.AG X10DRi/X10DRi, BIOS 1.0b 09/17/2014
Sep 09 19:22:56 malta0 kernel: Call Trace:
Sep 09 19:22:56 malta0 kernel:  dump_stack+0x6b/0x83
Sep 09 19:22:56 malta0 kernel:  spl_panic+0xd4/0xfc [spl]
Sep 09 19:22:56 malta0 kernel:  ? spl_kmem_cache_alloc+0x74/0x7d0 [spl]
Sep 09 19:22:56 malta0 kernel:  ? kmem_cache_alloc+0xed/0x1f0
Sep 09 19:22:56 malta0 kernel:  ? spl_kmem_cache_alloc+0x97/0x7d0 [spl]
Sep 09 19:22:56 malta0 kernel:  ? aggsum_add+0x175/0x190 [zfs]
Sep 09 19:22:56 malta0 kernel:  ? mutex_lock+0xe/0x30
Sep 09 19:22:56 malta0 kernel:  ? aggsum_add+0x175/0x190 [zfs]
Sep 09 19:22:56 malta0 kernel:  zio_data_buf_alloc+0x55/0x60 [zfs]
Sep 09 19:22:56 malta0 kernel:  abd_alloc_linear+0x8e/0xd0 [zfs]
Sep 09 19:22:56 malta0 kernel:  arc_hdr_alloc_abd+0xe3/0x1f0 [zfs]
Sep 09 19:22:56 malta0 kernel:  arc_hdr_alloc+0x104/0x170 [zfs]
Sep 09 19:22:56 malta0 kernel:  arc_alloc_buf+0x46/0x150 [zfs]
Sep 09 19:22:56 malta0 kernel:  dbuf_hold_copy.constprop.0+0x31/0xa0 [zfs]
Sep 09 19:22:56 malta0 kernel:  dbuf_hold_impl+0x480/0x670 [zfs]
Sep 09 19:22:56 malta0 kernel:  dbuf_hold_level+0x2b/0x60 [zfs]
Sep 09 19:22:56 malta0 kernel:  dmu_tx_check_ioerr+0x35/0xd0 [zfs]
Sep 09 19:22:56 malta0 kernel:  dmu_tx_count_write+0x68/0x1a0 [zfs]
Sep 09 19:22:56 malta0 kernel:  dmu_tx_hold_write_by_dnode+0x35/0x50 [zfs]
Sep 09 19:22:56 malta0 kernel:  zfs_write+0x3f1/0xc80 [zfs]
Sep 09 19:22:56 malta0 kernel:  zpl_iter_write+0x103/0x170 [zfs]
Sep 09 19:22:56 malta0 kernel:  new_sync_write+0x11c/0x1b0
Sep 09 19:22:56 malta0 kernel:  vfs_write+0x1ce/0x260
Sep 09 19:22:56 malta0 kernel:  ksys_write+0x5f/0xe0
Sep 09 19:22:56 malta0 kernel:  do_syscall_64+0x33/0x80
Sep 09 19:22:56 malta0 kernel:  entry_SYSCALL_64_after_hwframe+0x62/0xc7
Sep 09 19:22:56 malta0 kernel: RIP: 0033:0x7fbba8223fef
Sep 09 19:22:56 malta0 kernel: Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 29 fd ff ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77>
Sep 09 19:22:56 malta0 kernel: RSP: 002b:00007fbba44d16b0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
Sep 09 19:22:56 malta0 kernel: RAX: ffffffffffffffda RBX: 00000000000000a9 RCX: 00007fbba8223fef
Sep 09 19:22:56 malta0 kernel: RDX: 00000000000000a9 RSI: 00005581f7309338 RDI: 0000000000000026
Sep 09 19:22:56 malta0 kernel: RBP: 00007fbba44d1730 R08: 0000000000000000 R09: 0000000000000000
Sep 09 19:22:56 malta0 kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 00000000000000a9
Sep 09 19:22:56 malta0 kernel: R13: 00005581f7309338 R14: 00005581f7309338 R15: 0000000000000026

And here is the kernel log on server 2:

Sep 10 16:18:48 hetza1 kernel: VERIFY3(c < SPA_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT) failed (36028797018963967 < 32768)
Sep 10 16:18:48 hetza1 kernel: PANIC at zio.c:315:zio_data_buf_alloc()
Sep 10 16:18:48 hetza1 kernel: Showing stack for process 629911
Sep 10 16:18:48 hetza1 kernel: CPU: 13 PID: 629911 Comm: mariadbd Tainted: P           OE     5.10.0-28-amd64 #1 Debian 5.10.209-2
Sep 10 16:18:48 hetza1 kernel: Hardware name: ASUSTeK COMPUTER INC. KRPA-U16 Series/KRPA-U16 Series, BIOS 4102 11/17/2021
Sep 10 16:18:48 hetza1 kernel: Call Trace:
Sep 10 16:18:48 hetza1 kernel:  dump_stack+0x6b/0x83
Sep 10 16:18:48 hetza1 kernel:  spl_panic+0xd4/0xfc [spl]
Sep 10 16:18:48 hetza1 kernel:  ? spl_kmem_cache_alloc+0x74/0x7d0 [spl]
Sep 10 16:18:48 hetza1 kernel:  ? kmem_cache_alloc+0xed/0x1f0
Sep 10 16:18:48 hetza1 kernel:  ? spl_kmem_cache_alloc+0x97/0x7d0 [spl]
Sep 10 16:18:48 hetza1 kernel:  ? aggsum_add+0x175/0x190 [zfs]
Sep 10 16:18:48 hetza1 kernel:  ? mutex_lock+0xe/0x30
Sep 10 16:18:48 hetza1 kernel:  ? aggsum_add+0x175/0x190 [zfs]
Sep 10 16:18:48 hetza1 kernel:  zio_data_buf_alloc+0x55/0x60 [zfs]
Sep 10 16:18:48 hetza1 kernel:  abd_alloc_linear+0x8e/0xd0 [zfs]
Sep 10 16:18:48 hetza1 kernel:  arc_hdr_alloc_abd+0xe3/0x1f0 [zfs]
Sep 10 16:18:48 hetza1 kernel:  arc_hdr_alloc+0x104/0x170 [zfs]
Sep 10 16:18:48 hetza1 kernel:  arc_alloc_buf+0x46/0x150 [zfs]
Sep 10 16:18:48 hetza1 kernel:  dbuf_hold_copy.constprop.0+0x31/0xa0 [zfs]
Sep 10 16:18:48 hetza1 kernel:  dbuf_hold_impl+0x480/0x670 [zfs]
Sep 10 16:18:48 hetza1 kernel:  dbuf_hold_level+0x2b/0x60 [zfs]
Sep 10 16:18:48 hetza1 kernel:  dmu_tx_check_ioerr+0x35/0xd0 [zfs]
Sep 10 16:18:48 hetza1 kernel:  dmu_tx_count_write+0xed/0x1a0 [zfs]
Sep 10 16:18:48 hetza1 kernel:  dmu_tx_hold_write_by_dnode+0x35/0x50 [zfs]
Sep 10 16:18:48 hetza1 kernel:  zfs_write+0x3f1/0xc80 [zfs]
Sep 10 16:18:48 hetza1 kernel:  ? aa_sk_perm+0x3e/0x1b0
Sep 10 16:18:48 hetza1 kernel:  zpl_iter_write+0x103/0x170 [zfs]
Sep 10 16:18:48 hetza1 kernel:  new_sync_write+0x11c/0x1b0
Sep 10 16:18:48 hetza1 kernel:  vfs_write+0x1ce/0x260
Sep 10 16:18:48 hetza1 kernel:  ksys_write+0x5f/0xe0
Sep 10 16:18:48 hetza1 kernel:  do_syscall_64+0x33/0x80
Sep 10 16:18:48 hetza1 kernel:  entry_SYSCALL_64_after_hwframe+0x62/0xc7
Sep 10 16:18:48 hetza1 kernel: RIP: 0033:0x7f0c8c13bfef
Sep 10 16:18:48 hetza1 kernel: Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 29 fd ff ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77>
Sep 10 16:18:48 hetza1 kernel: RSP: 002b:00007f0c883f4c50 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
Sep 10 16:18:48 hetza1 kernel: RAX: ffffffffffffffda RBX: 000000000000002a RCX: 00007f0c8c13bfef
Sep 10 16:18:48 hetza1 kernel: RDX: 000000000000002a RSI: 00007f039401d978 RDI: 000000000000028c
Sep 10 16:18:48 hetza1 kernel: RBP: 00007f0c883f4cd0 R08: 0000000000000000 R09: 0000000000000234
Sep 10 16:18:48 hetza1 kernel: R10: 000000000000002a R11: 0000000000000293 R12: 0000000000000001
Sep 10 16:18:48 hetza1 kernel: R13: 000000000000002a R14: 00007f039401d978 R15: 000000000000028c

zfs1info.txt
zfs2info.txt

I rule out a hardware problem: there was no trace of any error in the logs and, as I said, these systems had been rock solid for years.
The servers have ECC RAM and the CPUs support AES.

Describe how to reproduce the problem

I'm confident that this problem is related to ZFS native encryption.

Include any warning/errors/backtraces from the system logs

(See the kernel messages above.)

micsuka added the Type: Defect label Sep 10, 2024

rincebrain (Contributor) commented Sep 10, 2024

I recommend running a version released after 2021 and seeing if your problem is resolved.

(Specifically, 4036b8d might be useful, but there are a lot of bugs in native encryption, some of which have been fixed in the 3.5 years since 2.0.3 was released. If you don't want to upgrade, you should probably file bugs against Debian, not upstream.)
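On Debian 11, a 2.1.x build is available from bullseye-backports; assuming the backports and contrib components are enabled in your APT sources, something like this should get you there:

apt install -t bullseye-backports zfs-dkms zfsutils-linux

You can confirm the loaded version afterwards with zfs version.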

rincebrain added the Component: Encryption label Sep 10, 2024

micsuka commented Sep 11, 2024

Thank you. I've updated ZFS to 2.1.11-1~bpo11+1 on one server for now and re-enabled encryption on the dataset.
It handles the same load now; let's see how it behaves over the next few weeks.
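For reference, the encryption state of the received dataset can be double-checked with something like this (dataset name is a placeholder):

zfs get encryption,keyformat,keystatus tank/mariadb-enc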

micsuka commented Sep 30, 2024

So, I now have zfs-2.1.11-1~bpo11+1 on all of our servers... and it seems to be stable.

micsuka closed this as completed Sep 30, 2024