Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARC changes to fix memory related crashes with encryption #15538

Closed

Conversation

don-brady
Copy link
Contributor

Motivation and Context

Odoo has reported various crashes in ZFS when using native encryption on the ZFS 2.1.x branches. In many occasions, it was a null pointer dereference, as reported in #12775

Description

The following changes have prevented reoccurrence of the issues.

  1. Deal with a null pabd in arc_buf_fill(): Bail early in arc_buf_fill() when hdr->b_crypt_hdr.b_rabd is NULL.
  2. In arc_write(), avoid arc_hdr_free_abd() when HDR_IO_IN_PROGRESS indicates it's still in use.
  3. Added additional zfs debug message logging for unexpected edge cases to assist in future diagnosis.

Sponsored-By: Odoo SA
Sponsored-By: Klara Inc.

How Has This Been Tested?

Tested with ztest runs with KSAN enabled and encryption forced for every dataset
Tested with ZTS functional/cli_root and functional/rsend, both will exercise zfs encryption paths
Original patches tested on a system that was exhibiting the memory related errors to confirm it addressed the issue.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

module/zfs/arc.c Outdated
HDR_GET_PSIZE(hdr), HDR_GET_LSIZE(hdr),
(u_longlong_t)zb->zb_objset,
(u_longlong_t)zb->zb_object);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Return error here? Otherwise its going to immediately panic when abd_copy_to_buf derefs b_pabd.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh never mind, maybe your purpose is to just report more when it does crash?

module/zfs/sa.c Outdated
@@ -360,7 +360,7 @@ sa_attr_op(sa_handle_t *hdl, sa_bulk_attr_t *bulk, int count,
}
}
if (error && error != ENOENT) {
return ((error == ECKSUM) ? EIO : error);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like these two but are they related or snuck into the wrong commit?

@rincebrain
Copy link
Contributor

I'll give this a shot on my exotic system which somehow reliably crashes in this way and report back.

@rincebrain
Copy link
Contributor

rincebrain commented Nov 17, 2023

I am sad to report

[  628.516772] VERIFY(HDR_EMPTY_OR_LOCKED(hdr)) failed
[  628.581024] PANIC at arc.c:1962:arc_buf_untransform_in_place()
[  628.657793] Showing stack for process 2928
[  628.657817] CPU: 0 PID: 2928 Comm: zfs Tainted: P           OE     5.10.0-8-sparc64 #1 Debian 5.10.46-4
[  628.657828] Call Trace:
[  628.657912] [<00000000107b159c>] spl_dumpstack+0x1c/0x2c [spl]
[  628.657945] [<00000000107b1630>] spl_panic+0x84/0xa4 [spl]
[  628.659430] [<00000000108e7398>] arc_buf_fill+0xc78/0x1c80 [zfs]
[  628.660088] [<00000000108e83c4>] arc_untransform+0x24/0xa0 [zfs]
[  628.660779] [<0000000010906838>] dbuf_read_verify_dnode_crypt+0xf8/0x1a0 [zfs]
[  628.661434] [<000000001090e2e0>] dbuf_read_impl.constprop.0+0x2a0/0xcc0 [zfs]
[  628.662089] [<000000001090ef74>] dbuf_read+0x274/0x640 [zfs]
[  628.662742] [<000000001091f570>] dmu_buf_hold+0x50/0x80 [zfs]
[  628.663385] [<0000000010a37cc0>] zap_lockdir+0x20/0x100 [zfs]
[  628.664028] [<0000000010a393d0>] zap_lookup+0x30/0x80 [zfs]
[  628.664712] [<00000000109b159c>] sa_setup+0x17c/0x660 [zfs]
[  628.665350] [<0000000010a97e54>] zfsvfs_init.part.0+0x374/0x560 [zfs]
[  628.665987] [<0000000010a9ab0c>] zfsvfs_create_impl+0x1ac/0x360 [zfs]
[  628.666625] [<0000000010a9ad38>] zfsvfs_create+0x78/0xa0 [zfs]
[  628.667262] [<0000000010a9ade8>] zfs_domount+0x88/0x700 [zfs]
[  628.667896] [<0000000010ab659c>] zpl_mount+0x1fc/0x320 [zfs]

If you're not expecting this to have fixed that, ignore me, but I was assuming the changes in arc_write would have been expected to give this a miss.

e: Sorry, disregard, apparently I had loaded the wrong clone's modules. Trying again...

e2: It didn't panic so far, but it did fail a test run of zfs_receive_raw with:
02:44:57.41 1700207096 fffff80030c24da0 arc.c:2074:arc_buf_fill(): in-place dnode but pabd is NULL! compress=2 encrypted=0 psize=16384 lsize=16384 objset=127 object=0

edit 3: It seems like, so far, about a 50% rate of the tests failing with dbgmsg logs like the above, but no panics yet, so that's nice.

Copy link
Member

@amotin amotin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sorry, but how is this a valid fix for anything? You are merely burying the issues, whatever they are, deeper, unless you just collecting data for later debugging. I don't think we should pollute the code this way.

module/zfs/arc.c Outdated
if (HDR_PROTECTED(hdr) == B_FALSE) {
zfs_dbgmsg("allocating rabd on a not-encrypted HDR");
}
ASSERT(HDR_PROTECTED(hdr));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already asserted with the IMPLY() above.

@amotin
Copy link
Member

amotin commented Nov 17, 2023

@rincebrain If your test case is so reproducible, could somebody look on it closer to find what is going on? Reliably faulting assertion is a very good starting point for investigation.

@rincebrain
Copy link
Contributor

It's extremely reproducible on this one system, which is a very strange little sparc single core single thread box.

I've been begging people to look into it for years at this point.

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Nov 17, 2023
@don-brady
Copy link
Contributor Author

It's extremely reproducible on this one system, which is a very strange little sparc single core single thread box.

@rincebrain what is the workload/test?s I'd like to attempt to reproduce (I don't have a sparc but could try a single core VM).

@rincebrain
Copy link
Contributor

rincebrain commented Dec 5, 2023 via email

@don-brady
Copy link
Contributor Author

but zfs_receive_raw is what I usually run in a loop to make it go bang.

Confirming that you are referring to functional/cli_root/zfs_receive/zfs_receive_raw ?

@rincebrain
Copy link
Contributor

rincebrain commented Dec 20, 2023

I am, yes. The other ones in zfs_receive that use --raw will also work, I think, but that's the one I use.

For whatever strange reason, the little sparc I have, that panics over 50% of the time on baseline, and while it doesn't panic with this PR, it does log the "this shouldn't happen" dbgmsg and fail the test.

@don-brady
Copy link
Contributor Author

For whatever strange reason, the little sparc I have, that panics over 50% of the time on baseline

Thanks for confirming. Could you provide details on your sparc box? (processor gen, RAM, CPUs, 64bit?)

@rincebrain
Copy link
Contributor

$ cat /proc/cpuinfo
cpu             : TI UltraSparc IIi (Sabre)
fpu             : UltraSparc IIi integrated FPU
pmu             : ultra12
prom            : OBP 3.10.25 2000/01/17 21:26
type            : sun4u
ncpus probed    : 1
ncpus active    : 1
D$ parity tl1   : 0
I$ parity tl1   : 0
Cpu0ClkTck      : 000000001a3a3d94
cpucaps         : flush,stbar,swap,muldiv,v9,mul32,div32,v8plus,vis
MMU Type        : Spitfire
MMU PGSZs       : 8K,64K,512K,4MB

It's a little Netra T1 105, running Debian sid sparc64.

Kernel version doesn't seem to matter, I've been complaining about this since Linux 3.x, I think, and it's now running 5.10.

Doesn't seem to be all sparcs that break quite so reliably, I have a much beefier sparc64 (a T2) that doesn't reproduce this nearly as reliably, but does have the nice property of compiling things for the little 1c1t much faster than it itself does. :)

I've got a whole disk image of it from years ago that works but doesn't seem to reliably repro this, but I don't know if that's qemu being very unreliable at sparc emulation or what.

- Bail early in arc_buf_fill() when hdr->b_crypt_hdr.b_rabd is NULL
- In arc_write(), avoid arc_hdr_free_abd() when HDR_IO_IN_PROGRESS
  indicates it's still in use.

Sponsored-By: Odoo SA
Sponsored-By: Klara Inc.
Signed-off-by: Don Brady <don.brady@klarasystems.com>
@don-brady don-brady force-pushed the odoo_patch_upstream_master branch from 3090be2 to 38f7449 Compare January 17, 2024 21:26
@don-brady
Copy link
Contributor Author

@rincebrain -- I rebased to latest master branch and I removed all the unnecessary debug messages.
Can you test on your Netra box again (with and without) the changes? Thanks

@rincebrain
Copy link
Contributor

rincebrain commented Jan 17, 2024

Current git apparently just doesn't compile on my system after 60e389c.

Since I've built git since that commit without a problem, I'm not really sure why it's only upset now, but that pragma didn't exist in gcc 10, it appears.

module/lua/ldo.c:178:32: error: unknown option after ‘#pragma GCC diagnostic’ kind [-Werror=pragmas]
  178 | #pragma GCC diagnostic ignored "-Winfinite-recursion"

I'm going to guess the check in 3c1e193 is just broken and assumes CC=KERNEL_CC incorrectly, and I just broke it recently by installing an additional userland compiler that's new enough to support that.

I've hacked around it with a hardcoded __GNUC__ >= 12 check, but we should probably fix that...

e: that test should probably be trying to build a dummy kernel module with that flag, not userland, to avoid trying to manually extract the kernel compiler if it's not explicitly given...

@rincebrain
Copy link
Contributor

rincebrain commented Jan 18, 2024

Unpatched (except to work around the above) git:

[2124999.497020] ZFS: Loaded module v2.2.99-310_ga0b2a93c4 (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
[2125099.258902] VERIFY3(hdr->b_l1hdr.b_pabd != NULL) failed (0000000000000000 != 0000000000000000)
[2125099.374608] PANIC at arc.c:1963:arc_buf_untransform_in_place()
[2125099.453699] Showing stack for process 2704804
[2125099.453724] CPU: 0 PID: 2704804 Comm: zfs Tainted: P           OE     5.10.0-8-sparc64 #1 Debian 5.10.46-4
[2125099.453734] Call Trace:
[2125099.453830] [<000000001033163c>] spl_dumpstack+0x1c/0x2c [spl]
[2125099.453866] [<00000000103316d0>] spl_panic+0x84/0xa4 [spl]
[2125099.455355] [<00000000109dfe50>] arc_buf_fill+0x1790/0x1c80 [zfs]
[2125099.456004] [<00000000109e0364>] arc_untransform+0x24/0xa0 [zfs]
[2125099.456650] [<00000000109fe818>] dbuf_read_verify_dnode_crypt+0xf8/0x1a0 [zfs]
[2125099.457295] [<0000000010a06180>] dbuf_read_impl.constprop.0+0x2a0/0xca0 [zfs]
[2125099.457979] [<0000000010a06df4>] dbuf_read+0x274/0x640 [zfs]
[2125099.458623] [<0000000010a17af0>] dmu_buf_hold+0x50/0x80 [zfs]
[2125099.459256] [<0000000010b31100>] zap_lockdir+0x20/0x100 [zfs]
[2125099.459889] [<0000000010b32810>] zap_lookup+0x30/0x80 [zfs]
[2125099.460528] [<0000000010aa9dfc>] sa_setup+0x17c/0x660 [zfs]
[2125099.461156] [<0000000010b92054>] zfsvfs_init.part.0+0x374/0x560 [zfs]
[2125099.461817] [<0000000010b94d0c>] zfsvfs_create_impl+0x1ac/0x360 [zfs]
[2125099.462444] [<0000000010b94f38>] zfsvfs_create+0x78/0xa0 [zfs]
[2125099.463070] [<0000000010b94fe8>] zfs_domount+0x88/0x700 [zfs]
[2125099.463695] [<0000000010bb089c>] zpl_mount+0x1fc/0x320 [zfs]

With it, sometimes the test zfs_receive_raw fails with:

00:53:19.70 SUCCESS: eval zfs send -w testpool/testfs1@snap | zfs receive testpool/testfs1/c1
00:53:31.16 cannot mount 'testpool/testfs1/c1': Permission denied
00:53:31.17 ERROR: eval echo password | zfs mount -l testpool/testfs1/c1 exited 1

and nothing in dbgmsg now with the debug prints removed.

@amotin
Copy link
Member

amotin commented Apr 17, 2024

After spending a week in this area I can say that arc_untransform() is one big mess. ARC is simply unable to properly lock it, since its hash locks do not protect anonymous buffers. Protection of those should be handled on higher levels. The best idea I've got so far is #16104 . I hope it to be enough for now. In general ZIO and ARC layers seems to be perfectly fine with mixing plain-text/compressed/encrypted buffers, while DBUF layer was just not designed for that.

@behlendorf
Copy link
Contributor

Replaced by #16104.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Code Review Needed Ready for review and testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants