Avoid panic in case of pool errors and missing L2ARC #12392
Does it make sense to write a ZTS test case, since you already provide a reproducer?
@gamanakis so it looks to me like we could perhaps handle this issue instead by discarding the buffer's identity a little sooner, in arc_hdr_destroy():

```diff
diff --git a/module/zfs/arc.c b/module/zfs/arc.c
index c2557f33c..8c2c1e416 100644
--- a/module/zfs/arc.c
+++ b/module/zfs/arc.c
@@ -3781,8 +3781,13 @@ arc_hdr_destroy(arc_buf_hdr_t *hdr)
 		 * to acquire the l2ad_mtx. If that happens, we don't
 		 * want to re-destroy the header's L2 portion.
 		 */
-		if (HDR_HAS_L2HDR(hdr))
+		if (HDR_HAS_L2HDR(hdr)) {
+
+			if (!HDR_EMPTY(hdr))
+				buf_discard_identity(hdr);
+
 			arc_hdr_l2hdr_destroy(hdr);
+		}
 
 		if (!buflist_held)
 			mutex_exit(&dev->l2ad_mtx);
```
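To make the ordering in the diff concrete, here is a minimal single-threaded sketch. It assumes (a reading of arc.c, not something stated in this thread) that arc_hdr_clear_flags() asserts the header is either empty, meaning its DVA has been cleared, or that its hash lock is held. Every name below (toy_hdr_t, toy_hdr_destroy() and the rest) is hypothetical and only models arc_buf_hdr_t, buf_discard_identity() and arc_hdr_l2hdr_destroy(); the real functions take locks and touch much more state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical toy model, NOT the real OpenZFS structures. It only
 * captures the assumed invariant checked when clearing header flags:
 * the header must be empty (no DVA) or its hash lock must be held.
 */
#define	TOY_FLAG_HAS_L2HDR	(1u << 0)

typedef struct toy_hdr {
	uint64_t dva;		/* stands in for the block's DVA */
	uint32_t flags;
	bool lock_held;		/* models "hash lock is held" */
} toy_hdr_t;

/* Models HDR_EMPTY(): the header has no on-disk identity. */
static bool
toy_hdr_empty(const toy_hdr_t *hdr)
{
	return (hdr->dva == 0);
}

/* Returns false where the real code would trip its ASSERT and panic. */
static bool
toy_hdr_clear_flags(toy_hdr_t *hdr, uint32_t flags)
{
	if (!toy_hdr_empty(hdr) && !hdr->lock_held)
		return (false);
	hdr->flags &= ~flags;
	return (true);
}

/* Models the part of buf_discard_identity() that matters here. */
static void
toy_discard_identity(toy_hdr_t *hdr)
{
	hdr->dva = 0;
}

/* Models arc_hdr_l2hdr_destroy(): drops the L2ARC flag. */
static bool
toy_l2hdr_destroy(toy_hdr_t *hdr)
{
	return (toy_hdr_clear_flags(hdr, TOY_FLAG_HAS_L2HDR));
}

/*
 * Models the L2 part of arc_hdr_destroy(). discard_first selects the
 * patched ordering (true) or the unpatched ordering (false).
 */
static bool
toy_hdr_destroy(toy_hdr_t *hdr, bool discard_first)
{
	if (hdr->flags & TOY_FLAG_HAS_L2HDR) {
		if (discard_first && !toy_hdr_empty(hdr))
			toy_discard_identity(hdr);
		return (toy_l2hdr_destroy(hdr));
	}
	return (true);
}
```

With discard_first set to false, an unlocked header that still has its identity fails the check, which is where the real kernel would panic. With it set to true, the identity is discarded first, the header counts as empty, and the L2ARC flag is cleared without needing any lock.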
@behlendorf thank you for taking a look! Interesting approach, I will rebase to master and re-test with your proposal.
In case an ARC buffer is allocated only on L2ARC, and there are underlying errors in a pool with the cache device in a faulty state, a panic can occur in arc_read_done()->arc_hdr_destroy()->arc_hdr_l2hdr_destroy()->arc_hdr_clear_flags() when trying to free the ARC buffer. Fix this by discarding the buffer's identity in arc_hdr_destroy(), in case the buffer is not empty, before calling arc_hdr_l2hdr_destroy().

Signed-off-by: George Amanakis <gamanakis@gmail.com>
Branch updated from aac389f to 67c2f8f.
67c2f8f: Rebased to master, following the approach suggested by @behlendorf.
Looks good to me. It definitely makes no sense to take the hash_lock as was proposed originally, since we know the header is not there.
In case an ARC buffer is allocated only on L2ARC, and there are underlying errors in a pool with the cache device in a faulty state, a panic can occur in arc_read_done()->arc_hdr_destroy()->arc_hdr_l2hdr_destroy()->arc_hdr_clear_flags() when trying to free the ARC buffer. Fix this by discarding the buffer's identity in arc_hdr_destroy(), in case the buffer is not empty, before calling arc_hdr_l2hdr_destroy().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes openzfs#12392
Motivation and Context
Local testing hitting the panic:
Description
In case an ARC buffer is allocated only on L2ARC, and there are
underlying errors in a pool with the cache device in a faulty state, a
panic can occur in arc_read_done()->arc_hdr_destroy()->
arc_hdr_l2hdr_destroy()->arc_hdr_clear_flags() when trying to free
the ARC buffer.
Fix this by checking in arc_read_done() whether the ARC buffer to be
freed is stored on L2ARC and is not empty and, if so, acquiring its
hash_lock.
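The lock-based approach described here can be modeled the same way for comparison. This is a hypothetical single-threaded sketch: lock_held stands in for MUTEX_HELD() on the hash lock, and the check in arc_hdr_clear_flags() is assumed to be "empty or locked". It shows only that holding the lock satisfies that check; it does not model whether the lock protects anything meaningful for this header, which is the point questioned in review.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model, NOT the real OpenZFS arc_buf_hdr_t. */
#define	TOY_FLAG_HAS_L2HDR	(1u << 0)

typedef struct toy_hdr {
	uint64_t dva;		/* stands in for the block's DVA */
	uint32_t flags;
	bool lock_held;		/* models "hash lock is held" */
} toy_hdr_t;

/* Models HDR_EMPTY(): the header has no on-disk identity. */
static bool
toy_hdr_empty(const toy_hdr_t *hdr)
{
	return (hdr->dva == 0);
}

/* Assumed invariant: returns false where the real ASSERT would panic. */
static bool
toy_hdr_clear_flags(toy_hdr_t *hdr, uint32_t flags)
{
	if (!toy_hdr_empty(hdr) && !hdr->lock_held)
		return (false);
	hdr->flags &= ~flags;
	return (true);
}

/*
 * Models the originally proposed arc_read_done() change: take the
 * (modeled) hash lock around the free when the header is on L2ARC
 * and still has an identity, then release it afterwards.
 */
static bool
toy_read_done_free(toy_hdr_t *hdr, bool take_lock)
{
	bool locked = false;
	bool ok;

	if (take_lock && (hdr->flags & TOY_FLAG_HAS_L2HDR) &&
	    !toy_hdr_empty(hdr)) {
		hdr->lock_held = true;	/* models mutex_enter(hash_lock) */
		locked = true;
	}

	ok = toy_hdr_clear_flags(hdr, TOY_FLAG_HAS_L2HDR);

	if (locked)
		hdr->lock_held = false;	/* models mutex_exit(hash_lock) */
	return (ok);
}
```

In this model both fixes make the flag change legal; the difference is that the lock-based version adds locking around a header whose identity is about to be discarded anyway, while the merged version simply discards the identity earlier.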
How Has This Been Tested?
Without this patch the following code panics: