-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ZIL IO hung due to deadlock on spa config lock #15078
Labels
Type: Defect
Incorrect behavior (e.g. crash, hang)
Comments
amotin
added a commit
to amotin/zfs
that referenced
this issue
Jul 18, 2023
When we have some LWBs closed and their ZIOs ready to be issued, we can not afford sleeping on config lock if somebody else try to lock it as writer, or it will cause a deadlock. To solve it, move spa_config_enter() from zil_lwb_write_issue() to zil_lwb_write_close() under zl_issuer_lock to enforce lock ordering with other threads. Now if we can't immediately lock config, issue all previously closed LWBs so that they could drop their config locks after completion, and only then allow sleeping on our lock. Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Issue openzfs#15078
@prakashsurya Thank you for the deep analysis. It makes sense to me. Please take a look on #15080. I think it should fix the problem with smallest complications. |
amotin
added a commit
to amotin/zfs
that referenced
this issue
Jul 20, 2023
When we have some LWBs closed and their ZIOs ready to be issued, we can not afford sleeping on config lock if somebody else try to lock it as writer, or it will cause a deadlock. To solve it, move spa_config_enter() from zil_lwb_write_issue() to zil_lwb_write_close() under zl_issuer_lock to enforce lock ordering with other threads. Now if we can't immediately lock config, issue all previously closed LWBs so that they could drop their config locks after completion, and only then allow sleeping on our lock. Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Issue openzfs#15078
amotin
added a commit
to amotin/zfs
that referenced
this issue
Jul 24, 2023
When we have some LWBs closed and their ZIOs ready to be issued, we can not afford sleeping on config lock if somebody else try to lock it as writer, or it will cause a deadlock. To solve it, move spa_config_enter() from zil_lwb_write_issue() to zil_lwb_write_close() under zl_issuer_lock to enforce lock ordering with other threads. Now if we can't immediately lock config, issue all previously closed LWBs so that they could drop their config locks after completion, and only then allow sleeping on our lock. Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Issue openzfs#15078
amotin
added a commit
to amotin/zfs
that referenced
this issue
Jul 25, 2023
When we have some LWBs closed and their ZIOs ready to be issued, we can not afford sleeping on config lock if somebody else try to lock it as writer, or it will cause a deadlock. To solve it, move spa_config_enter() from zil_lwb_write_issue() to zil_lwb_write_close() under zl_issuer_lock to enforce lock ordering with other threads. Now if we can't immediately lock config, issue all previously closed LWBs so that they could drop their config locks after completion, and only then allow sleeping on our lock. Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Reviewed-by: Prakash Surya <prakash.surya@delphix.com> Reviewed-by: George Wilson <george.wilson@delphix.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes openzfs#15078 Closes openzfs#15080
behlendorf
pushed a commit
that referenced
this issue
Jul 25, 2023
When we have some LWBs closed and their ZIOs ready to be issued, we can not afford sleeping on config lock if somebody else try to lock it as writer, or it will cause a deadlock. To solve it, move spa_config_enter() from zil_lwb_write_issue() to zil_lwb_write_close() under zl_issuer_lock to enforce lock ordering with other threads. Now if we can't immediately lock config, issue all previously closed LWBs so that they could drop their config locks after completion, and only then allow sleeping on our lock. Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Reviewed-by: Prakash Surya <prakash.surya@delphix.com> Reviewed-by: George Wilson <george.wilson@delphix.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #15078 Closes #15080
lundman
pushed a commit
to openzfsonwindows/openzfs
that referenced
this issue
Dec 12, 2023
When we have some LWBs closed and their ZIOs ready to be issued, we can not afford sleeping on config lock if somebody else try to lock it as writer, or it will cause a deadlock. To solve it, move spa_config_enter() from zil_lwb_write_issue() to zil_lwb_write_close() under zl_issuer_lock to enforce lock ordering with other threads. Now if we can't immediately lock config, issue all previously closed LWBs so that they could drop their config locks after completion, and only then allow sleeping on our lock. Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Reviewed-by: Prakash Surya <prakash.surya@delphix.com> Reviewed-by: George Wilson <george.wilson@delphix.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes openzfs#15078 Closes openzfs#15080
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
I believe we've uncovered a bug resulting from the changes made in f63811f.
We have a system that's stuck with the following stacks:
i.e. this thread attempting to get the lock as writer..
spa_sync
also trying to get this lock..and then we have ZIL threads also trying to get the lock.. e.g..
Once in this state, the system remains in this state indefinitely.. deadlocked.
What we found, is we believe the config lock is held (as reader) by ZIL IOs that have not yet completed.. there's a lot of ZIL IOs pending..
With trees of ZIOs like this:
It's a hard to see from this output, but we verified these ZIOs to be for ZIL blocks.
So, at this point, the question we tried to solve, is why aren't these ZIOs completing? From what we can tell, that's the cause of the deadlock.. i.e. we have ZIL blocks that took the config lock as reader (i.e. acquired in
zil_lwb_write_issue
), got issued, but never dropped that lock because they never completed (i.e. would have dropped inzil_lwb_flush_vdevs_done
).. All other stuck threads, are stuck because these ZIL IOs haven't dropped the lock..I believe what is happening, is, prior to f63811f, we would have issued all LWBs in the order that they get allocated.. due to us calling
zil_lwb_write_issue
while holding thezl_issuer_lock
.. meaning, that from a ZIO parent/child perspective, the child ZIOs would always get issued prior to their parents getting issued..Meaning, by the time the parent ZIO tries to get the config lock in
zil_lwb_write_issue
, the child ZIO for it would have necessarily taken its reader hold on the config lock for itself.. i.e. the child ZIO would always get a reader lock before the parent gets its own reader lock..I think, due to the changes in f63811f, this isn't necessarily true anymore. Specifically, now we drop the
zl_issuer_lock
before we callzil_lwb_write_issue
.. such that, it's possible for the thread to callschedule
in between dropping thezl_issuer_lock
, and actually issuing the LWBs..While that thread is sleeping.. it's also possible for another thread to acquire the
zl_issuer_lock
and callzil_lwb_write_issue
on it's LWBs.. all while the first thread is still sleeping, and hasn't yet issued those original LWBs..Meaning, the second thread's LWBs will be parents of the first thread's LWBs (from the perspective of the LWB root ZIOs).. and these parent LWBs would get issued (and thus, grab the config lock as reader) prior to the child LWBs being issued..
Further, if those child LWBs are then unable to get the config lock as reader (as is our case), then the child ZIOs will never complete, and they'll prevent the parent ZIOs from completing as well.
Since that's a little convoluted, I want to be clear.. Here's what is happening:
zil_commit_writer
, gets thezl_issuer_lock
, builds up itsilwbs
list, drops thezl_issuer_lock
, then goes to sleep before processing and issuing LWBs from theilwbs
list..zil_commit_writer, gets the
zl_issuer_lock, builds up it's own
ilwbslist, drops
zl_issuer_lock, calls
zil_lwb_write_issueon all of its
ilwbs`..zfs_ioc_pool_reopen
is called), but is blocked due to ZIOs from (2) holding the lock as reader, so it waits..zil_lwb_write_issue
and tries to get the config lock as reader, but is blocked due to thread from (3) trying to get the lock as writer, so it waits..At this point, the system is deadlocked.. the ZIOs from (4) are children of the ZIOs from (2).. the ZIOs from (4) are blocked on the thread from (3).. the thread from (3) is blocked by the ZIOs from (2).. and the ZIOs from (2) are blocked waiting on the ZIOs from (4).. so we're deadlocked..
I've verified this to be true.. for example, we can take this tree of ZIOs from earlier:
In this case,
0xffff8912cb471d10
is a child of0xffff8912cb470e88
.. since0xffff8912cb470e88
is in the ZIO pipeline, we know it must hold the config lock as reader (e.g. it's a parent ZIO from (2) in prior example)..0xffff8912cb471d10
is a child ZIO that would have been created in (1) of the prior example, but hasn't been issued yet.. we can verify it's stuck as I described in (4), by getting thelwb_t
from thatzio_t
, and then mapping that pointer to a thread on the system:Here we can see the
lwb
pointer matches the frame:and we can see this thread is stuck in
zil_lwb_write_issue
trying to get the config lock as reader.. stuck because this other thread is trying to get the config lock as writer:FWIW, the unrelated (to the ZIL) thread attempting to get the config lock as a writer is necessary to trigger the deadlock.. which might not be a common case for most users, but actually is common for Delphix.. which might help explain why we've seen this, but perhaps not others.
Further, I think what this boils down to is.. due to how we obtain the config lock as reader for each LWB that we issue, and the parent/child relationship of LWB ZIOs.. I think we need to issue the LWBs in parent/child order.. i.e. always issue the child ZIOs before the parents, as we did prior to f63811f.. so that we cannot get into a situation where a parent ZIO holds the config lock, and a child ZIO can't get it.. or, perhaps, change how we acquire the config lock when issuing the ZIOs..
@amotin does this make sense to you?
The text was updated successfully, but these errors were encountered: