ZTS test slog_015_neg.ksh can trigger zfs deadlock #14775
Comments
@ryao Please take a look at this to determine whether this is indeed the result of your changes from PR #14514.
I've seen a similar hang in the following two tests too: cli_root/zfs_copies/zfs_copies_003_pos.ksh
I've opened #14790, which reverts the suspected change, to try to verify that it's responsible. I've observed the deadlock described above multiple times in the CI, although I never ran it down to the particular test(s) or commit. Assuming this is the cause, we can revert the change for now and then follow up with an alternate fix for the original issue.
That appears to be the case.
This reverts commit 4c856fb. To quote a pending upstream PR: This reverts commit 4c856fb to resolve a newly introduced deadlock which in practice is more disruptive than the issue this commit intended to address. Causes deadlocks described in openzfs/zfs#14775 Sponsored by: Rubicon Communications, LLC ("Netgate")
This reverts commit 4c856fb to resolve a newly introduced deadlock which in practice is more disruptive than the issue this commit intended to address. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #14775 Closes #14790
Just want to comment that this deadlock is still there.
PR #15103 was merged today to hopefully address this.
System information
Describe the problem you're observing
The ZTS test slog_015_neg.ksh can trigger a deadlock in the zfs module. This manifests as a test hang while doing a `zpool offline` request. This appears to be a result of changes introduced in PR #14514, which introduced the zl_suspend_lock.
Describe how to reproduce the problem
This reproduces very reliably on our config when running slog_015_neg.ksh
Include any warning/errors/backtraces from the system logs
Here is an example of the three deadlocking threads:
Thread 1 is a writer to the pool. It is holding the zl_suspend_lock as READER (from `zil_commit()`) and waiting for the txg to sync.
Thread 2 is the zpool command trying to offline a log device. It is holding the dp_config_lock as READER and trying to get the zl_suspend_lock as WRITER.
Thread 3 is the pool sync thread. It is waiting for the dp_config_lock as WRITER.
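The circular dependency between these three threads can be sketched as a wait-for graph. This is only an illustrative model, not ZFS code: the thread labels (`writer`, `zpool_offline`, `sync_thread`), the `txg_sync` pseudo-resource, and the `find_cycle` helper are all made up for this sketch. It treats "waiting for the txg to sync" as waiting on something only the sync thread can provide.

```python
# Illustrative model only; all names below are hypothetical labels,
# not identifiers from the ZFS source.

# Which thread currently "owns" each resource. The txg_sync entry is a
# pseudo-resource: only the sync thread can make the txg finish syncing.
holders = {
    "zl_suspend_lock": "writer",        # held as READER from zil_commit()
    "dp_config_lock": "zpool_offline",  # held as READER by the zpool command
    "txg_sync": "sync_thread",
}

# What each thread is blocked on.
waits = {
    "writer": "txg_sync",                # waiting for the txg to sync
    "zpool_offline": "zl_suspend_lock",  # wants it as WRITER
    "sync_thread": "dp_config_lock",     # wants it as WRITER
}

# Thread -> thread it is waiting on.
wait_for = {thread: holders[res] for thread, res in waits.items()}


def find_cycle(graph):
    """Return the threads forming a wait-for cycle, or None if there is none.

    Assumes each thread waits on at most one other thread.
    """
    for start in graph:
        path = [start]
        seen = {start}
        node = graph.get(start)
        while node is not None:
            if node == start:
                return path
            if node in seen:
                break
            path.append(node)
            seen.add(node)
            node = graph.get(node)
    return None


cycle = find_cycle(wait_for)
print("deadlock cycle:", cycle)
```

In this model each thread waits on the next one in the cycle, so none can make progress. Removing any single edge (for example, if the writer did not hold the zl_suspend_lock while waiting for the txg to sync) breaks the cycle.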