-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ratelimit deadman zevents as with delay zevents #11786
Conversation
Good find. It just looks like the test case still needs to be refined a little. |
f2741f8
to
42e3c5f
Compare
|
42e3c5f
to
eeced36
Compare
Starting to wonder how much work |
eeced36
to
9007277
Compare
Increasing the timeout seemed to help the first time, but one or two still failed. This really shouldn't take that long, but we seem to get unlucky on AWS from time to time. I hope 60 seconds is plenty of time in practice. |
Even with the 60s limit we seem to have hit it with the Github Actions CI builders. That's really quite a long time, and taking a quick look at the console logs it does look like the deadman ran. So it seems there should have been events. But it's suspicious that both the Github Actions CI builders failed. |
log_fail "No dropped events after $TIMEOUT seconds" | ||
fi | ||
sleep 1 | ||
done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't look right to me. Why are we expecting dropped events here? I'd expect to only see drops reported when we overflow the zfs_zevent_len_max
queue depth and the ZED doesn't consume the event before it's discard. Events which are internally ratelimited on the kernel side and never posted aren't considered to be drops.
What we've been meaning to do for a while is increase the default zfs_zevent_len_max
to make drops far more uncommon. The current value is really very low, and scaled of the number of cpus which doesn't make much sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I definitely misunderstood that. I'll look for another way to test this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact zfs_is_ratelimiting_event()
does increment ratelimit_dropped
via fm_erpt_dropped_increment()
when the rate limiting kicks in, so I should expect to see dropped events. ZED is not running during these tests, either. I wonder if it's possible the zio is actually completing in under 1ms, so no deadman events are posted? That feels too good to be true.
Looks like something slipped in to break the build on stable/12 |
log_must zinject -a | ||
log_must dd if=$mntpnt/file of=$TEST_BASE_DIR/devnull oflag=sync | ||
|
||
sleep 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the sleep here needed with such a short deadman timeout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what's going on here. I put the sleep there under the understanding that events might not have finished making their way into the queue by the time we go to look for them. Ten seconds should have been overkill, but apparently it's not even enough in some cases. There are some test bots failing to see any deadman events somehow.
Unfortunately, I can't reproduce these failures locally. The tests are working as intended on FreeBSD and Linux in my dev environments. I'll continue auditing the zevent code. Perhaps there are some more hidden issues like the ratelimit_dropped one I came across already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest updating the test to use the file_wait_event
function from events_common.ksh
. That checks the ZED log for the message and you can set a maximum timeout... although I guess that wouldn't be portable for FreeBSD. But you could write something similar.
One other thing you might try is to replace the zinject -a
with a zpool export; zpool import
. While zinject -a
should drop the cache, it may not drop everything if for example the buffers are still being written back. In which case they'll be cache hits for your dd read and you won't trigger the deadman probably.
|
log_must file_write -b 1048576 -c 8 -o create -d 0 -f $mntpnt/file | ||
log_must zpool export $TESTPOOL | ||
log_must zpool import $TESTPOOL | ||
log_must dd if=$mntpnt/file of=$TEST_BASE_DIR/devnull oflag=sync |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like in the most recent results we didn't see any delays. Perhaps we should make sure the deadman will trigger by additionally injecting some IO delays in to the pipeline. Check out the apply_zinject_delays
function in perf.shlib
for how to use zinject
for this.
Just as delay zevents can flood the zevent pipe when a vdev becomes unresponsive, so do the deadman zevents. Ratelimit deadman zevents according to the same tunable as for delay zevents. Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
|
🎉 |
Alright! Let me just resubmit those builds for good measure then this should be good to go. |
Just as delay zevents can flood the zevent pipe when a vdev becomes unresponsive, so do the deadman zevents. Ratelimit deadman zevents according to the same tunable as for delay zevents. Enable deadman tests on FreeBSD and add a test for deadman event ratelimiting. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Don Brady <don.brady@delphix.com> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes openzfs#11786
Just as delay zevents can flood the zevent pipe when a vdev becomes unresponsive, so do the deadman zevents. Ratelimit deadman zevents according to the same tunable as for delay zevents. Enable deadman tests on FreeBSD and add a test for deadman event ratelimiting. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Don Brady <don.brady@delphix.com> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes openzfs#11786
Just as delay zevents can flood the zevent pipe when a vdev becomes unresponsive, so do the deadman zevents. Ratelimit deadman zevents according to the same tunable as for delay zevents. Enable deadman tests on FreeBSD and add a test for deadman event ratelimiting. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Don Brady <don.brady@delphix.com> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes openzfs#11786
Just as delay zevents can flood the zevent pipe when a vdev becomes unresponsive, so do the deadman zevents. Ratelimit deadman zevents according to the same tunable as for delay zevents. Enable deadman tests on FreeBSD and add a test for deadman event ratelimiting. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Don Brady <don.brady@delphix.com> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes openzfs#11786
Just as delay zevents can flood the zevent pipe when a vdev becomes unresponsive, so do the deadman zevents. Ratelimit deadman zevents according to the same tunable as for delay zevents. Enable deadman tests on FreeBSD and add a test for deadman event ratelimiting. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Don Brady <don.brady@delphix.com> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes openzfs#11786
Just as delay zevents can flood the zevent pipe when a vdev becomes unresponsive, so do the deadman zevents. Ratelimit deadman zevents according to the same tunable as for delay zevents. Enable deadman tests on FreeBSD and add a test for deadman event ratelimiting. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Don Brady <don.brady@delphix.com> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes openzfs#11786
Just as delay zevents can flood the zevent pipe when a vdev becomes unresponsive, so do the deadman zevents. Ratelimit deadman zevents according to the same tunable as for delay zevents. Enable deadman tests on FreeBSD and add a test for deadman event ratelimiting. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Don Brady <don.brady@delphix.com> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes openzfs#11786
Just as delay zevents can flood the zevent pipe when a vdev becomes unresponsive, so do the deadman zevents. Ratelimit deadman zevents according to the same tunable as for delay zevents. Enable deadman tests on FreeBSD and add a test for deadman event ratelimiting. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Don Brady <don.brady@delphix.com> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes openzfs#11786
Just as delay zevents can flood the zevent pipe when a vdev becomes unresponsive, so do the deadman zevents. Ratelimit deadman zevents according to the same tunable as for delay zevents. Enable deadman tests on FreeBSD and add a test for deadman event ratelimiting. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Don Brady <don.brady@delphix.com> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes openzfs#11786
Just as delay zevents can flood the zevent pipe when a vdev becomes unresponsive, so do the deadman zevents. Ratelimit deadman zevents according to the same tunable as for delay zevents. Enable deadman tests on FreeBSD and add a test for deadman event ratelimiting. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Don Brady <don.brady@delphix.com> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes openzfs#11786
Just as delay zevents can flood the zevent pipe when a vdev becomes unresponsive, so do the deadman zevents. Ratelimit deadman zevents according to the same tunable as for delay zevents. Enable deadman tests on FreeBSD and add a test for deadman event ratelimiting. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Don Brady <don.brady@delphix.com> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes openzfs#11786
Just as delay zevents can flood the zevent pipe when a vdev becomes unresponsive, so do the deadman zevents. Ratelimit deadman zevents according to the same tunable as for delay zevents. Enable deadman tests on FreeBSD and add a test for deadman event ratelimiting. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Don Brady <don.brady@delphix.com> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes openzfs#11786
Just as delay zevents can flood the zevent pipe when a vdev becomes unresponsive, so do the deadman zevents. Ratelimit deadman zevents according to the same tunable as for delay zevents. Enable deadman tests on FreeBSD and add a test for deadman event ratelimiting. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Don Brady <don.brady@delphix.com> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11786
Just as delay zevents can flood the zevent pipe when a vdev becomes unresponsive, so do the deadman zevents. Ratelimit deadman zevents according to the same tunable as for delay zevents. Enable deadman tests on FreeBSD and add a test for deadman event ratelimiting. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Don Brady <don.brady@delphix.com> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes openzfs#11786
Just as delay zevents can flood the zevent pipe when a vdev becomes unresponsive, so do the deadman zevents. Ratelimit deadman zevents according to the same tunable as for delay zevents. Enable deadman tests on FreeBSD and add a test for deadman event ratelimiting. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Don Brady <don.brady@delphix.com> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #11786
Motivation and Context
Just as delay zevents can flood the zevent pipe when a vdev becomes
unresponsive, so do the deadman zevents.
Description
Ratelimit deadman zevents according to the same tunable as for delay
zevents.
Make existing deadman tests run on FreeBSD.
Add a test for the deadman event ratelimit.
How Has This Been Tested?
Ran ZTS on FreeBSD and Linux.
Types of changes
Checklist:
Signed-off-by
.