-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proper async reaping (or alarm(2)-based delooping at least) for ZED #11807
Proper async reaping (or alarm(2)-based delooping at least) for ZED #11807
Conversation
577b927
to
4525fb0
Compare
The async reaping itself looks great. However, I'm concerned that after this change there wouldn't be any form of ratelimitting if a large number of events are generated. I think we need to add a limit to the maximum number of threads created at once. |
There's exactly one reaper thread (but it gets started lazily on first entry to |
Sure, that all makes good sense. But if I'm not mistaken all of the zedlets are now run asynchronously with this change. So if say the kernel posts 1000 events then the ZED will fork and exec 1000 threads to handle them, whereas before they'd be done serially. Clearly doing them serially was never ideal, but neither is potentially kicking off an unbounded number of threads. The kernel has its own rate limiting on events to prevent this to some degree. However, there could be a backlog of events which haven't been processed yet, so it seems like a reasonable precaution to limit the number of concurrent zedlets to say 16 or 32. |
Oh! right, you meant processes, not threads. That makes sense, yeah. I'll add a child limit. |
4525fb0
to
3abccd6
Compare
Updated: 16 max jobs by default, tunable with |
3b1ba79
to
108f9c7
Compare
Uhh, the FreeBSD build ran out of disk space? |
Ugh, yeah. Feel free to ignore that, it's actually for FreeBSD 14 which I'll disable until we sort that out. However, this |
2499c78
to
f64422b
Compare
Hm; I've added
and then timing out, which makes no sense to me? How can it be killed (and |
That's may be possible. For the the CI we build the packages then, install them, and then kick off the testing. But I believe on Debian in particular daemons are started by default on installation. This different than on some others distributions, so that may be what's happening here. I can take a look if that's happening in our CI environment. edit: It looks like the same thing may be happening on Ubuntu. |
Can you try including this change in the PR to update the workflow to stop the ZED daemon which may be started after the package installation. This won't resolve the Debian issue but I've opened a PR against the buildbot for that we can merge if this does resolve the issue for the Ubuntu GitHub actions builders. diff --git a/.github/workflows/zfs-tests-functional.yml b/.github/workflows/zfs>
index 631f174b7..79aa8a25b 100644
--- a/.github/workflows/zfs-tests-functional.yml
+++ b/.github/workflows/zfs-tests-functional.yml
@@ -44,6 +44,8 @@ jobs:
sudo sed -i.bak 's/updates/extra updates/' /etc/depmod.d/ubuntu.conf
sudo depmod
sudo modprobe zfs
+ # Stop the ZED daemon so it won't conflict with the ZTS event tests
+ sudo systemctl stop zfs-zed
- name: Tests
run: |
/usr/share/zfs/zfs-tests.sh -v -s 3G
diff --git a/.github/workflows/zfs-tests-sanity.yml b/.github/workflows/zfs-tes>
index e03399757..09f317142 100644
--- a/.github/workflows/zfs-tests-sanity.yml
+++ b/.github/workflows/zfs-tests-sanity.yml
@@ -40,6 +40,8 @@ jobs:
sudo sed -i.bak 's/updates/extra updates/' /etc/depmod.d/ubuntu.conf
sudo depmod
sudo modprobe zfs
+ # Stop the ZED daemon so it won't conflict with the ZTS event tests
+ sudo systemctl stop zfs-zed
- name: Tests
run: |
/usr/share/zfs/zfs-tests.sh -v -s 3G -r sanity |
I've added your patch and it still fails (not sure why, I personally thought it would, but we may never know). I've added a variation to actually just murder all ZEDs instead of flakily detecting the pidfile and log the PIDs it's killing. |
Let's see how it goes. That's probably fine for the tests. |
0212c14
to
c7e9c97
Compare
FreeBSD 12 failed with filesystem full somewhere, CentOS Stream 8 dropped off, and GitHub Actions Ubuntu 20 ENOSPCed. But on the buildds that didn't fail for unrelated reasons, it seems to have worked! As a reference, consider this log and jump to "Test (Linux): /usr/share/zfs/zfs-tests/tests/functional/events/zed_rc_filter (run as root) [00:14] [PASS]". |
@nabijaczleweli OK. Tests are passing now and the change itself makes sense to me. I resubmitted the failed builds except for the ENOSPC Ubuntu 20 which has a separate fix to the GitHub workflow was merged for today. That leaves us with how best to squash these commits. I'd suggesting something like this, and just squash away the alarm changes. zed: set names for all threads |
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
The FIXME comment was there since the initial implementation in 2014, there are no users Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
200ms time-out is relatively long, but if we already hit the cap, then we'll likely be able to spawn multiple new jobs when we wake up Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
c7e9c97
to
932077b
Compare
Agree, that does seem best. Squashed onto current |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. I resubmitted the failed builds (which failed for unrelated reasons) and when they come back clean this should be ready to merge. Thanks for making this improvement, it feels like its been on the TODO list forever.
Since the initial implementation in January of 2014, it seems? My pleasure. |
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes #11807
The FIXME comment was there since the initial implementation in 2014, there are no users Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes #11807
200ms time-out is relatively long, but if we already hit the cap, then we'll likely be able to spawn multiple new jobs when we wake up Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes #11807
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes openzfs#11807
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes openzfs#11807
The FIXME comment was there since the initial implementation in 2014, there are no users Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes openzfs#11807
200ms time-out is relatively long, but if we already hit the cap, then we'll likely be able to spawn multiple new jobs when we wake up Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes openzfs#11807
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes openzfs#11807
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes openzfs#11807
The FIXME comment was there since the initial implementation in 2014, there are no users Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes openzfs#11807
200ms time-out is relatively long, but if we already hit the cap, then we'll likely be able to spawn multiple new jobs when we wake up Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes openzfs#11807
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes openzfs#11807
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes openzfs#11807
The FIXME comment was there since the initial implementation in 2014, there are no users Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes openzfs#11807
200ms time-out is relatively long, but if we already hit the cap, then we'll likely be able to spawn multiple new jobs when we wake up Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes openzfs#11807
Motivation and Context
Confer #11769, also zed(8) and fsck.zfs(8) are the only manpages with BUGS sexions.
Description
The first patch replaces the weird waitpid(2)/nanosleep(2) loop with alarm(2), allowing zedlets to cancel or ignore the signal if they can usefully live longer than 10 seconds.
The third patch launches zedlets without blocking and starts a separate reaping thread to wait on them asynchronously.
The fourth patch effectively undoes the first one – we don't need a time-out anymore since long-running zedlets aren't going to hold up execution!
These can be squashed in 4 sensible ways and I'm not in a state to figure out which one is best, so I'll leave that decision to a maintainer.
The second patch names all the zed threads – upstream OpenZFS only supports FreeBSD and Linux, which both have pthread_setname_np(3); NetBSD has an additional
void *
for formatting; OpenBSD has pthread_set_name_np(3) with the same signature as FreeBSD; the only problematic(ish) platform is Darwin, whose pthread_setname_np(3) takes only the name and sets the name for the current thread. These are all trivial patches for distributions to make.How Has This Been Tested?
I ran it a bunch in all the failure modes I could think of.
Types of changes
Checklist:
Signed-off-by
.