Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proper async reaping (or alarm(2)-based delooping at least) for ZED #11807

Closed

Conversation

nabijaczleweli
Copy link
Contributor

Motivation and Context

Confer #11769, also zed(8) and fsck.zfs(8) are the only manpages with BUGS sexions.

Description

The first patch replaces the weird waitpid(2)/nanosleep(2) loop with alarm(2), allowing zedlets to cancel or ignore the signal if they can usefully live longer than 10 seconds.

The third patch launches zedlets without blocking and starts a separate reaping thread to wait on them asynchronously.

The fourth patch effectively undoes the first one – we don't need a time-out anymore since long-running zedlets aren't going to hold up execution!

These can be squashed in 4 sensible ways and I'm not in a state to figure out which one is best, so I'll leave that decision to a maintainer.

The second patch names all the zed threads – upstream OpenZFS only supports FreeBSD and Linux, which both have pthread_setname_np(3); NetBSD has an additional void * for formatting; OpenBSD has pthread_set_name_np(3) with the same signature as FreeBSD; the only problematic(ish) platform is Darwin, whose pthread_setname_np(3) takes only the name and sets the name for the current thread. These are all trivial patches for distributions to make.

How Has This Been Tested?

I ran it a bunch in all the failure modes I could think of.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@nabijaczleweli nabijaczleweli changed the title Proper async reaping (or alarm(2)-based delooping at least) for zed Proper async reaping (or alarm(2)-based delooping at least) for ZED Mar 27, 2021
@nabijaczleweli nabijaczleweli force-pushed the ring-the-alarum-bells-v1 branch from 577b927 to 4525fb0 Compare March 27, 2021 14:16
@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Mar 28, 2021
@behlendorf
Copy link
Contributor

The async reaping itself looks great. However, I'm concerned that after this change there wouldn't be any form of ratelimitting if a large number of events are generated. I think we need to add a limit to the maximum number of threads created at once.

@nabijaczleweli
Copy link
Contributor Author

nabijaczleweli commented Mar 28, 2021

There's exactly one reaper thread (but it gets started lazily on first entry to zed_exec_process(), which is why that may've been confusing, I guess).

@behlendorf
Copy link
Contributor

Sure, that all makes good sense. But if I'm not mistaken all of the zedlets are now run asynchronously with this change. So if say the kernel posts 1000 events then the ZED will fork and exec 1000 threads to handle them, whereas before they'd be done serially. Clearly doing them serially was never ideal, but neither is potentially kicking off an unbounded number of threads. The kernel has its own rate limiting on events to prevent this to some degree. However, there could be a backlog of events which haven't been processed yet, so it seems like a reasonable precaution to limit the number of concurrent zedlets to say 16 or 32.

@nabijaczleweli
Copy link
Contributor Author

Oh! right, you meant processes, not threads. That makes sense, yeah. I'll add a child limit.

@nabijaczleweli nabijaczleweli force-pushed the ring-the-alarum-bells-v1 branch from 4525fb0 to 3abccd6 Compare March 29, 2021 13:24
@nabijaczleweli
Copy link
Contributor Author

Updated: 16 max jobs by default, tunable with -j argument

@nabijaczleweli nabijaczleweli force-pushed the ring-the-alarum-bells-v1 branch 2 times, most recently from 3b1ba79 to 108f9c7 Compare March 29, 2021 13:48
@nabijaczleweli
Copy link
Contributor Author

Uhh, the FreeBSD build ran out of disk space?

@behlendorf
Copy link
Contributor

Uhh, the FreeBSD build ran out of disk space?

Ugh, yeah. Feel free to ignore that, it's actually for FreeBSD 14 which I'll disable until we sort that out.

However, this events/zed_rc_filter ZTS failure you'll need to sort out. There are several ZED tests under events/ and they expect the zedlets to run in a specific order. That's really not the case any more with your change and the default value. The easy fix here would be to add -j1 to the zed_start ksh function in `tests/zfs-tests/include/libtest.shlib".

http://build.zfsonlinux.org/builders/Fedora%2033%20x86_64%20%28TEST%29/builds/1619/steps/shell_4/logs/summary

@nabijaczleweli nabijaczleweli force-pushed the ring-the-alarum-bells-v1 branch 3 times, most recently from 2499c78 to f64422b Compare March 30, 2021 14:56
@nabijaczleweli
Copy link
Contributor Author

Hm; I've added -j 1, as you suggested, then an explicit zed_stop before it in the test itself since some test runs failed on that. The first round all worked (well, of the ones didn't ENOSPC), but now Debian 10 appears to be failing with

16:41:31.32 NOTE: Stopping ZED
16:41:31.33 SUCCESS: zed_stop
16:41:31.34 NOTE: ZED already running
16:41:31.34 SUCCESS: zed_start

and then timing out, which makes no sense to me? How can it be killed (and zed_stop does wait for it to actually exit) and then be running again, not a second later (since pgrep -x zed returned 0), and hence not started? The Fedora 33 build worked. Dunno, don't get it. A short sleep after zed_stop and before zed_start maybe? Or is there something starting the daemon with different arguments there?

@behlendorf
Copy link
Contributor

behlendorf commented Mar 30, 2021

Or is there something starting the daemon with different arguments there?

That's may be possible. For the the CI we build the packages then, install them, and then kick off the testing. But I believe on Debian in particular daemons are started by default on installation. This different than on some others distributions, so that may be what's happening here. I can take a look if that's happening in our CI environment.

edit: It looks like the same thing may be happening on Ubuntu.

@behlendorf
Copy link
Contributor

Can you try including this change in the PR to update the workflow to stop the ZED daemon which may be started after the package installation. This won't resolve the Debian issue but I've opened a PR against the buildbot for that we can merge if this does resolve the issue for the Ubuntu GitHub actions builders.

diff --git a/.github/workflows/zfs-tests-functional.yml b/.github/workflows/zfs>
index 631f174b7..79aa8a25b 100644
--- a/.github/workflows/zfs-tests-functional.yml
+++ b/.github/workflows/zfs-tests-functional.yml
@@ -44,6 +44,8 @@ jobs:
         sudo sed -i.bak 's/updates/extra updates/' /etc/depmod.d/ubuntu.conf
         sudo depmod
         sudo modprobe zfs
+        # Stop the ZED daemon so it won't conflict with the ZTS event tests
+        sudo systemctl stop zfs-zed
     - name: Tests
       run: |
         /usr/share/zfs/zfs-tests.sh -v -s 3G
diff --git a/.github/workflows/zfs-tests-sanity.yml b/.github/workflows/zfs-tes>
index e03399757..09f317142 100644
--- a/.github/workflows/zfs-tests-sanity.yml
+++ b/.github/workflows/zfs-tests-sanity.yml
@@ -40,6 +40,8 @@ jobs:
         sudo sed -i.bak 's/updates/extra updates/' /etc/depmod.d/ubuntu.conf
         sudo depmod
         sudo modprobe zfs
+        # Stop the ZED daemon so it won't conflict with the ZTS event tests
+        sudo systemctl stop zfs-zed
     - name: Tests
       run: |
         /usr/share/zfs/zfs-tests.sh -v -s 3G -r sanity

@nabijaczleweli
Copy link
Contributor Author

nabijaczleweli commented Mar 31, 2021

I've added your patch and it still fails (not sure why, I personally thought it would, but we may never know). I've added a variation to actually just murder all ZEDs instead of flakily detecting the pidfile and log the PIDs it's killing.

@behlendorf
Copy link
Contributor

Let's see how it goes. That's probably fine for the tests.

@nabijaczleweli nabijaczleweli force-pushed the ring-the-alarum-bells-v1 branch from 0212c14 to c7e9c97 Compare March 31, 2021 19:40
@nabijaczleweli
Copy link
Contributor Author

FreeBSD 12 failed with filesystem full somewhere, CentOS Stream 8 dropped off, and GitHub Actions Ubuntu 20 ENOSPCed.

But on the buildds that didn't fail for unrelated reasons, it seems to have worked! As a reference, consider this log and jump to "Test (Linux): /usr/share/zfs/zfs-tests/tests/functional/events/zed_rc_filter (run as root) [00:14] [PASS]".

@behlendorf
Copy link
Contributor

@nabijaczleweli OK. Tests are passing now and the change itself makes sense to me. I resubmitted the failed builds except for the ENOSPC Ubuntu 20 which has a separate fix to the GitHub workflow was merged for today.

That leaves us with how best to squash these commits. I'd suggesting something like this, and just squash away the alarm changes.

zed: set names for all threads
zed: use separate reaper thread and collect ZEDLETs asynchronously
zed: remove unused zed_file_close_on_exec()
zed: allow limiting concurrent jobs

Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
The FIXME comment was there since the initial implementation in 2014,
there are no users

Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
200ms time-out is relatively long, but if we already hit the cap,
then we'll likely be able to spawn multiple new jobs when we wake up

Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
@nabijaczleweli nabijaczleweli force-pushed the ring-the-alarum-bells-v1 branch from c7e9c97 to 932077b Compare April 2, 2021 00:09
@nabijaczleweli
Copy link
Contributor Author

Agree, that does seem best. Squashed onto current master to hopefully catch #11826 as well.

Copy link
Contributor

@behlendorf behlendorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. I resubmitted the failed builds (which failed for unrelated reasons) and when they come back clean this should be ready to merge. Thanks for making this improvement, it feels like its been on the TODO list forever.

@nabijaczleweli
Copy link
Contributor Author

nabijaczleweli commented Apr 2, 2021

Since the initial implementation in January of 2014, it seems? My pleasure.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Apr 2, 2021
@behlendorf behlendorf closed this in 3ef80ee Apr 2, 2021
behlendorf pushed a commit that referenced this pull request Apr 2, 2021
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11807
behlendorf pushed a commit that referenced this pull request Apr 2, 2021
The FIXME comment was there since the initial implementation in 2014,
there are no users

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11807
behlendorf pushed a commit that referenced this pull request Apr 2, 2021
200ms time-out is relatively long, but if we already hit the cap,
then we'll likely be able to spawn multiple new jobs when we wake up

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11807
adamdmoss pushed a commit to adamdmoss/zfs that referenced this pull request Apr 10, 2021
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes openzfs#11807
adamdmoss pushed a commit to adamdmoss/zfs that referenced this pull request Apr 10, 2021
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes openzfs#11807
adamdmoss pushed a commit to adamdmoss/zfs that referenced this pull request Apr 10, 2021
The FIXME comment was there since the initial implementation in 2014,
there are no users

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes openzfs#11807
adamdmoss pushed a commit to adamdmoss/zfs that referenced this pull request Apr 10, 2021
200ms time-out is relatively long, but if we already hit the cap,
then we'll likely be able to spawn multiple new jobs when we wake up

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes openzfs#11807
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Apr 14, 2021
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes openzfs#11807
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Apr 14, 2021
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes openzfs#11807
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Apr 14, 2021
The FIXME comment was there since the initial implementation in 2014,
there are no users

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes openzfs#11807
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Apr 14, 2021
200ms time-out is relatively long, but if we already hit the cap,
then we'll likely be able to spawn multiple new jobs when we wake up

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes openzfs#11807
sempervictus pushed a commit to sempervictus/zfs that referenced this pull request May 31, 2021
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes openzfs#11807
sempervictus pushed a commit to sempervictus/zfs that referenced this pull request May 31, 2021
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes openzfs#11807
sempervictus pushed a commit to sempervictus/zfs that referenced this pull request May 31, 2021
The FIXME comment was there since the initial implementation in 2014,
there are no users

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes openzfs#11807
sempervictus pushed a commit to sempervictus/zfs that referenced this pull request May 31, 2021
200ms time-out is relatively long, but if we already hit the cap,
then we'll likely be able to spawn multiple new jobs when we wake up

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes openzfs#11807
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants