
zed: a lock-based pickle indeed #11965

Merged 1 commit into openzfs:master on May 7, 2021

Conversation

@nabijaczleweli (Contributor) commented Apr 28, 2021

Motivation and Context

#11963

Description

Lock around fork() and in the SIGCHLD handler. The latter should always lock as soon as a child dies and, if Linux does in fact re-use zombie PIDs, always come before the next fork().

But if this doesn't work then I genuinely don't know if this is possible under UNIX or if it's just always gonna race like this. See commit message.
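
For illustration, a minimal sketch of the SIGCHLD side of that idea, assuming a dedicated reaper thread that waits for the signal with sigwaitinfo() and takes the shared lock the moment a child dies; the lock and function names are made up for the example, not the actual zed identifiers:

```c
#include <pthread.h>
#include <signal.h>

extern pthread_mutex_t fork_lock;       /* also held around fork() by the launcher */

static void *
reaper_thread(void *arg)
{
    sigset_t set;

    (void) sigemptyset(&set);
    (void) sigaddset(&set, SIGCHLD);    /* SIGCHLD is blocked process-wide elsewhere */

    for (;;) {
        if (sigwaitinfo(&set, NULL) != SIGCHLD)
            continue;
        /* Lock as soon as a child dies, i.e. before the next fork(). */
        (void) pthread_mutex_lock(&fork_lock);
        /* ... wait4() the dead child and drop its table entry here ... */
        (void) pthread_mutex_unlock(&fork_lock);
    }

    return (arg);
}
```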

How Has This Been Tested?

It wasn't, really: it runs, but that's about it. See commit message.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@behlendorf added the "Status: Code Review Needed" label (ready for review and testing) Apr 28, 2021
@nabijaczleweli force-pushed the rickle-pick branch 2 times, most recently from 19a69d4 to 96f3b75, May 1, 2021 14:27
@nabijaczleweli (Contributor, Author) commented May 1, 2021

Rebased, replaced with a solution that I tried, understood, and tested. @don-brady mind giving'er a spin?

If you can afford to do it, setting kernel.pid_max to something hilarious like 400 greatly exacerbates this; I hit it every time within minutes, if that.

This can be very easily triggered by adding a sleep(1) before
the wait4() on a PID-starved system: the reaper thread would wait
for a child before its entry appeared, letting old entries accumulate:
  Invoking "all-debug.sh" eid=3021 pid=391
  Finished "(null)" eid=0 pid=391 time=0.002432s exit=0
  Invoking "all-syslog.sh" eid=3021 pid=336
  Finished "(null)" eid=0 pid=336 time=0.002432s exit=0
  Invoking "history_event-zfs-list-cacher.sh" eid=3021 pid=347
  Invoking "all-debug.sh" eid=3022 pid=349
  Finished "history_event-zfs-list-cacher.sh" eid=3021 pid=347
                                              time=0.001669s exit=0
  Finished "(null)" eid=0 pid=349 time=0.002404s exit=0
  Invoking "all-syslog.sh" eid=3022 pid=370
  Finished "(null)" eid=0 pid=370 time=0.002427s exit=0
  Invoking "history_event-zfs-list-cacher.sh" eid=3022 pid=391
  avl_find(tree, new_node, &where) == NULL
  ASSERT at ../../module/avl/avl.c:641:avl_add()
  Thread 1 "zed" received signal SIGABRT, Aborted.

By employing this wider lock, we atomise [wait, remove] and [fork, add]:
slowing down the reaper thread now just causes some zombies
to accumulate until it can get to them

Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes openzfs#11963
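
The commit message's "wider lock" atomises [fork, add] against [wait, remove]. A minimal sketch of the [fork, add] half, with illustrative names (the mutex, tree wrapper, and launcher below are assumptions for the example, not the actual zed identifiers):

```c
#include <pthread.h>
#include <unistd.h>

extern pthread_mutex_t launched_lock;   /* assumed: guards the PID tree */
extern void pid_tree_add(pid_t pid);    /* assumed: wraps avl_add()     */

static pid_t
launch_zedlet(char *const argv[], char *const envp[])
{
    pid_t pid;

    (void) pthread_mutex_lock(&launched_lock);
    pid = fork();
    if (pid == 0) {
        (void) execve(argv[0], argv, envp);
        _exit(127);                     /* exec failed */
    }
    if (pid > 0)
        pid_tree_add(pid);              /* entry exists before any reap can race */
    (void) pthread_mutex_unlock(&launched_lock);

    return (pid);
}
```

Holding the lock across both fork() and the tree insertion is what makes the pair atomic with respect to the reaper; the matching reaper side is sketched further down.
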
@behlendorf (Contributor) left a comment


@nabijaczleweli thanks for running this down.

@don-brady (Contributor) commented:

We tried working around this by launching zed with -j 1 and that was still hitting the assert. I have yet to test with this change, but I wanted to understand: why would having jobs set to 1 still allow asynchronous execution?

@behlendorf added the "Status: Accepted" label (ready to integrate: reviewed, tested) and removed the "Status: Code Review Needed" label May 7, 2021
@nabijaczleweli (Contributor, Author) commented May 7, 2021

-j guarantees that at any given point j_N - ∑(fork() ≠ -1) + ∑(wait4() ∉ {-1, 0}) >= 0, as documented.
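
A sketch of that accounting, assuming a simple counter plus condition variable (names are illustrative, not the actual zed implementation): a slot is taken before each successful fork() and returned after each successful wait4(), so the invariant above holds while the ordering of the tree operations is untouched:

```c
#include <pthread.h>

static pthread_mutex_t limit_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  limit_cv   = PTHREAD_COND_INITIALIZER;
static unsigned int    limit;     /* starts at the -j value (j_N) */

static void
limit_take(void)                  /* called before fork(); blocks while limit == 0 */
{
    (void) pthread_mutex_lock(&limit_lock);
    while (limit == 0)
        (void) pthread_cond_wait(&limit_cv, &limit_lock);
    limit--;
    (void) pthread_mutex_unlock(&limit_lock);
}

static void
limit_return(void)                /* called after wait4() reaps a child */
{
    (void) pthread_mutex_lock(&limit_lock);
    limit++;
    (void) pthread_cond_signal(&limit_cv);
    (void) pthread_mutex_unlock(&limit_lock);
}
```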

The reaping methodology is unaffected, and the sequence in that case remains limit>0 => fork()=A => A:exit_group() => reaper:SIGCHLD => reaper:lock => limit-=1 => lock:sleep => reaper:wait4()=A => reaper:find(A)=NULL => reaper:unlock => limit+=1 => lock:wake => add(A) => unlock. This can then repeat with another child with PID of A, panicking on the add(A) step, and, since the limit ends at the same value it started at, you can see how it plays no role in this.

The updated sequence would look like limit>0 => lock => fork()=A => A:exit_group() => reaper:SIGCHLD => reaper:lock:sleep() => add(A) => unlock => limit-=1 => reaper:lock:wake => reaper:wait4()=A => reaper:find(A)=A* => reaper:unlock => limit+=1.
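
For completeness, a sketch of the reaper side under that same lock (again with illustrative names, not the actual zed identifiers): wait4() and the tree lookup/removal now form one unit, so by the time a PID can be handed out again its old entry is already gone, and find(A) can no longer come back NULL for a child that was just launched:

```c
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <pthread.h>

extern pthread_mutex_t launched_lock;        /* same lock held across fork()+add */
extern void *pid_tree_find(pid_t pid);       /* assumed: wraps avl_find()   */
extern void pid_tree_remove(void *entry);    /* assumed: wraps avl_remove() */

static void
reap_children(void)
{
    struct rusage usage;
    int status;
    pid_t pid;

    for (;;) {
        (void) pthread_mutex_lock(&launched_lock);
        pid = wait4(-1, &status, WNOHANG, &usage);
        if (pid <= 0) {
            (void) pthread_mutex_unlock(&launched_lock);
            break;                           /* nothing reapable right now */
        }
        /* Added under the same lock, so the entry must be present. */
        pid_tree_remove(pid_tree_find(pid));
        (void) pthread_mutex_unlock(&launched_lock);
    }
}
```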

@behlendorf merged commit 3bd6b0e into openzfs:master May 7, 2021
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request May 10, 2021
sempervictus pushed a commit to sempervictus/zfs that referenced this pull request May 31, 2021