Fix a deadlock in syscallbuf unmapping after vfork #3826

KJTsanaktsidis · 2024-09-15T00:59:38Z

This should be 🤞 a fix for #3807

It does two things to address that issue:

Firstly, it moves the code that unmaps the syscallbuf after vfork/exec out of Task::post_exec, and instead defers the unmapping until we next run a task in that address space. This means we should never leak address space just because there were no available threads to perform the unmapping. This doesn't technically fix the issue by itself, but it means we can write some reliable tests around the syscallbuf unmapping stuff, so it seemed worthwhile.

Secondly, we now look at the desched signal state in AutoRemoteSyscalls. If the desched signal is armed, we temporarily disarm it (and mask it too, in case the signal was pending but unprocessed). This is what should fix our issue, I think.
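To make the "temporarily disarm and mask" idea concrete, here is a minimal RAII sketch. It is not the code in this PR: `FakeTask`, its two flags, and `ScopedDeschedDisarm` are hypothetical names invented for illustration, and the real change manipulates the tracee's desched event and signal mask rather than two booleans.

```cpp
#include <cassert>

// Hypothetical stand-in for the per-task desched state (not rr's Task).
struct FakeTask {
  bool desched_armed = false;   // is the desched event currently armed?
  bool desched_masked = false;  // is the desched signal currently blocked?
};

// Disarms and masks the desched signal for the lifetime of the guard,
// then restores the previous state when the guard goes out of scope.
class ScopedDeschedDisarm {
 public:
  explicit ScopedDeschedDisarm(FakeTask& t)
      : task_(t), was_armed_(t.desched_armed), was_masked_(t.desched_masked) {
    if (was_armed_) {
      task_.desched_armed = false;
      // Mask as well, in case the signal was already pending but unprocessed.
      task_.desched_masked = true;
    }
  }

  ~ScopedDeschedDisarm() {
    if (was_armed_) {
      task_.desched_masked = was_masked_;
      task_.desched_armed = true;
    }
  }

  ScopedDeschedDisarm(const ScopedDeschedDisarm&) = delete;
  ScopedDeschedDisarm& operator=(const ScopedDeschedDisarm&) = delete;

 private:
  FakeTask& task_;
  bool was_armed_;
  bool was_masked_;
};
```

The point of the guard shape is that the remote syscall runs with the event disarmed, and the original state comes back on every exit path. A task that was not armed to begin with is left untouched.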

It's hard to prove that this really has fixed the issue, but I left my reproduction script running for 12 hours last night and it didn't hang, and that had always been long enough to trigger the hang before. So I guess either this fixes the issue, or I got (un)lucky.

I'm sure my C++ isn't going to win any beauty contests here but hopefully this demonstrates the outlines of a fix.

If two processes share an address space (via vfork(2) or clone(2) CLONE_VM), and one process calls execve(2), that process's syscallbufs will remain mapped in the shared address space. At the moment, we attempt to unmap them after execve(2) by finding a stopped process in the shared address space and using it to do the unmapping. If there is no such process, though, the buffers are leaked. This patch changes that situation to ensure that such syscallbufs are unmapped in all circumstances. This is done by recording that such unmapping must happen after execve(2), but deferring the unmapping until a suitable process is scheduled and in a state to perform the unmapping.
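The record-now, unmap-later scheme described above can be sketched roughly as follows. This is a simplified model under stated assumptions: `regions_pending_unmap` matches the name visible in the diff, but the `MemoryRange` and `AddressSpace` structs here are toy stand-ins, `note_exec_unmap` and `flush_pending_unmaps` are hypothetical helpers, and the `unmapped` vector merely simulates the remote munmap calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Toy stand-ins; rr's real MemoryRange and AddressSpace are much richer.
struct MemoryRange {
  std::uintptr_t start;
  std::size_t size;
};

struct AddressSpace {
  // Work recorded at exec time but not yet performed.
  std::vector<MemoryRange> regions_pending_unmap;
  // Simulated log of the munmap calls made in this address space.
  std::vector<MemoryRange> unmapped;
};

// Hypothetical: called at exec time. It only records the region, so it
// cannot fail for lack of a stopped task to do the unmapping.
void note_exec_unmap(AddressSpace& as, MemoryRange r) {
  as.regions_pending_unmap.push_back(r);
}

// Hypothetical: called when a task in this address space is about to run.
// Returns how many regions were unmapped.
std::size_t flush_pending_unmaps(AddressSpace& as) {
  std::vector<MemoryRange> pending;
  std::swap(pending, as.regions_pending_unmap);  // take the list, leave it empty
  for (const MemoryRange& r : pending) {
    as.unmapped.push_back(r);  // real code would issue a remote munmap here
  }
  return pending.size();
}
```

Taking the list before iterating means the address space never holds a half-processed queue, and a second flush with nothing pending is a no-op.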

If we're running on a thread with a desched event enabled, we can enter an infinite loop where stepping through the syscall infinitely retriggers the desched event, and thus the syscall can never complete.

rocallahan

This is really excellent work in a very difficult area. Bravo!

rocallahan · 2024-09-17T21:17:16Z

src/Task.cc

+ << "previously exec'd processes";
+ AutoRemoteSyscalls remote(this);
+ std::vector<MemoryRange> regions_pending_unmap;
+ std::swap(regions_pending_unmap, as->regions_pending_unmap);


This could be a move, but it's fine.
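For reference, one way the move-based alternative could look is with `std::exchange`, which takes the old contents and leaves the source explicitly empty. This is only an illustrative sketch: `take_pending` is a hypothetical helper and the `int` elements stand in for `MemoryRange`.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: returns the current contents of `source` and
// leaves `source` as an empty vector.
std::vector<int> take_pending(std::vector<int>& source) {
  return std::exchange(source, {});
}
```

The observable behavior is the same as swapping with a freshly constructed empty vector, which is why the swap in the patch is fine as written.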

rocallahan · 2024-09-17T21:20:39Z

Honestly, really extraordinary work.

KJTsanaktsidis · 2024-09-17T23:38:17Z

Thank you! I really appreciated your pointing me in the right direction. This was definitely a fun exercise and I learned quite a bit about how rr works. And brushed up on a bit of my C++ too.

KJTsanaktsidis added 2 commits September 14, 2024 19:42

Disarm desched event in AutoRemoteSyscalls

f152f6e

If we're running on a thread with a desched event enabled, we can enter an infinite loop where stepping through the syscall infinitely retriggers the desched event, and thus the syscall can never complete.

KJTsanaktsidis mentioned this pull request Sep 15, 2024

Deadlock in infallible_munmap_syscall_if_alive #3807

Closed

rocallahan approved these changes Sep 17, 2024


rocallahan merged commit ceeff12 into rr-debugger:master Sep 17, 2024
5 checks passed
