-
Notifications
You must be signed in to change notification settings - Fork 566
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
i#4271: Fix signal-syscall race handling on AArch64. #4453
Conversation
Fixes logic to detect the two jmps preceding a syscall. These jmps are used to bypass the syscall, which bounds the time before exiting code cache and hence delivering the signal. Also, enables the linux.signal_racesys test on AArch64. This required adding the missing implementation to check for pending signals before entering fcache. Issue: #2043,#4271 Fixes: #4271
char is unsigned by default on ARM based machines; https://www.arm.linux.org.uk/docs/faqs/signedchar.php. This leads to wrong usage of the signals_pending variable, which is negative when a signal is being handled.
run arm tests |
I was able to reproduce the original assert failure from #4271 on
This comes up maybe ~half of the time on Based on the comment at the assert, this is probably because of too small app stack. Line 6354 in 25ca6af
@derekbruening any feedback on the current set of changes in this PR, or how they could related to this assert failure? |
Not sure I understand that comment...if the stack is too small to fit the frame, the frame will overflow the stack and either hit unreadable memory like a guard page and we would then send SIGSEGV which would likely kill the process, or if there really is no guard or adjacent unreadable you'd just clobber sthg. |
Sorry, missed this in the thread of other notifications. Looking now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we share the append_fcache_enter_prologue code?
Trying to understand the assert: so it's making sure the passed-to-app sc->pc != fcache_return? Confused. |
…es." The earlier failure I saw was probably an unrelated one: i#4456. This reverts commit 4bfe2b2.
The existing logic does not work when skip_pc is pointing to the exit cti_pc. This may cause issues during re-linking.
It seems so. Based on the comment there, it signifies that the inner frame was overwritten during handling of a nested signal? I added some logs to
The second-to-last
Trying to find out what that signifies. Also, there's an NYI comment where dynamorio/core/unix/os_public.h Line 137 in d7321a6
|
So a signal came in while executing dynamorio_sigreturn? What are the signal #'s of the prior and new one; new one is asynch (right)?) so why wasn't it delayed; wouldn't the interrupted context still be dynamorio_sigreturn and not fcache_return; etc.
Looks like an obsolete comment. |
Following up on this failure in It seems that the new A hang in To verify this, I added a Let me figure out a good fix to push. I'll also add an Line 506 in 32236dc
|
I added support for |
As requested offline, I ran 3 tests 20x each. The 2 signalnnnn ones passed all 20 times, but ptsig failed on the 10th run:
Let me run HEAD and see whether ptsig is flaky there too. |
Here is the log for that failure:
|
OK so I ran ptsig 100x and had 2/100 fail without this PR and 1/100 with so I think we're good. |
Thanks a lot for checking Derek! I just recalled there were two others that were important for this PR. These were part of the earlier failures I faced in this PR so it skipped my mind before :(. Please run whenever you get the chance. Thanks again. |
For
|
Great! |
Fixes logic to detect the two jmps preceding a syscall on AArch64. These jmps are used to bypass the syscall, which bounds the time before exiting code cache and hence delivering the signal.
Enables the linux.signal_racesys test on AArch64, which reproduces the assert failure in current implementation. This required adding the missing implementation to check for pending signals before entering fcache in append_fcache_enter_prologue.
Makes the signals_pending char as signed explicitly, using the existing sbyte alias. This is required because chars are unsigned by default on ARM. Also, adds support for OP_ldrsb instrs on AArch64 to load this data.
Fixes logic for skip_pc detection on ARM32. The existing logic does not work when skip_pc is pointing to the exit cti_pc. This prevents fragment re-linking and may cause performance issues.
Fixes: #4271, #2043