Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

syscall auto-restart handling does not handle native syscalls: DR, clients, attach #2659

Closed
derekbruening opened this issue Oct 4, 2017 · 1 comment

Comments

@derekbruening
Copy link
Contributor

derekbruening commented Oct 4, 2017

i#1145 failed to add auto-restart handling for signals that don't enter
record_pending_signal: primarily our own suspend signals, but also nudge
signals. This includes all native interruption points: DR and client C code, app code for attach.

Xref #1145, #1216

Support auto-restart in client C code

So far we've had to explicitly put loops checking for EINTR into client
code where we wouldn't have to natively. Proper handling here would
eliminate that.

Incorrect auto-restart for signals used during attach results in broken apps

If an attach takeover signal interrupts an auto-restart syscall, we'll
break the app. We have seen this on some large apps.

Proposal: pass SA_RESTART to our own handlers!

A simple proposal here is to set SA_RESTART on our own handler. This will
cause the kernel to adjust the sigcontext to restore the original xax and
shift the PC back by 2. This will solve the native cases (attach, client,
DR code) without any further work.

I think the reason #1145 didn't use SA_RESTART was a mis-understanding of
when the kernel adjusts the PC: if the kernel adjusts on sigreturn then it
messes with cases where DR changes the PC. But the kernel adjusts prior to
setting up the frame.

Inlined syscalls, deliver signal immediately and leave context alone

For interrupting an inlined syscall in the cache: deliver the signal
immediately as we do now, just as though the signal arrived just before the
syscall executed. There is no pre- or post-syscall handling to worry
about. No need to change the context: use the one the kernel reset. For a
non-auto-restart syscall, leave that alone too.

For from-dispatch syscalls, have to undo the kernel's reset and then emulate

For interrupting do_syscall: first check get_at_syscall() to confirm we
executed the pre-syscall part. If not, handle normally. If so, we either
have to go back to dispatch to deliver the signal, in which case we may as
well undo the pc-2 and xax restore and use our auto-restart emulation code,
or we deliver the signal immediately and have problems w/ cross-syscall
state (just like "CLS" on Windows). Having to keep emulating seems
backward but it does seem like the simplest solution? Or should we try to
call handle_post_system_call() and then deliver the signal immediately?

For delivering immediately, xref i#1206 "must deliver signal received in
do_syscall immediately". We canceled that b/c w/ the actual interruption
w/o SA_RESTART we didn't need it.

w/o doing anything we get a HANG on the eintr-noinline test

It records the signal, and returns -- but that return restarts the
syscall, which blocks, and we then never deliver the signal.

-ignore_syscalls doesn't hang b/c:

record_pending_signal(23) from cache pc 0x000000005535b044
        delaying until exit F2033
signal interrupted pre/post syscall itself so delivering now

We can now live w/ being inaccurate at identifying auto-restart syscalls

Our current check just looks at the syscall number which is not enough.
But with SA_RESTART, the kernel tells us which ones are auto-restart, and
we only need our own check for i#1216's nops -- and for the nops we can be
conservative and insert nops where the syscall might be auto-restart.

Transparency of post-syscall handler

Re: is it un-transparent to have EINTR returned for an auto-restart
syscall: since no app code is run and it's just DR and client-visible state
it seeems ok. I guess a client counting syscall returns would have it
wrong.

Avoiding this seems hard: we'd have to stack dcontexts like on Windows.

We can just pass in the xax/r0 value to restore

We can avoid storing it into TLS on ARM, and decoding on x86.

is_after_syscall checks in synch, etc. have to handle pc==syscall

We have to distinguish the case of not yet having executed the syscall yet,
from having executed it but it got interrupted with the pc set backward.
If we only have to do this for gencode handled from dispatch,
get_at_syscall() will work.

w/ SA_RESTART, can we remove all the emulation code? and the nop insertion? => no

No: nop insertion is for non-auto-restart syscalls, and the emulation is
needed to get back to dispatch. We can make it more accurate.

What about sysenter restart: xref us re-copying xsp into xbp

Add a test w/ a client doing an eintr.c-type syscall

@derekbruening
Copy link
Contributor Author

A big challenge here is sysenter on recent kernels where the syscall restart point is an int 0x80 rather than a jmp back up to the mov esp,ebp;sysenter: the problem is that the int 0x80's fallthrough is our hook jmp, meaning we can't leave control at the syscall restart point when going native and we have to be careful not to interpret our own hook.

derekbruening added a commit that referenced this issue Oct 6, 2017
Changes the strategy for handling interrupted syscalls to use SA_RESTART,
which adds easy handling of interruptions of auto-restart syscalls in
native code, such as during attach or in client C code.

This also simplifies auto-restart of app syscalls, as we now have foolproof
identification of auto-restart situations (before our syscall-number-based
check was inaccurate) and the kernel has restored the clobbered register
value for us.  That eliminates the TLS store we were doing on ARM and the
decode on x86.

For fragment-inlined syscalls, we don't need to do anything anymore: we
deliver the signal immediately and the kernel has already set up the proper
resumption state.

For gencode syscalls, for sane post-syscall handling, we leverage i#1145's
auto-restart emulation and undo what the kernel did.  This lets us go back
to dispatch for a clean sequence of events.

Updates synch and translation checks for post-syscall to also account for
syscall-restart locations.

For sysenter, this change means we now see threads at the int 0x80 restart
point whose fall-through is our vsyscall hook.  To avoid interpreting our
own hook we map that PC to our displaced code, and mapping back in
dr_fragment_app_pc().

Fixes a race between going native and unhooking the vsyscall (to avoid
takeover problems) by redirecting threads at vsyscall or the int 0x80
restart to our own gencode.

Adds a test of a client making a blocking auto-restart syscall to ensure it
is not terminated with EINTR.  This was not simple to arrange in a non-racy
manner and required adding a new feature of immediately delivering a signal
that arrives in DR or client code where the receiving thread is at a safe
point (because if we delay, the test cannot ensure the signal was received).

Fixes #2659
derekbruening added a commit that referenced this issue Oct 6, 2017
Changes the strategy for handling interrupted syscalls to use SA_RESTART,
which adds easy handling of interruptions of auto-restart syscalls in
native code, such as during attach or in client C code.

This also simplifies auto-restart of app syscalls, as we now have foolproof
identification of auto-restart situations (before our syscall-number-based
check was inaccurate) and the kernel has restored the clobbered register
value for us.  That eliminates the TLS store we were doing on ARM and the
decode on x86.

For fragment-inlined syscalls, we don't need to do anything anymore: we
deliver the signal immediately and the kernel has already set up the proper
resumption state.

For gencode syscalls, for sane post-syscall handling, we leverage i#1145's
auto-restart emulation and undo what the kernel did.  This lets us go back
to dispatch for a clean sequence of events.

Updates synch and translation checks for post-syscall to also account for
syscall-restart locations.

For sysenter, this change means we now see threads at the int 0x80 restart
point whose fall-through is our vsyscall hook.  To avoid interpreting our
own hook we map that PC to our displaced code, and mapping back in
dr_fragment_app_pc().

Fixes a race between going native and unhooking the vsyscall (to avoid
takeover problems) by redirecting threads at vsyscall or the int 0x80
restart to our own gencode.

Adds a test of a client making a blocking auto-restart syscall to ensure it
is not terminated with EINTR.  This was not simple to arrange in a non-racy
manner and required adding a new feature of immediately delivering a signal
that arrives in DR or client code where the receiving thread is at a safe
point (because if we delay, the test cannot ensure the signal was received).

Fixes #2659
fhahn pushed a commit that referenced this issue Dec 4, 2017
Changes the strategy for handling interrupted syscalls to use SA_RESTART,
which adds easy handling of interruptions of auto-restart syscalls in
native code, such as during attach or in client C code.

This also simplifies auto-restart of app syscalls, as we now have foolproof
identification of auto-restart situations (before our syscall-number-based
check was inaccurate) and the kernel has restored the clobbered register
value for us.  That eliminates the TLS store we were doing on ARM and the
decode on x86.

For fragment-inlined syscalls, we don't need to do anything anymore: we
deliver the signal immediately and the kernel has already set up the proper
resumption state.

For gencode syscalls, for sane post-syscall handling, we leverage i#1145's
auto-restart emulation and undo what the kernel did.  This lets us go back
to dispatch for a clean sequence of events.

Updates synch and translation checks for post-syscall to also account for
syscall-restart locations.

For sysenter, this change means we now see threads at the int 0x80 restart
point whose fall-through is our vsyscall hook.  To avoid interpreting our
own hook we map that PC to our displaced code, and mapping back in
dr_fragment_app_pc().

Fixes a race between going native and unhooking the vsyscall (to avoid
takeover problems) by redirecting threads at vsyscall or the int 0x80
restart to our own gencode.

Adds a test of a client making a blocking auto-restart syscall to ensure it
is not terminated with EINTR.  This was not simple to arrange in a non-racy
manner and required adding a new feature of immediately delivering a signal
that arrives in DR or client code where the receiving thread is at a safe
point (because if we delay, the test cannot ensure the signal was received).

Fixes #2659
derekbruening added a commit that referenced this issue May 14, 2018
With SA_RESTART (#2659), we cannot distinguish
not-yet-executed-syscall from syscall-was-interrupted-in-the-kernel.
This doesn't matter for most syscalls, but it does matter for
sigreturn, whose asynch_target points somewhere other than right after
the syscall, so we exclude it (it can't be interrupted so we know we
haven't executed it yet).

Fixes #2995
derekbruening added a commit that referenced this issue May 14, 2018
With SA_RESTART (#2659), we cannot distinguish
not-yet-executed-syscall from syscall-was-interrupted-in-the-kernel.
This doesn't matter for most syscalls, but it does matter for
sigreturn, whose asynch_target points somewhere other than right after
the syscall, so we exclude it (it can't be interrupted so we know we
haven't executed it yet).

Fixes #2995
derekbruening added a commit that referenced this issue May 24, 2018
Avoids translating a new signal while suspended, as the resulting lock
acquisitions can lead to deadlock.  An incoming signal there should
always be delayable in any case: the only reason we were trying to
deliver immediately was due to a too-broad check put in for #2659.

It is not easy to create a test that reliably exercises these races:
our existing detach_spawn and other tests are the closest and this
should reduce their flakiness.

Fixes #3026
derekbruening added a commit that referenced this issue May 24, 2018
Avoids translating a new signal while suspended, as the resulting lock
acquisitions can lead to deadlock.  An incoming signal there should
always be delayable in any case: the only reason we were trying to
deliver immediately was due to a too-broad check put in for #2659.

It is not easy to create a test that reliably exercises these races:
our existing detach_spawn and other tests are the closest and this
should reduce their flakiness.

Fixes #3026
fhahn pushed a commit that referenced this issue Jun 18, 2018
Avoids translating a new signal while suspended, as the resulting lock
acquisitions can lead to deadlock.  An incoming signal there should
always be delayable in any case: the only reason we were trying to
deliver immediately was due to a too-broad check put in for #2659.

It is not easy to create a test that reliably exercises these races:
our existing detach_spawn and other tests are the closest and this
should reduce their flakiness.

Fixes #3026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant