i#2659 syscall restart: switch to using SA_RESTART (#2660)
Changes the strategy for handling interrupted syscalls to use SA_RESTART,
which gives us easy handling of interrupted auto-restart syscalls in
native code, such as during attach or in client C code.
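
As a minimal standalone sketch of the semantics relied on here (illustrative
only, not code from this commit): with SA_RESTART set, the kernel
transparently restarts an interrupted restartable syscall after the handler
returns, instead of failing it with EINTR.

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static void
empty_handler(int sig)
{
    /* Nothing: the point is only to interrupt the blocking syscall below. */
}

int
main(void)
{
    struct sigaction act;
    memset(&act, 0, sizeof(act));
    act.sa_handler = empty_handler;
    sigemptyset(&act.sa_mask);
    act.sa_flags = SA_RESTART; /* without this flag, read() fails with EINTR */
    sigaction(SIGALRM, &act, NULL);
    alarm(1);
    char buf[1];
    ssize_t res = read(STDIN_FILENO, buf, sizeof(buf)); /* blocks on a tty */
    if (res == -1 && errno == EINTR)
        printf("read was interrupted\n"); /* seen only without SA_RESTART */
    else
        printf("read restarted transparently: %zd\n", res);
    return 0;
}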

This also simplifies auto-restart of app syscalls, as we now have foolproof
identification of auto-restart situations (previously our syscall-number-based
check was inaccurate) and the kernel has restored the clobbered register
value for us.  That eliminates the TLS store we were doing on ARM and the
decode on x86.
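
To illustrate why the identification becomes foolproof, a hedged sketch
(assumes x86-64 Linux; this is not DR's actual code, and
kernel_set_up_restart is a hypothetical name): by the time the handler runs,
a pending restart shows up as the interrupted PC sitting exactly on the
syscall instruction, with the clobbered register already holding the syscall
number again.

#define _GNU_SOURCE
#include <stdbool.h>
#include <ucontext.h>

/* Hedged sketch, not DR's code: with SA_RESTART, a signal-context PC on the
 * syscall instruction itself means the kernel set up a restart (it rewound
 * the PC and restored the syscall number to rax here, or r0/x0 on ARM), so
 * no TLS spill on ARM and no instruction decode on x86 is needed.
 */
static bool
kernel_set_up_restart(const ucontext_t *uc, const unsigned char *syscall_pc)
{
    const unsigned char *interrupted_pc =
        (const unsigned char *) uc->uc_mcontext.gregs[REG_RIP]; /* x86-64 */
    return interrupted_pc == syscall_pc;
}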

For fragment-inlined syscalls, we don't need to do anything anymore: we
deliver the signal immediately and the kernel has already set up the proper
resumption state.

For gencode syscalls, for sane post-syscall handling, we leverage i#1145's
auto-restart emulation and undo what the kernel did.  This lets us go back
to dispatch for a clean sequence of events.
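
A rough sketch of that undo step (hypothetical names and types, not DR's
actual signal code): if the kernel rewound the PC onto the gencode syscall
instruction, move it forward past the syscall again and flag the syscall for
re-execution by the i#1145-style emulation, so dispatch sees an ordinary
post-syscall state.

#include <stdbool.h>
#include <stdint.h>

typedef uint8_t *app_pc;

/* Hypothetical per-thread record, for illustration only. */
typedef struct {
    bool restart_syscall; /* hand the restart to the auto-restart emulation */
} restart_state_t;

static void
undo_kernel_restart(uintptr_t *pc_slot, app_pc syscall_pc, int syscall_len,
                    restart_state_t *state)
{
    if ((app_pc) *pc_slot == syscall_pc) {      /* kernel set up a restart */
        *pc_slot = (uintptr_t) (syscall_pc + syscall_len); /* post-syscall */
        state->restart_syscall = true;          /* emulate the restart */
    }
}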

Updates synch and translation checks for post-syscall to also account for
syscall-restart locations.

For sysenter, this change means we now see threads at the int 0x80 restart
point whose fall-through is our vsyscall hook.  To avoid interpreting our
own hook, we map that PC to our displaced code and map back in
dr_fragment_app_pc().

Fixes a race between going native and unhooking the vsyscall (done to avoid
takeover problems) by redirecting threads at the vsyscall or the int 0x80
restart point to our own gencode.

Adds a test of a client making a blocking auto-restart syscall to ensure it
is not terminated with EINTR.  This was not simple to arrange in a non-racy
manner and required adding a new feature of immediately delivering a signal
that arrives in DR or client code where the receiving thread is at a safe
point (because if we delay, the test cannot ensure the signal was received).
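
The shape of such a test, as a hedged standalone sketch (not the test added
by this commit): one thread blocks in an auto-restart syscall, another
signals it, and the read() must complete rather than fail with EINTR.  The
sleep()-based rendezvous below is exactly the kind of race the real test has
to avoid, which is what the immediate-delivery feature is for.

#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int fds[2];

static void
empty_handler(int sig)
{
}

static void *
blocker(void *arg)
{
    char c;
    /* A blocking, auto-restartable syscall: it must not return EINTR. */
    ssize_t res = read(fds[0], &c, 1);
    if (res == -1 && errno == EINTR)
        fprintf(stderr, "FAIL: EINTR leaked through\n");
    else
        fprintf(stderr, "OK: read returned %zd\n", res);
    return NULL;
}

int
main(void)
{
    struct sigaction act;
    memset(&act, 0, sizeof(act));
    act.sa_handler = empty_handler;
    sigemptyset(&act.sa_mask);
    act.sa_flags = SA_RESTART;
    sigaction(SIGUSR1, &act, NULL);
    pipe(fds);
    pthread_t thread;
    pthread_create(&thread, NULL, blocker, NULL);
    sleep(1);                      /* racy rendezvous: hope blocker() blocked */
    pthread_kill(thread, SIGUSR1); /* interrupt the blocked read */
    sleep(1);
    write(fds[1], "x", 1);         /* let the restarted read complete */
    pthread_join(thread, NULL);
    return 0;
}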

Fixes #2659
derekbruening authored and fhahn committed Dec 4, 2017
1 parent c1fd846 commit 2e724d7
Showing 15 changed files with 288 additions and 140 deletions.
9 changes: 0 additions & 9 deletions core/arch/aarchxx/mangle.c
@@ -1354,15 +1354,6 @@ mangle_syscall_arch(dcontext_t *dcontext, instrlist_t *ilist, uint flags,
* mechanism) before the system call, and restore it afterwards.
*/
ASSERT(DR_REG_STOLEN_MIN > DR_REG_SYSNUM);

- /* We have to save r0 in case the syscall is interrupted. To restart
- * it, we need to replace the kernel's -EINTR in r0 with the original
- * app arg.
- * XXX optimization: we could try to get the syscall number and avoid
- * this for non-auto-restart syscalls.
- */
- PRE(ilist, instr,
-     instr_create_save_to_tls(dcontext, DR_REG_R0, TLS_REG0_SLOT));
}

# ifdef UNIX
8 changes: 8 additions & 0 deletions core/arch/interp.c
@@ -252,6 +252,14 @@ init_build_bb(build_bb_t *bb, app_pc start_pc, bool app_interp, bool for_cache,
overlap_info_t *overlap_info)
{
memset(bb, 0, sizeof(*bb));
+ #if defined(LINUX) && defined(X86_32)
+ /* With SA_RESTART (i#2659) we end up interpreting the int 0x80 in vsyscall,
+  * whose fall-through hits our hook. We avoid interpreting our own hook
+  * by shifting it to the displaced pc.
+  */
+ if (start_pc == vsyscall_sysenter_return_pc)
+     start_pc = vsyscall_sysenter_displaced_pc;
+ #endif
bb->check_vm_area = true;
bb->start_pc = start_pc;
bb->app_interp = app_interp;
1 change: 0 additions & 1 deletion core/arch/mangle_shared.c
@@ -692,7 +692,6 @@ mangle_syscall(dcontext_t *dcontext, instrlist_t *ilist, uint flags,
sysnum_is_not_restartable(ilist_find_sysnum(ilist, instr))) {
/* i#1216: we insert a nop instr right after inlined non-auto-restart
* syscall to make it a safe point for suspending.
- * XXX-i#1216-c#2: we still need handle auto-restart syscall
*/
instr_t *nop = XINST_CREATE_nop(dcontext);
/* We make a fake app nop instr for easy handling in recreate_app_state.
9 changes: 7 additions & 2 deletions core/dispatch.c
@@ -798,7 +798,7 @@ dispatch_enter_dynamorio(dcontext_t *dcontext)
* seeing post- and not pre-.
*/
LOG(THREAD, LOG_INTERP, 2, "hit post-sysenter hook while native\n");
- ASSERT(dcontext->currently_stopped);
+ ASSERT(dcontext->currently_stopped || IS_CLIENT_THREAD(dcontext));
dcontext->next_tag = BACK_TO_NATIVE_AFTER_SYSCALL;
dcontext->native_exec_postsyscall =
IF_UNIX_ELSE(vsyscall_sysenter_displaced_pc, vsyscall_syscall_end_pc);
@@ -1780,8 +1780,11 @@ adjust_syscall_continuation(dcontext_t *dcontext)
/* dr_syscall_invoke_another() hits this */
dcontext->asynch_target == vsyscall_sysenter_displaced_pc);
/* i#1939: we do need to adjust for 4.4.8+ kernels */
- if (!dcontext->sys_was_int && vsyscall_sysenter_displaced_pc != NULL)
+ if (!dcontext->sys_was_int && vsyscall_sysenter_displaced_pc != NULL) {
      dcontext->asynch_target = vsyscall_sysenter_displaced_pc;
+     LOG(THREAD, LOG_SYSCALLS, 3,
+         "%s: asynch_target => "PFX"\n", __FUNCTION__, dcontext->asynch_target);
+ }
# endif
} else if (vsyscall_syscall_end_pc != NULL &&
/* PR 341469: 32-bit apps (LOL64) on AMD hardware have
@@ -1795,6 +1798,8 @@ adjust_syscall_continuation(dcontext_t *dcontext)
if (dcontext->asynch_target == vsyscall_syscall_end_pc) {
ASSERT(vsyscall_sysenter_return_pc != NULL);
dcontext->asynch_target = vsyscall_sysenter_return_pc;
+ LOG(THREAD, LOG_SYSCALLS, 3,
+     "%s: asynch_target => "PFX"\n", __FUNCTION__, dcontext->asynch_target);
}
}
}
4 changes: 4 additions & 0 deletions core/lib/instrument.c
@@ -6978,6 +6978,10 @@ dr_fragment_app_pc(void *tag)
SYSLOG_INTERNAL_WARNING_ONCE("dr_fragment_app_pc is a DR/client pc");
}
});
+ #elif defined(LINUX) && defined(X86_32)
+ /* Point back at our hook, undoing the bb shift for SA_RESTART (i#2659). */
+ if ((app_pc)tag == vsyscall_sysenter_displaced_pc)
+     tag = vsyscall_sysenter_return_pc;
#endif
return tag;
}
64 changes: 58 additions & 6 deletions core/synch.c
@@ -172,6 +172,13 @@ thread_synch_state_no_xfer(dcontext_t *dcontext)
tsd->synch_perm == THREAD_SYNCH_VALID_MCONTEXT_NO_XFER);
}

+ bool
+ thread_synch_check_state(dcontext_t *dcontext, thread_synch_permission_t desired_perm)
+ {
+     thread_synch_data_t *tsd = (thread_synch_data_t *) dcontext->synch_field;
+     return THREAD_SYNCH_SAFE(tsd->synch_perm, desired_perm);
+ }

/* Only valid while holding all_threads_synch_lock and thread_initexit_lock. Set to
* whether synch_with_all_threads was successful in synching this thread.
* Cannot be called when THREAD_SYNCH_*_AND_CLEANED was requested as the
@@ -188,6 +195,28 @@ thread_synch_successful(thread_record_t *tr)
return tsd->synch_with_success;
}

+ #ifdef UNIX
+ /* i#2659: the kernel is now doing auto-restart so we have to check for the
+  * pc being at the syscall.
+  */
+ static bool
+ is_after_or_restarted_do_syscall(dcontext_t *dcontext, app_pc pc, bool check_vsyscall)
+ {
+     if (is_after_do_syscall_addr(dcontext, pc))
+         return true;
+     if (check_vsyscall && pc == vsyscall_sysenter_return_pc)
+         return true;
+     if (!get_at_syscall(dcontext)) /* rule out having just reached the syscall */
+         return false;
+     int syslen = syscall_instr_length(dr_get_isa_mode(dcontext));
+     if (is_after_do_syscall_addr(dcontext, pc + syslen))
+         return true;
+     if (check_vsyscall && pc + syslen == vsyscall_sysenter_return_pc)
+         return true;
+     return false;
+ }
+ #endif

bool
is_at_do_syscall(dcontext_t *dcontext, app_pc pc, byte *esp)
{
@@ -207,7 +236,7 @@ is_at_do_syscall(dcontext_t *dcontext, app_pc pc, byte *esp)
return pc == after_do_syscall_code(dcontext);
}
#else
- return is_after_do_syscall_addr(dcontext, pc);
+ return is_after_or_restarted_do_syscall(dcontext, pc, false/*!vsys*/);
#endif
} else if (get_syscall_method() == SYSCALL_METHOD_SYSENTER) {
#ifdef WINDOWS
@@ -231,8 +260,7 @@ is_at_do_syscall(dcontext_t *dcontext, app_pc pc, byte *esp)
* distinguish those: but for now if a sysenter instruction is used it
* has to be do_syscall since DR's own syscalls are ints.
*/
- return (pc == vsyscall_sysenter_return_pc ||
-         is_after_do_syscall_addr(dcontext, pc));
+ return is_after_or_restarted_do_syscall(dcontext, pc, true/*vsys*/);
#endif
}
/* we can reach here w/ a fault prior to 1st syscall on Linux */
@@ -1599,8 +1627,29 @@ translate_from_synchall_to_dispatch(thread_record_t *tr, thread_synch_state_t sy
"\tat syscall so not translating\n");
/* sanity check */
ASSERT(is_after_syscall_address(dcontext, pre_translation) ||
-        pre_translation == IF_WINDOWS_ELSE(vsyscall_after_syscall,
-                                           vsyscall_sysenter_return_pc));
+        IF_WINDOWS_ELSE(pre_translation == vsyscall_after_syscall,
+                        is_after_or_restarted_do_syscall
+                        (dcontext, pre_translation, true/*vsys*/)));
+ #if defined(UNIX) && defined(X86_32)
+ if (pre_translation == vsyscall_sysenter_return_pc ||
+     pre_translation + SYSENTER_LENGTH == vsyscall_sysenter_return_pc) {
+     /* Because we remove the vsyscall hook on a send_all_other_threads_native()
+      * yet have no barrier to know the threads have run their own go-native
+      * code, we want to send them away from the hook, to our gencode.
+      */
+     if (pre_translation == vsyscall_sysenter_return_pc)
+         mc->pc = after_do_shared_syscall_addr(dcontext);
+     else if (pre_translation + SYSENTER_LENGTH == vsyscall_sysenter_return_pc)
+         mc->pc = get_do_int_syscall_entry(dcontext);
+     /* exit stub and subsequent fcache_return will save rest of state */
+     res = set_synched_thread_context(dcontext->thread_record, mc, NULL, 0,
+                                      synch_state _IF_X64((void *)mc)
+                                      _IF_WINDOWS(NULL));
+     ASSERT(res);
+     /* cxt is freed by set_synched_thread_context() or target thread */
+     free_cxt = false;
+ }
+ #endif
IF_ARM({
if (INTERNAL_OPTION(steal_reg_at_reset) != 0) {
/* We don't want to translate, just update the stolen reg values */
@@ -1831,10 +1880,13 @@ send_all_other_threads_native(void)
* going to dispatch and then going native when its syscall exits.
*
* FIXME i#95: That means the time to go native is, unfortunately,
- * unbounded. this means that dr_app_cleanup() needs to synch the
+ * unbounded. This means that dr_app_cleanup() needs to synch the
* threads and force-xl8 these. We should share code with detach.
* Right now we rely on the app joining all its threads *before*
* calling dr_app_cleanup(), or using dr_app_stop_and_cleanup().
+ * This also means we have a race with unhook_vsyscall in
+ * os_process_not_under_dynamorio(), which we solve by redirecting
+ * threads at syscalls to our gencode.
*/
translate_from_synchall_to_dispatch(threads[i], desired_state);
}
5 changes: 4 additions & 1 deletion core/synch.h
@@ -1,5 +1,5 @@
/* **********************************************************
- * Copyright (c) 2012-2016 Google, Inc. All rights reserved.
+ * Copyright (c) 2012-2017 Google, Inc. All rights reserved.
* Copyright (c) 2008-2010 VMware, Inc. All rights reserved.
* **********************************************************/

@@ -146,6 +146,9 @@ synch_thread_exit(dcontext_t *dcontext);
bool
thread_synch_state_no_xfer(dcontext_t *dcontext);

+ bool
+ thread_synch_check_state(dcontext_t *dcontext, thread_synch_permission_t desired_perm);

/* Only valid while holding all_threads_synch_lock and thread_initexit_lock. Set to
* whether synch_with_thread was successful.
* Cannot be called when THREAD_SYNCH_*_AND_CLEANED was requested as the
4 changes: 3 additions & 1 deletion core/translate.c
@@ -936,7 +936,9 @@ recreate_app_state_internal(dcontext_t *tdcontext, priv_mcontext_t *mcontext,
is_after_main_do_syscall_addr(tdcontext, mcontext->pc) ||
/* Check for pointing right at sysenter, for i#1145 */
mcontext->pc + SYSENTER_LENGTH == vsyscall_syscall_end_pc ||
- is_after_main_do_syscall_addr(tdcontext, mcontext->pc + SYSENTER_LENGTH))) {
+ is_after_main_do_syscall_addr(tdcontext, mcontext->pc + SYSENTER_LENGTH) ||
+ /* Check for pointing at the sysenter-restart int 0x80 for i#2659 */
+ mcontext->pc + SYSENTER_LENGTH == vsyscall_sysenter_return_pc)) {
/* If at do_syscall yet not yet in the kernel (or the do_syscall still uses
* int: i#2005), we need to translate to vsyscall, for detach (i#95).
*/
9 changes: 9 additions & 0 deletions core/unix/os.c
@@ -2580,6 +2580,15 @@ os_process_under_dynamorio_initiate(dcontext_t *dcontext)
/* We only support regular process-wide signal handlers for delayed takeover. */
/* i#2161: we ignore alarm signals during the attach process to avoid races. */
signal_reinstate_handlers(dcontext, true/*ignore alarm*/);
+ /* XXX: there's a tradeoff here: we have a race when we remove the hook
+  * because dr_app_stop() has no barrier and a thread sent native might
+  * resume from vsyscall after we remove the hook. However, if we leave the
+  * hook, then the next takeover signal might hit a native thread that's
+  * inside DR just to go back native after having hit the hook. For now we
+  * remove the hook and rely on translate_from_synchall_to_dispatch() moving
+  * threads from vsyscall to our gencode and not relying on the hook being
+  * present to finish up their go-native code.
+  */
hook_vsyscall(dcontext, false);
}
