[coarse] bound delay of signal+nudge delivery #1087

derekbruening · 2014-11-28T03:23:25Z

From bruen...@google.com on February 22, 2013 13:38:04

this was PR 213040 as referred to in the code

xref issue #717

-------------------------------------
old notes from an old commit:

- for signals arriving in coarse fragments, do not try to unlink
  (since that's not possible) and instead try to translate
  for immediate delivery
- remove translation failure asserts since expect to fail sometimes
- avoid pclookup if in a coarse unit b/c it's so expensive that it
  was preventing forward progress on vmx: got into an infinite loop
  with alarm signals arriving faster than they were being processed
- also fixed 64-bit errors

SKIP_ALARM_XL8_MAX 3:
  Total signals delivered: 3048
  Signals dropped: 154
  Signals in coarse units delayed: 44
SKIP_ALARM_XL8_MAX 2:
  Total signals delivered: 2812
  Signals dropped: 83
  Signals in coarse units delayed: 29
SKIP_ALARM_XL8_MAX 1 or 0: test fails

If we hit problems later on bigger workloads or more heavily loaded machines, we may well have to do things like reduce the itimer frequency, but what I have is sufficient for now.

** TODO switch to read faults?

To support ignorable syscalls in coarse fragments we'll want to switch to
using read faults (bug 211284) to enforce signal delivery delay bounds,
rather than the current unlink scheme, as the cache of a frozen unit will
be read-only.
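
For reference, a generic user-space illustration of the read-fault idea: code performs one cheap load from a dedicated page, and making that page no-access forces a fault at the next load, so nothing in the (read-only) code cache has to be modified. This is a sketch under assumptions, not DynamoRIO code: all names are hypothetical, it assumes Linux (where an access to a PROT_NONE page raises SIGSEGV), and it calls mprotect from the handler, which works on Linux but is not on the POSIX async-signal-safe list.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* All names below are hypothetical. */
static char *poll_page;
static size_t poll_page_size;
static volatile sig_atomic_t faults_seen;

static void
on_poll_fault(int sig, siginfo_t *info, void *ucxt)
{
    char *addr = (char *)info->si_addr;
    (void)sig;
    (void)ucxt;
    if (addr < poll_page || addr >= poll_page + poll_page_size)
        _exit(1); /* a genuine crash, not our poll */
    faults_seen++; /* a real system would deliver the pending signal here */
    /* Re-enable reads so the faulting load succeeds when it is restarted. */
    mprotect(poll_page, poll_page_size, PROT_READ);
}

static int
poll_setup(void)
{
    struct sigaction act;
    poll_page_size = (size_t)sysconf(_SC_PAGESIZE);
    poll_page = mmap(NULL, poll_page_size, PROT_READ,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (poll_page == MAP_FAILED)
        return -1;
    memset(&act, 0, sizeof(act));
    act.sa_sigaction = on_poll_fault;
    act.sa_flags = SA_SIGINFO;
    sigemptyset(&act.sa_mask);
    return sigaction(SIGSEGV, &act, NULL);
}

/* Called when an asynchronous signal is pending: the next poll will fault. */
static int
poll_arm(void)
{
    return mprotect(poll_page, poll_page_size, PROT_NONE);
}

/* The single load that would sit at the top of every fragment. */
static char
poll_read(void)
{
    return *(volatile char *)poll_page;
}
```

The price is the one memory access per fragment; in exchange the delivery delay is bounded by the distance to the next poll.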

raising priority as this affects nudges too

we have a few choices:

1) read fault: but not clear how to do it: not very practical to try and change code
   page from +rx to no-access and get boundaries all lined up w/ no intra-page loops
   or something, and prob not worth cost of a mem access per fragment (PR 211284)
   - would changing page from +rx to no-access affect sharing?
2) try to translate if in fcache (verify: is that always safe: does it ever alloc mem?).
   if have an answer, go ahead and interrupt fcache, though will mean creating
   a dup tail-fragment.
   - but what to do if can't translate? live w/ unbounded delivery?
   - meta instrs from client won't happen w/ current model for coarse (maybe
     in future if persist tool code?).
   - I believe it would be possible to add xl8 power to any point in ind br sequence
     and not just those that can fault as we have today.

for either, maybe eliminate jmp that skips inlined syscall and
let those fragments be coarse: though since translation is costly, going to
leave that alone since it's more performant that way.

note that earlier idea of disallowing syscalls in coarse fragments is not
sufficient: need to deliver asynch signal that arrives in any loop in cache:
not just about syscalls.
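
The delivery decision discussed above can be sketched roughly as follows. The enum and function names are hypothetical and the real logic in signal.c is more involved; the sketch only captures the three outcomes these notes describe: unlink and delay for ordinary fragments, immediate delivery for coarse fragments when translation succeeds, and an unbounded delay when it does not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
typedef enum {
    SITE_FINE_FRAGMENT, /* ordinary fragment: has exit stubs, can be unlinked */
    SITE_COARSE_UNIT    /* coarse fragment: cannot be unlinked */
} interrupt_site_t;

typedef enum {
    ACTION_UNLINK_AND_DELAY, /* bounded: delivered once the fragment exits */
    ACTION_DELIVER_NOW,      /* translated state is available */
    ACTION_DELAY_UNBOUNDED   /* no translation: wait for a natural cache exit */
} signal_action_t;

static signal_action_t
choose_signal_action(interrupt_site_t site, bool translation_ok)
{
    if (site == SITE_FINE_FRAGMENT)
        return ACTION_UNLINK_AND_DELAY;
    /* Never try to unlink a coarse fragment: there is nothing to unlink. */
    return translation_ok ? ACTION_DELIVER_NOW : ACTION_DELAY_UNBOUNDED;
}
```

The last case is the open problem this issue tracks: a loop that stays inside a coarse unit never reaches a point where the delayed signal can be delivered.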

#9 0x004547b7 in internal_error (file=0x590cb0 "/work/dr/.../core/link.c", line=1876,
expr=0x5833ec "false && \"fake fragment_t has no exit stubs!\"") at /work/dr/.../core/utils.c:182
#10 0x00441654 in unlink_fragment_outgoing (dcontext=0x1ab31900, f=0x1ab8c1f4)
at /work/dr/.../core/link.c:1876
#11 0x00566c72 in unlink_fragment_for_signal (dcontext=0x1ab31900, f=0x1ab8c1f4, pc=0x1ac01218 "\351\303\377")
at /work/dr/.../core/linux/signal.c:2865
#12 0x005703d2 in handle_nudge_signal (ucxt=, siginfo=,
dcontext=) at /work/dr/.../core/linux/signal.c:5280
#13 master_signal_handler (ucxt=, siginfo=, dcontext=)
at /work/dr/.../core/linux/signal.c:3737

we should handle here as well the case of a signal coming in between the
skip-jmp and the syscall, where it's too close to change the jmp

Original issue: http://code.google.com/p/dynamorio/issues/detail?id=1087

derekbruening added Migrated Priority-Medium Type-Bug OpSys-Linux labels Nov 28, 2014

derekbruening removed the Type-Bug label Apr 2, 2015
