Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ASSERT core/dynamo.c:1139 on running sol on Ubuntu #1216

Closed
derekbruening opened this issue Nov 28, 2014 · 4 comments
Closed

ASSERT core/dynamo.c:1139 on running sol on Ubuntu #1216

derekbruening opened this issue Nov 28, 2014 · 4 comments

Comments

@derekbruening
Copy link
Contributor

From zhao...@google.com on July 18, 2013 11:42:26

What steps will reproduce the problem? 1. in Ubuntu 13.04: ./bin64/drrun -debug -c ./api/samples/bin/libbbcount.so -- sol
2. close the windows when it shows up What is the expected output? What do you see instead? stop the app successfully, but I saw assert failure

<Stopping application /usr/games/sol (14617)>

^T<CURIOSITY : loop_count < max_loops in file /home/dynamorio/Workspace/DynamoRIO/dynamorio/core/synch.c line 1361
version 4.0.2186, custom build
-client_lib '/home/dynamorio/Workspace/DynamoRIO/builds/build_x64_dbg/./api/samples/bin/libbbcount.so;0;' -code_api -stack_size 56K -max_elide_jmp 0 -max_elide_call 0 -no_inline_ignored_syscalls -native_exec_default_list '' -no_native_exec_managed_code -no_indcall2direct
0x0000000049a66680 0x00000000711fef65
0x0000000049a66750 0x000000007108aefa
0x0000000049a66780 0x000000007108b02c
0x0000000049a667b0 0x000000007108b1d6
0x0000000049a667e0 0x0000000071298564
0x0000000049a667f8 0x00000000499d5e40
/home/dynamorio/Workspace/DynamoRIO/builds/build_x64_dbg/./api/samples/bin/libbbcount.so=0x0000000072000000
/lib/x86_64-linux-gnu/libc.so.6=0x00007fdb30364000
/lib64/ld-linux-x86-64.so.2=0x00007fdb3013f000>

<Application /usr/games/sol (14617). Internal Error Internal DynamoRIO Error: /home/dynamorio/Workspace/DynamoRIO/dynamorio/core/dynamo.c:1139 ok
(Error occurred @164694 frags)
version 4.0.2186, custom build
-client_lib '/home/dynamorio/Workspace/DynamoRIO/builds/build_x64_dbg/./api/samples/bin/libbbcount.so;0;' -code_api -stack_size 56K -max_elide_jmp 0 -max_elide_call 0 -no_inline_ignored_syscalls -native_exec_default_list '' -no_native_exec_managed_code -no_indcall2direct
0x0000000049a665f0 0x0000000071111db9
0x0000000049a66750 0x000000007108af31
0x0000000049a66780 0x000000007108b02c
0x0000000049a667b0 0x000000007108b1d6
0x0000000049a667e0 0x0000000071298564
0x0000000049a667f8 0x00000000499d5e40
/home/dynamorio/Workspace/DynamoRIO/builds/build_x64_dbg/./api/samples/bin/libbbcount.so=0x0000000072000000
/lib/x86_64-linux-gnu/libc.so.6=0x00007fdb30364000
/lib64/ld-linux-x86-64.so.2=0x00007fdb3013f000>

It runs fine if without client.

Original issue: http://code.google.com/p/dynamorio/issues/detail?id=1216

@derekbruening
Copy link
Contributor Author

From zhao...@google.com on July 18, 2013 14:33:07

It seems caused by one thread is sending the suspend signal while the target thread is performing a poll system call and cannot handle the signal.

@derekbruening
Copy link
Contributor Author

From bruen...@google.com on July 19, 2013 07:32:29

More details: we have an inlined SYS_poll, but the kernel points the pc
post-syscall, which happens to be a meta-instr added by the client:

0x4ec515a7: mov %rax,(%rsp)
0x4ec515ab: mov $0x7,%eax
0x4ec515b0: jmp 0x4ec515b7
0x4ec515b2: jmpq 0x4ebc34c0
0x4ec515b7: syscall
=> 0x4ec515b9: movabs %rax,%gs:0x18
0x4ec515c4: lahf
0x4ec515c5: seto %al
0x4ec515c8: incl 0x235b0afa(%rip) # 0x722020c8
0x4ec515ce: add $0x7f,%al
0x4ec515d0: sahf

So we want to consider that case to be safe for suspending. The argument is
that as the start of a meta sequence it should be ok to re-start upon
relocation and that the full app state should be consistent there -- so
long as it is the start, and it's not, say, a meta rep string instr that's
halfway through. We'd have to use a heuristic to distinguish at-syscall
from post-syscall. The heuristic would check for eax holding -EINTR, but
wouldn't be perfect. ne proposal is to add a nop during mangling after all
syscalls -- which we can limit to all non-auto-restart syscalls. issue #1145 failed to add auto-restart handling for signals that don't enter
record_pending_signal: primarily our own suspend signals, but also nudge
signals. We want to take all the same actions as in
record_pending_signal() (check whether at a syscall, inlined or not, and
call adjust_app_stack_for_signal()). For suspend, we'd tweak the
suspended_sigcxt. Xl8 should naturally call it a safe spot once adjusted.

@derekbruening
Copy link
Contributor Author

derekbruening commented Oct 4, 2017

A related issue here is our own signals interrupting auto-restart syscalls at attach time, where our #1145 auto-restart emulation does not come into play as we're interrupting native code.

A simple proposal here is to set SA_RESTART on our own handler. This will cause the kernel to adjust the sigcontext to restore the original xax and shift the PC back by 2. This should solve the attach-time problem, and should also help with in-cache and gencode cases. There should be no downside, assuming DR can handle "before do_syscall": if that's not considered a safe point and we assume that thread will make progress we could have a problem.

I think the reason #1145 didn't use SA_RESTART was a mis-understanding of when the kernel adjusts the PC: if the kernel adjusts on sigreturn then it messes with cases where DR changes the PC. But the kernel adjusts prior to setting up the frame.

@derekbruening
Copy link
Contributor Author

This issue is really about the post-syscall translation point, which was solved by adding nops in 67e8038. I've split out the restart issues with native code into #2659.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant