attach injection on Linux #38

Open
derekbruening opened this issue Nov 27, 2014 · 20 comments

Comments

@derekbruening
Contributor

From derek.br...@gmail.com on February 24, 2009 10:01:18

this was PR 204490

focusing on Linux as Windows has issues that are best solved with a kernel
driver, unless we think we can rely on backward decoding heuristics or
don't mind losing control for a while

we do have some issues on Linux:

  • ability to suspend threads before we have control of them: we will
    probably rely on ptrace (xref issue #37: targeted injection on Linux via ptrace)
  • determining the state of sharing of CLONE_* among threads: should
    probably use a modify-and-observe approach

Original issue: http://code.google.com/p/dynamorio/issues/detail?id=38

@derekbruening
Contributor Author

From bruen...@google.com on April 09, 2012 11:59:07

xref issue #725 : attach feature on Windows
xref issue #722 : internally-triggered attach

@derekbruening
Contributor Author

derekbruening commented Nov 27, 2014

From rnk@google.com on July 16, 2012 06:33:26

Another thing we have to think about for attach is, what do we do about stdio fds? If we're not present from the beginning, the app can close fds 0, 1, and 2, and use them for something else. I have a test app that gets the stderr stream pointing to a non-standard file descriptor:

#include <stdio.h>
int main(void) {
    fprintf(stdout, "stderr->_fileno: %d\n", stderr->_fileno);
    fprintf(stderr, "this is to stderr\n");
    fclose(stderr);
    FILE *t2 = fopen("t2", "w");  /* Steals fd 2 */
    freopen("t", "w", stderr);
    fclose(t2);
    fprintf(stdout, "stderr->_fileno: %d\n", stderr->_fileno);  /* prints 3 on my system */
    fprintf(stderr, "this is to t\n");
    return 0;
}

Currently, we'll come in and import stderr from libc and use its _fileno. When the app closes the fd, we'll dup it to keep it alive. However, for libc isolation we're going to cut this import and just use the standard fileno constants from unistd.h. So, for attach, we may want to do some poking for libc and try to find the current stream filenos.

This is kind of a corner case, though. Most apps are likely to either close the stdio fds completely, or call freopen() on them without closing them and opening a new file first, which typically reuses the same fd number.

If the app has totally closed fds, like if the user is trying to attach to a daemon, is there some way we can connect STDOUT and STDERR to the tty of the injector?
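
A minimal sketch of the two fd operations this comment touches on: probing whether a descriptor is still open, and duplicating one to a high number so it survives a later close by the app. This is not DynamoRIO's code; `fd_is_open` and `dup_above` are hypothetical names, and using `fcntl` with `F_GETFD` and `F_DUPFD` is only one plausible way to do it.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if fd is an open descriptor, 0 if not.
 * fcntl(F_GETFD) fails with EBADF only when the descriptor is closed. */
int fd_is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}

/* Hypothetical helper: duplicates fd onto the lowest free descriptor
 * that is >= min_fd, keeping a private copy alive even if the app
 * later closes the original. Returns the new fd, or -1 on error. */
int dup_above(int fd, int min_fd)
{
    assert(min_fd >= 0);
    return fcntl(fd, F_DUPFD, min_fd);
}
```

On attach, a probe like `fd_is_open(STDERR_FILENO)` would only say whether fd 2 is open, not whether it still refers to the original stderr, so it does not by itself solve the corner case shown in the test app.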

@derekbruening
Contributor Author

From bruen...@google.com on March 01, 2013 12:33:15

xref issue #764 : support attaching to processes with non-pthreads threads

@derekbruening
Contributor Author

From bruen...@google.com on March 01, 2013 12:33:24

Owner: rnk@google.com

@derekbruening
Contributor Author

From rnk@google.com on March 02, 2013 13:53:01

Testing on modern distros is made more difficult by the ptrace_scope stuff that's been done recently. I want to add a test for attach, but I don't want to require devs to set /proc/sys/kernel/yama/ptrace_scope to 0 in order to get the test to pass.
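
As a rough illustration, a test could read the Yama setting and adapt (or skip) instead of requiring devs to change it. The sketch below is an assumption about how such a check might look, not something in the DynamoRIO tree; `parse_ptrace_scope` and `read_ptrace_scope` are hypothetical names. The accepted range 0 to 3 matches the modes Yama documents (0 classic, 1 descendants only, 2 admin only, 3 no attach).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: parses the contents of the ptrace_scope file.
 * Returns the mode (0..3), or -1 if the text is not a valid mode. */
int parse_ptrace_scope(const char *text)
{
    int value;
    assert(text != NULL);
    if (sscanf(text, "%d", &value) != 1 || value < 0 || value > 3)
        return -1;
    return value;
}

/* Hypothetical helper: reads the mode from the given path, e.g.
 * /proc/sys/kernel/yama/ptrace_scope. Returns -1 if the file is
 * missing or unreadable (for instance when Yama is not enabled). */
int read_ptrace_scope(const char *path)
{
    char buf[16];
    char *line;
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    line = fgets(buf, sizeof(buf), f);
    fclose(f);
    return line == NULL ? -1 : parse_ptrace_scope(buf);
}
```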

The new restrictions say that you can only ptrace a child process. I think I can adequately test our injection if I do the following:

  • master process: fork+execve injector
  • injector process: fork+execve app
  • injector process: wait for app to do something interesting
  • injector process: execve drrun -attach
  • master process: wait on injector, wait on app (can't waitpid on grandchild, need a pipe or file)
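
The process tree above can be sketched roughly as follows. This is only an illustration of the pipe-based reporting for the grandchild: the real steps would `execve` the injector, the app, and `drrun -attach`, while here the children just write a byte and exit. `run_process_tree` is a hypothetical name.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sketch: returns the byte reported by the app (the
 * grandchild), or -1 on failure. */
int run_process_tree(void)
{
    int fds[2];
    int status = 0;
    int result = -1;
    char report = 0;
    pid_t injector;

    if (pipe(fds) != 0)
        return -1;
    injector = fork();
    if (injector < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (injector == 0) {
        /* Injector process: start the app as its own child. */
        pid_t app;
        close(fds[0]);
        app = fork();
        if (app == 0) {
            /* App process: stands in for the execve'd target. It
             * reports through the pipe because the master cannot
             * waitpid on a grandchild. */
            char done = 'A';
            _exit(write(fds[1], &done, 1) == 1 ? 0 : 1);
        }
        /* A real injector would execve "drrun -attach" here. */
        _exit(app < 0 ? 1 : 0);
    }
    /* Master process: wait on the injector, then read the app's report.
     * The read returns 0 (EOF) if the app died without writing. */
    close(fds[1]);
    if (waitpid(injector, &status, 0) == injector && WIFEXITED(status) &&
        WEXITSTATUS(status) == 0) {
        if (read(fds[0], &report, 1) == 1)
            result = report;
    }
    close(fds[0]);
    return result;
}
```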

@derekbruening
Contributor Author

From rnk@google.com on March 12, 2013 06:45:39

Oops, comment 5 is about ptrace injection, which is issue #37 .

@derekbruening
Contributor Author

From bruen...@google.com on October 25, 2013 11:46:28

xref issue #1305

@derekbruening
Contributor Author

derekbruening commented Jun 6, 2017

Trying to summarize the status of attach:

Attaching to an already-running process is not yet officially supported. There is a
prototype implementation on Linux, along with fully-implemented attach as
well as full detach through the start/stop API, but we have not had the
resources to finish off a ptrace implementation (#38).

On Windows there are non-trivial technical obstacles (#725).

What was implemented was ptrace injection into a fresh process (#37):
75597c3
However, tests for this are lacking and should be added to make this
a first-class supported feature and avoid bitrotting.

As well as taking over all threads at DR init time (for #722):
5e052f7

Recent work has made the start/stop API's attach and thread takeover more
robust, and has added an internally-triggered detach on Linux
(9592116
and several later refactorings and tweaks), which should make it easier to
implement ptrace-based attach and detach.

The remaining work would be extending the ptrace fresh injection to
attaching to an existing process, and perhaps some details around segment
stealing.

@illera88
Contributor

When is the attach feature expected to be completed and ready to be used in Linux systems?

Thank you

@derekbruening
Contributor Author

If you'd like to contribute to this feature it may be best to first coordinate with:
https://groups.google.com/forum/#!searchin/dynamorio-devs/iannillo%7Csort:relevance/dynamorio-devs/xFejqJpHET4/swlaCygnBwAJ

@M3m3M4n
Contributor

M3m3M4n commented Aug 14, 2021

(Xref #5019) Comment added per request

After we ptrace a process, it is stopped with SIGSTOP. This is the point where we inject our code, but...
If it was in the middle of a blocking / auto-restarting syscall, the kernel will report to the tracer that the PC is at the instruction after the syscall, but after continuation it will set the PC back to the syscall instruction by subtracting the instruction's size (PC -= sizeof(syscall)).

There are 2 ways which we can handle this:

  • wait: by single-stepping, we can ensure we are done with any syscall before we inject. As a result, if we were in a blocking / auto-restarting syscall, this would hang until the syscall returns, which may be never.
  • inject anyway: then we have to deal with the kernel subtracting from the PC every time we issue PTRACE_CONT and the like.

I made "wait" the default option (wait_syscall=true), which won't break anything.
wait_syscall=false is used only if the user specifies it.

if wait_syscall=false:

  • To handle the case where the tracee was in the middle of a blocking syscall: every time we send PTRACE_CONT and its friends from the tracer, we must advance the PC by the size of the syscall instruction (2 bytes on x86).
  • But if the user specifies wait_syscall=false and the tracee was not in a blocking syscall, advancing by 2 would break the code, hence the 2 nops in the shellcode run routine and libdynamorio's _start.
  • There is another problem: executing our code this way causes the returned errno to be set to 512 (ERESTARTSYS). This is a kernel-only errno, so I have to mask it to EINTR. I'm aware that some blocking syscalls might not return this, but the most-used syscalls like read, write, thread_cond_wait... support EINTR and programmers should expect it. Otherwise, yes, skipping the syscall will break the app.
  • Because of those funky behaviours I added this as experimental.
  • A cleaner way to do all this is to modify the PC and use the nops only when we detect that we were in a syscall by checking the previous instruction. However, this is not trivial on x86 because of its variable instruction lengths.
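
To make the arithmetic above concrete, here is a small sketch of the PC adjustment and the errno masking as pure functions. It is not the code from the PR: the names are hypothetical, the 2-byte length and the value 512 are taken from the bullets above, and the sketch assumes the syscall result shows up in the return register as a negated errno.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

enum {
    SYSCALL_INSTR_LEN = 2,    /* x86 syscall instruction size, per the text */
    KERNEL_ERESTARTSYS = 512  /* kernel-internal errno, per the text */
};

/* Hypothetical: the PC the tracer writes when injecting anyway. The
 * injected code starts with two 1-byte nops at 'entry', so both
 * 'entry' and 'entry + 2' are valid places to begin executing. */
uintptr_t pc_to_set(uintptr_t entry)
{
    return entry + SYSCALL_INSTR_LEN;
}

/* Hypothetical model of where execution resumes: if the tracee was in
 * a restartable syscall, the kernel backs the PC up by the syscall
 * instruction length on continue; otherwise the PC is used as set. */
uintptr_t pc_after_continue(uintptr_t set_pc, bool kernel_restarts)
{
    assert(set_pc >= SYSCALL_INSTR_LEN);
    return kernel_restarts ? set_pc - SYSCALL_INSTR_LEN : set_pc;
}

/* Hypothetical: hides the kernel-only restart code from the app by
 * reporting an interrupted syscall instead. */
long mask_restart_result(long ret)
{
    return ret == -KERNEL_ERESTARTSYS ? -EINTR : ret;
}
```

With `entry = 0x1000`, the tracer would set the PC to 0x1002; a restarted syscall resumes at 0x1000 and runs through the nops, while the non-restart case resumes at 0x1002 just past them.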

@derekbruening
Contributor Author

Forgot to include the Issue link to here in PR #5019's merge commit bf246bf

Also, that PR forgot to update the release notes list of new features in api/docs/release.dox. That can be added as part of the separate PR that adds a regression test.

@derekbruening
Contributor Author

#5054 lists a number of issues to be improved here:

  • set xdi to zero for x86 _start relocation of libdynamorio
  • implement remote memset for .bss zeroing in elf_loader_map_phdrs()
  • don't kill target if attach fails
  • fix crash if no pid passed
  • useful error message on failure b/c of no ptrace permissions
  • add a warning to use -skip_syscall if attach hangs

Tests are also missing.
In adding a simple test using linux.infloop: sometimes the test's mprotect syscall fails but with errno 0.
Is this the attach interrupting it but failing to set EINTR or something?

Error on mprotect: 0

That failure is not in the DR logs if I use -loglevel 2: so presumably
it's from attach right before DR takes over.
For now I'm working around that to get some test green.

Still TODO:

  • Add a targeted test of -skip_syscall
  • Figure out and fix the mprotect failure
  • Ideally, avoid the need for -skip_syscall

derekbruening added a commit that referenced this issue Nov 28, 2021
Fixes a number of issues with Linux attach:

+ Set xdi to zero for x86 _start relocation of libdynamorio.

+ Implement remote memset for .bss zeroing in elf_loader_map_phdrs(),
  fixing a crash in some builds such as Ubuntu20 release build.

+ Don't kill target if attach fails.

+ Fix crash if no pid passed.

+ Adds a useful error message on failure to look at ptrace permissions.

+ Adds a warning to use -skip_syscall if attach hangs.

+ Adds a test by porting the Windows client.attach test to Linux.
  Disables the mprotect syscall due to weird failures which need to be
  examined.
  Further tests of blocking syscalls and -skip_syscall are needed.

Re-enables the attach help message for drrun and the deployment docs.

Tested release build on Ubuntu20 where the .bss crash reproduced every
run and is now gone.

Tested "ctest --repeat-until-fail 100 -V -R client.attach" on Ubuntu20
and on a Debian-ish system: no failures.

Issue: #38, #5054
Fixes #5054
@derekbruening
Contributor Author

derekbruening commented Nov 28, 2021

Also TODO:

  • AArchXX support: the various gen_push_string(), etc. functions in injector.c need to be ported
  • Android support: port linux.infloop for the test; maybe more

derekbruening added a commit that referenced this issue Nov 29, 2021
@derekbruening
Contributor Author

Assigning temporarily to me as I'm adding AArch64 support

@derekbruening
Contributor Author

I have AArch64 working, but as usual AArch32 is much more complex with the arm-vs-thumb transition woes. Plus I'm now blocked on an IT block bug which you'd think would have been hit before? #5459.

Plus for AArch64 attaching to some target servers, SIGUSR2 is blocked: but that's filed separately as #5458.

@derekbruening
Contributor Author

Also, for AArch32's client.attach test to infloop, I seem to need to pass -skip_syscall while the other arches don't need it (and deliberately don't pass it as it's considered experimental; though it seems necessary to attach to many apps).

derekbruening added a commit that referenced this issue Apr 12, 2022
Implements missing functionality for ptrace attach on AArch64 and
AArch32: generated code sequences were previously x86-only;
-skip_syscall handling only supported x86; and AArch64 does not
support PTRACE_POKEUSER or PTRACE_PEEKUSER.

For AArch32, Thumb vs Arm mode require multiple steps: clearing LSB to
point at the path used as data via a call; switching to Arm mode for
DR's _start; setting the LSB of the initial app PC.

For AArch32, additionally fixes an encoder error where the opcode is
queried before copying a needs-no-encoding instruction.  This is
required for the instruction used to hold data for injection.

Tweaks the disassembler to leave a level 0 instr alone, again to
better handle the data-only instruction used for injection.

Enables the client.attach test on AArch64 and AArch32.  For AArch32,
it needs -skip_syscall.  Long-term we want that on by default
everywhere but we want explicit tests that hit it on all platforms
first.

Issue: #38
derekbruening added a commit that referenced this issue Apr 13, 2022
Implements missing functionality for ptrace attach on AArch64 and
AArch32: generated code sequences were previously x86-only;
-skip_syscall handling only supported x86; and AArch64 does not
support PTRACE_POKEUSER or PTRACE_PEEKUSER.

For AArch32, Thumb vs Arm mode require multiple steps: clearing LSB to
point at the path used as data via a call; switching to Arm mode for
DR's _start; setting the LSB of the initial app PC.

For AArch32, additionally fixes an encoder error where the opcode is
queried before copying a needs-no-encoding instruction.  This is
required for the instruction used to hold data for injection.

Tweaks the disassembler to leave a level 0 instr alone, again to
better handle the data-only instruction used for injection.

Enables the client.attach test on AArch64 and AArch32.  For AArch32,
it needs -skip_syscall.  Long-term we want that on by default
everywhere but we want explicit tests that hit it on all platforms
first.

Tested manually on an AArch32 machine. Unfortunately the client.attach test
is not trivial to set up under QEMU with its multiple command lines and
background processes so that is left as beyond the scope of this PR
and is instead considered part of #4719.

Issue: #38
derekbruening added a commit that referenced this issue Apr 14, 2022
Makes interrupting a blocking syscall on attach the default behavior.
Removes the -skip_syscall drrun/drinject parameter.
Adds a test by adding an optional blocking syscall to infloop.

Issue: #38
@derekbruening
Contributor Author

Re: the blocking syscall issues above and -skip_syscall: my plan is to make -skip_syscall on by default since I seem to need it most times that I attach, and it works. What's missing is handling auto-restart syscalls where we need to set the PC back to before the syscall: but we'd need to restore the syscall # which is difficult.

Note that on AArch64 kernels, this syscall issue does not seem to exist: the takeover PC always re-executes the syscall.

@derekbruening derekbruening removed their assignment Apr 15, 2022
derekbruening added a commit that referenced this issue Apr 15, 2022
@abhinav92003
Contributor

User-requested feature (https://groups.google.com/g/dynamorio-users/c/IgekzFu-Md0): also support adhoc detach.

@derekbruening
Contributor Author

Detach is #95 (I think nudge-based detach is all that covers now) and #2644.
