DynamoRIO fails to run trivial "clone" example on ARM #1936
Comments
Xref #2089, where I'm putting in a safe_read TLS solution for x86. We may want something similar here: a special safe_read that does not need a dcontext (since TLS init queries are made on new threads). See safe_read_tls_magic().
It's been a long time since I looked at this, but I'm not sure that safe_read is the right solution here. It would be better to avoid abusing the app's TLS altogether, which I think ought to be possible since, unlike on Intel, we have the stolen register on ARM.
But you don't have a stolen register for native threads. That's the main issue: supporting mixed-control models, including attach/detach, the start/stop API, native_exec, etc. You have to run the same code in three kinds of thread: (1) a native thread that was never under DR control, (2) a thread that used to be under DR control but was given free rein to run in a native context, and (3) a thread currently under DR control. You have to give up the stolen register in the 2nd case, and you never had it in the 1st.
I wonder whether, for robustness and long-term simplicity, we shouldn't just use gettid() and a hash table rather than make fragile and non-portable assumptions about the app's TLS. (Or is that a solution for a different problem?)
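To make the gettid() suggestion concrete, here is a minimal sketch of a tid-keyed table that answers "is this thread known?" without touching the app's TLS. This is not DynamoRIO code: the names `current_tid`, `thread_table_add`, `thread_table_contains`, and the capacity `TABLE_SLOTS` are all hypothetical, and the sketch has no locking and no removal, both of which a real version would need.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fixed capacity; must be a power of two for the mask below. */
#define TABLE_SLOTS 1024

/* Open-addressing table of thread ids; 0 marks an empty slot. */
static pid_t tid_table[TABLE_SLOTS];

/* One system call per query: this is the cost the sketch trades for
 * not reading the app's TLS pointer at all. */
static pid_t current_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

/* Multiplicative hash with linear probing. */
static size_t slot_for(pid_t tid, size_t probe)
{
    return ((size_t)tid * 2654435761u + probe) & (TABLE_SLOTS - 1);
}

/* Records tid; returns false only if the table is full.
 * Not thread-safe: assumes the caller serializes updates. */
bool thread_table_add(pid_t tid)
{
    assert(tid > 0);
    for (size_t probe = 0; probe < TABLE_SLOTS; probe++) {
        size_t i = slot_for(tid, probe);
        if (tid_table[i] == tid)
            return true;
        if (tid_table[i] == 0) {
            tid_table[i] = tid;
            return true;
        }
    }
    return false;
}

/* Returns true if tid was previously added. */
bool thread_table_contains(pid_t tid)
{
    for (size_t probe = 0; probe < TABLE_SLOTS; probe++) {
        size_t i = slot_for(tid, probe);
        if (tid_table[i] == tid)
            return true;
        if (tid_table[i] == 0)
            return false;
    }
    return false;
}
```

A thread would call `thread_table_add(current_tid())` when it comes under control, and any later query is `thread_table_contains(current_tid())`. The lookup itself is cheap; the gettid system call on every query is the part that costs.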
The check for whether a thread's DR TLS is initialized is called in many places, and having a system call there is undesirable for performance reasons. Originally there was no system call, but one was added as various complexities crept in. Removing it in favor of the safe_read resulted in a 25% (yes, 25%) speedup in bb-building-bound apps, an 80% speedup for debug build with -checklevel 0, and a 6x speedup in dr_get_current_drcontext() and, similarly, in its DR-internal analogue (see #2089 for details). Getting the current dcontext is a very common operation, so we're talking about significant performance impact. Re-architecting how a lot of code works could perhaps change the situation, but right now querying whether TLS is set up is a performance-critical point. There are a number of downsides to the safe-read approach (including a bunch of faults in every single thread on delayed attach, #2270, and others), and I'm hoping there's a better solution, but it is not a simple problem.
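For readers unfamiliar with the general idea of a "safe read", here is a minimal sketch of one common way to probe an address that may be unmapped: catch the fault and jump back. This is an illustration only, not how DynamoRIO's safe_read or safe_read_tls_magic() is implemented; the names `try_read_word` and `on_fault` are hypothetical, and the sketch assumes a single thread, since it keeps its recovery state in ordinary globals.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Recovery point and guard flag. Plain globals, so this sketch is
 * single-threaded only. */
static sigjmp_buf fault_recovery;
static volatile sig_atomic_t in_safe_read;

/* On a fault inside the guarded read, jump back to the recovery point.
 * Otherwise restore the default action and return, so the faulting
 * instruction re-executes and the process dies as it normally would. */
static void on_fault(int sig)
{
    if (in_safe_read)
        siglongjmp(fault_recovery, 1);
    signal(sig, SIG_DFL);
}

/* Tries to read one word from addr into *out.
 * Returns false if the read faulted. */
bool try_read_word(const uintptr_t *addr, uintptr_t *out)
{
    struct sigaction sa, old_segv, old_bus;
    volatile bool ok = false;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    /* savemask = 1 so the signal mask is restored after siglongjmp. */
    if (sigsetjmp(fault_recovery, 1) == 0) {
        in_safe_read = 1;
        *out = *(const volatile uintptr_t *)addr;
        ok = true;
    }
    in_safe_read = 0;

    /* Put the previous handlers back. */
    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return ok;
}
```

Each failed probe costs a full signal delivery, which gives a feel for why a bunch of faults in every thread on delayed attach counts as a downside. A successful probe is just a load plus setup, and a production version would install its handler once rather than per call.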
Should it, in principle, be possible to know from the context whether TLS is set up, without having to query, except at the start of a signal handler in a process that has at least one native thread? (A global flag could alert you to the possible presence of native threads.) By "context" I mean the function you're in and the arguments that were given to it, though I suppose you could also look at the stack backtrace (which would be horrible). By "in principle" I mean with changes to the API, if necessary.
Here's the program. It seems to work natively on several Linux architectures, and under DynamoRIO on Intel, but not under DynamoRIO on ARM, where I got a segfault in is_thread_tls_initialized.
The program puts a garbage value in the child's TLS pointer, which is hardly acceptable in a C program, so imagine this program written in assembler, if you prefer. If you're not using the standard libraries you're presumably free to use the TLS pointer however you like.