Add interoperability with QEMU for arm emulation #4719
Some problems that have been observed include: A crash in qemu:
An assert in DR:
Another assert in DR, this one using unmodified versions of everything (the above are local qemu environments; not quite sure what all the local changes are):
Quick analysis of the final assert above and what happens after it: DR needs to be passed a root dir for the dynamic loader (just like QEMU's -L). But we have a problem: we don't initialize DR and its options until after we map the app's interpreter. The comment says we want the app mapped, and we want dr_get_proc_address() to work. Maybe we could split up the init so that only the heap and options are initialized first? Or maybe it would work to do the full init before mapping the interpreter. With a hardcoded fix there to see what comes next, DR initializes but then hangs trying to take over a non-app thread: a QEMU-added thread?
With the other thread hardcoded to be ignored, we then hit a problem where the app's loader can't find libc. But code cache operation seems to work well, as it builds 727 blocks:
Running a pure-static app at this point works:
I have local fixes for the above issues and now have drreg-test passing under QEMU for AArch64. (Private libs need the path prefix as well, with precedence over native.) It fails for ARM, but that seems to be due to drreg bugs or something similar. So this is looking quite viable: it may not be much further work at all to insert qemu into the test commands and set up GA CI. The bulk of the work is going to be dealing with all the broken tests on ARM after several years of unreliable testing. It is a little slow, so we may want to pare down the longer tests, but we can deal with that when we hit it. This is all with:
The crash and signal assert listed up top are for a different version. I'll try to dig out the details there: there are layers in between.
Adds a maximum number of tries when waiting for a thread to be taken over at initialization time, instead of looping forever. When the maximum is hit, a fatal error is raised, unless a newly added option -ignore_takeover_timeout is set. The new option is used to ignore QEMU's own thread when running DR under QEMU. QEMU is not fully transparent and unfortunately does not hide its thread from DR in procfs. Manually tested both option settings under QEMU. Regression tests with QEMU are planned for the test suite for ARM. Issue: #4719
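The bounded wait described in this commit can be sketched as follows. This is a minimal Python model, not DR's C implementation: the names wait_for_takeover, TakeoverTimeout, max_tries, and the polling callback are all hypothetical. Only the behavior (give up after a maximum number of tries, then raise a fatal error unless the ignore option is set) comes from the description above.

```python
import time


class TakeoverTimeout(Exception):
    """Stands in for DR's fatal error (hypothetical name)."""


def wait_for_takeover(is_taken_over, max_tries, ignore_timeout=False,
                      sleep_s=0.0):
    """Poll until a thread is taken over, giving up after max_tries polls.

    Returns True if the thread was taken over, and False if the limit was
    hit while ignore_timeout is set (modeling -ignore_takeover_timeout).
    Raises TakeoverTimeout if the limit was hit and the option is not set.
    """
    for _ in range(max_tries):
        if is_taken_over():
            return True
        time.sleep(sleep_s)  # Back off before polling again.
    if ignore_timeout:
        return False  # Leave the thread alone, e.g. QEMU's own thread.
    raise TakeoverTimeout("thread not taken over after %d tries" % max_tries)
```

A thread that never responds, such as QEMU's helper thread, makes the call return False when the ignore flag is set, rather than hanging startup forever.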
Adds a new option -xarch_root which sets a path that is prepended to:
+ The application executable's interpreter, if the original does not exist.
+ SYS_openat paths, if the original does not exist.
+ System paths used for loading private libraries: here the prefix is prepended before checking whether the original exists.
Splits dynamorio_app_init() into two pieces in order to have the options set up at the time the loader maps the interpreter, while avoiding ordering problems with the rest of the initialization. The new option also auto-sets -ignore_takeover_timeout for convenience, as that is always needed when running under QEMU. Manually tested in cross-compile AArchXX setups on a Debian system. Test suite integration is forthcoming. Issue: #4719
Fixes a problem with the hardcoded small timeout from PR #4725 by parameterizing the timeout in a new option -takeover_timeout_ms. It is set to a high value by default; the plan is to have -xarch_root set it to a low value for the common QEMU case of running a small test, while still overridable for large apps. Renames -ignore_takeover_timeout to -unsafe_ignore_takeover_timeout to indicate that it can cause problems if actual application threads are left native. Issue: #4719
Adds a new option -xarch_root which sets a path that is prepended to:
+ The application executable's interpreter, if the original does not exist.
+ SYS_openat paths, if the original does not exist.
+ System paths used for loading private libraries: here the prefix is prepended before checking whether the original exists.
The new -xarch_root option also auto-sets -unsafe_ignore_takeover_timeout and sets -takeover_timeout_ms to a low value for convenience, as that is always needed when running under QEMU. If someone runs a very large app under QEMU the timeout can be overridden. Splits dynamorio_app_init() into two pieces in order to have the options set up at the time the loader maps the interpreter, while avoiding ordering problems with the rest of the initialization. Fixes a standalone-mode bug revealed by the init split: standalone_exit was missing a reset of dynamo_heap_initialized to false. Adds a new module load flag MODLOAD_IS_APP and file mapping flag MAP_FILE_APP to avoid MAP_32BIT on the app interpreter when -heap_in_lower_4GB is set, now that the options are parsed before we map the interpreter. Manually tested in cross-compile AArchXX setups on a Debian system. Test suite integration is forthcoming. Issue: #4719
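The two prefixing rules for -xarch_root can be illustrated with a small Python model. This is only a sketch: apply_xarch_root and its parameters are hypothetical names, DR's real logic is in C, and the fallback to the original path for private libraries when the prefixed path is missing is an assumption that the commit message does not spell out.

```python
import os
import posixpath


def apply_xarch_root(path, root, is_private_lib=False, exists=os.path.exists):
    """Return the path to open given a -xarch_root style prefix.

    App paths (interpreter, SYS_openat): the prefix is used only if the
    original path does not exist.
    Private library system paths: the prefixed path is tried first.
    Assumption: a private lib falls back to the original path if the
    prefixed one is missing; the source does not state this.
    """
    prefixed = posixpath.join(root, path.lstrip("/"))
    if is_private_lib:
        return prefixed if exists(prefixed) else path
    return path if exists(path) else prefixed
```

With a root of /usr/aarch64-linux-gnu, an app open of /lib/libc.so.6 keeps the native file when it exists, while a private library load of the same path prefers the copy under the root. The exists parameter is injectable only so the rule can be exercised without touching the real filesystem.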
Avoids printing of an internal warning during early initialization for single-bitwidth setups regardless of -stderr_mask by moving options init even earlier. To avoid DR heap init messing up the app's brk setup, moves heap init out of the options init and into the later half. This undoes the early heap init from PR #4726, which is worked around by switching to a stack buffer for -arch_init. This seems safer in any case, delaying heap init and client lib loads until after the app's interpreter is loaded. Issue: #4719
Avoids printing of an internal warning during early initialization for single-bitwidth setups regardless of -stderr_mask by moving options init even earlier. To avoid DR heap init messing up the app's brk setup, moves heap init out of the options init and into the later half. This undoes the early heap init from PR #4726, which is worked around by switching to a stack buffer for -arch_init. This seems safer in any case, delaying heap init and client lib loads until after the app's interpreter is loaded. Moves the 1config file deletion from d_r_config_init() to -config_heap_init(), after any potential reload_dynamorio(). Issue: #4719
Fixes a bug introduced by PR #4729 which swapped a heap buffer for a stack buffer but placed the buffer in a too-deep scope. Manually tested via:
  $ qemu-aarch64 -L /usr/aarch64-linux-gnu bin64/drrun -xarch_root /usr/aarch64-linux-gnu -- suite/tests/bin/simple_app
  $ qemu-arm -L /usr/arm-linux-gnueabihf bin32/drrun -xarch_root /usr/arm-linux-gnueabihf -- suite/tests/bin/simple_app
Forthcoming test suite support for running under qemu will add CI tests that will avoid such regressions in the future.
Issue: #4719
With a local tree that auto-adds qemu to test command lines, we have:
AArch64: 34% tests passed, 156 tests failed out of 235
So about 1/3 pass. A big portion of the failures are drcachesim and other tool tests where the command line setup is not yet there to insert qemu everywhere. Some failures are timeouts where QEMU is just too slow even with a 4x increase in the time allowed. There are also a bunch of what look like internal QEMU crashes:
I think the next step should be to set up GA CI under QEMU with a list of those 1/3 tests, to get some regression tests in place. Going through the rest of the tests can then be done incrementally over time.
When cross-compiling, inserts QEMU commands into each test command line. Increases the test timeouts by 4x to account for emulation overhead. Adds GA CI support by installing QEMU and enabling running tests for the AArchXX cross-compilation jobs. For now, limits the tests to those marked with a new label RUNS_ON_QEMU, which starts out added to the ~1/3 of tests that currently pass. As we get more tests to work we may want to separate the jobs if they take too much time. Issue: #4719
When cross-compiling, inserts QEMU commands into each test command line. Increases the test timeouts by 4x to account for emulation overhead. Adds GA CI support by installing QEMU and enabling running tests for the AArchXX cross-compilation jobs. For now, limits the tests to those marked with a new label RUNS_ON_QEMU, which starts out added to the ~1/3 of tests that currently pass. Splits the aarchxx-cross-compile job into two: aarch64-cross-compile and arm-cross-compile, each now taking ~10 minutes. Issue: #4719
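As a rough illustration of what inserting QEMU into a test command line and scaling its timeout amounts to, here is a Python sketch. The real support lives in the project's build and test configuration, not in Python. The function name wrap_for_qemu and the 60-second base timeout in the usage note are hypothetical. The emulator invocation with -L and the 4x factor are taken from this thread.

```python
def wrap_for_qemu(test_cmd, timeout_s, emulator, sysroot, timeout_scale=4):
    """Prefix a test command with a QEMU user-mode emulator invocation.

    Returns (command_list, scaled_timeout): the emulator and its -L sysroot
    go in front of the original command, and the timeout is multiplied to
    account for emulation overhead.
    """
    qemu_prefix = [emulator, "-L", sysroot]
    return qemu_prefix + list(test_cmd), timeout_s * timeout_scale
```

For example, wrapping ["bin64/drrun", "-xarch_root", "/usr/aarch64-linux-gnu", "--", "suite/tests/bin/simple_app"] with emulator "qemu-aarch64", sysroot "/usr/aarch64-linux-gnu", and a base timeout of 60 yields the same command line as the manual AArch64 test shown earlier in the thread, with a timeout of 240.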
I added instructions for running under QEMU at https://github.com/DynamoRIO/dynamorio/wiki/Test-Suite#testing-aarchxx
That link on the new site is: https://dynamorio.org/page_test_suite.html#autotoc_md263
Summarizing the status:
We may want to file these in the QEMU tracker.
Adds a new section on Running Under QEMU. Adds a documented option entry for -xarch_root. Adds a release note on the new support. Issue: #4719
This one seems to be that QEMU does not handle the |
QEMU crashes when executing the WFI AArchXX instruction. DR uses WFI in its spinlock loops. We replace WFI with WFE here, which is roughly a superset of WFI and should work similarly for us while not breaking QEMU. This fixes over a dozen tests that previously failed under QEMU with "unhandled CPU exception 0x10001". Here we add 17 tests on AArch64 and 14 tests on AArch32 to the RUNS_ON_QEMU label. Issue: #4719
With the 0x10001 WFI fix, over a dozen ARM tests not on the suite list pass locally, but fail on the GA CI. They fail mostly with asserts, most frequently a memcache.c assert as shown below. If I could easily reproduce locally I would try to fix it to get more ARM testing. For GA CI: we're on Ubuntu 20. Maybe with an Ubuntu 20 VM I can repro.
Adds missing required-1 bits in the ARM encoding table entries for OP_blx, OP_bx, and OP_bxj. Without the bits, some hardware still accepts the instructions (which is why we did not notice the problem before), but they are technically unsound, and QEMU thinks they are invalid, breaking some of our tests under QEMU. Tested on QEMU with the forthcoming #2414 drwrap-drreg-test, and directly with several other decoders:
Prior encoding for "blx r11":
  <stdin>:1:1: warning: invalid instruction encoding
  0x3b 0x00 0x20 0xe1
  ^
  llvm-mc: e120003b
  capstone: e120003b <INVALID: errcode 0>
  bfd: e120003b ; <UNDEFINED> instruction: 0xe120003b
New encoding:
  $ disasm_a32 e12fff3b
  llvm-mc: e12fff3b blx r11
  capstone: e12fff3b blx r11
  bfd: e12fff3b blx fp
Setting up more external-decoder testing is beyond the scope of this fix: #1686 covers that.
Issue: #4719, #1686, #2414
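The difference between the prior and new "blx r11" encodings can be checked with a few lines of Python. This is a sketch, not DR's encoder: the helper names are hypothetical, and the mask is inferred from the two encodings shown above (e120003b versus e12fff3b), which differ exactly in bits 19:8.

```python
# Bits 19:8 differ between the prior and new encodings; these are the
# required-1 ("should be one") bits for the register forms of BX/BXJ/BLX.
SBO_MASK = 0x000FFF00


def has_required_one_bits(word):
    """True if all required-1 bits are set in a 32-bit A32 encoding."""
    return (word & SBO_MASK) == SBO_MASK


def set_required_one_bits(word):
    """Return the encoding with the required-1 bits filled in."""
    return word | SBO_MASK
```

Applying set_required_one_bits to 0xe120003b gives 0xe12fff3b, the encoding that llvm-mc, capstone, and bfd all accept as "blx r11" (bfd prints r11 as fp).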
Removes a too-early-and-thus-incorrect call to set_pc_mode_in_cpsr() in execute_handler_from_cache() (transfer_from_sig_handler_to_fcache_return() does this for us at the right time). Removes an incorrect call to dr_set_isa_mode from the cpsr in transfer_from_sig_handler_to_fcache_return(): we want to only set the mode from the target, not the interruption point. Works around QEMU bugs with signals 63 and 64 by using 62 instead in the linux.signalNNNN tests. This allows adding them to the list of tests that work under QEMU. Augments the linux.signalNNNN tests to vary whether the main code and the signal handler are arm or thumb, helping to catch and test signal transition issues. Issue: #4719, #5145 Fixes #5145
Removes a too-early-and-thus-incorrect call to set_pc_mode_in_cpsr() in execute_handler_from_cache() (transfer_from_sig_handler_to_fcache_return() does this for us at the right time). Removes an incorrect call to dr_set_isa_mode from the cpsr in transfer_from_sig_handler_to_fcache_return(): we want to only set the mode from the target, not the interruption point. Works around QEMU bugs with signals 63 and 64 by using 62 instead in the linux.signalNNNN tests. This allows adding them to the list of tests that work under QEMU. Augments the linux.signalNNNN tests to vary whether the main code and the signal handler are arm or thumb, helping to catch and test signal transition issues. Issue: #4719, #5233 Fixes #5233
QEMU crashes when executing the WFI AArchXX instruction. DR uses WFI in its spinlock loops. We replace WFI with WFE here, which is roughly a superset of WFI and should work similarly for us while not breaking QEMU. This fixes over a dozen tests that previously failed under QEMU with "unhandled CPU exception 0x10001". Here we add 12 tests on AArch64 and 2 tests on AArch32 to the RUNS_ON_QEMU label. More tests could be added in AArch32 if a memcache assert were fixed. Issue: #4719
For the memcache assert, pasting from #4956 (comment): You can try using tmate to ssh into the runner and get more logs out: https://github.com/marketplace/actions/debugging-with-tmate
Implements missing functionality for ptrace attach on AArch64 and AArch32: generated code sequences were previously x86-only; -skip_syscall handling only supported x86; and AArch64 does not support PTRACE_POKEUSER or PTRACE_PEEKUSER. For AArch32, Thumb vs Arm mode requires multiple steps: clearing the LSB to point at the path used as data via a call; switching to Arm mode for DR's _start; setting the LSB of the initial app PC. For AArch32, additionally fixes an encoder error where the opcode is queried before copying a needs-no-encoding instruction. This is required for the instruction used to hold data for injection. Tweaks the disassembler to leave a level 0 instr alone, again to better handle the data-only instruction used for injection. Enables the client.attach test on AArch64 and AArch32. For AArch32, it needs -skip_syscall. Long-term we want that on by default everywhere, but we want explicit tests that hit it on all platforms first. Tested manually on an AArch32 machine. Unfortunately the client.attach test is not trivial to set up under QEMU, with its multiple command lines and background processes, so that is left as beyond the scope of this PR and is instead considered part of #4719. Issue: #38
To aid in testing on ARM (and AArch64, for that matter) it would be helpful to be able to run DR under the QEMU emulator. Today that does not work out of the box. It is not clear whether there are only a handful of surface issues, or whether deeper problems will be hit in both DR and QEMU and we would need significant changes to both.