Add interoperability with QEMU for arm emulation #4719

Open
derekbruening opened this issue Feb 5, 2021 · 10 comments

@derekbruening
Contributor

To aid in testing on ARM (and AArch64, for that matter) it would be helpful to be able to run DR under the QEMU emulator. Today that does not work out of the box. It is not clear whether there are only a handful of surface issues, or whether deeper problems will be hit in both DR and QEMU and we would need significant changes to both.

@derekbruening
Contributor Author

Some problems that have been observed include:

A crash in qemu:

qemu: uncaught target signal 11 (Segmentation fault) - core dumped 

An assert in DR:

<Application xxx (zzz).  Internal Error: DynamoRIO debug check failure: core/unix/signal.c:1469 rc == 0 IF_VMX86(|| (sig >= 63 && rc == -EINVAL))

Another assert in DR, this one using unmodified versions of everything (the ones above came from local QEMU environments, and I am not sure exactly what local changes they contain):

$ qemu-arm -L /usr/arm-linux-gnueabihf bin32/drrun -- suite/tests/bin/simple_app
<Application /home/bruening/dr/git/build_a32_dbg_tests/suite/tests/bin/simple_app (4017678) DynamoRIO usage error : Failed to read ELF interpreter headers.>
<Usage error: Failed to read ELF interpreter headers. (/home/bruening/dr/git/src/core/unix/loader.c, line 2000)
version 8.0.18663, custom build
-early_inject -emulate_brk 
/home/bruening/dr/git/build_a32_dbg_tests/lib32/debug/libdynamorio.so=0x71000000>

@derekbruening derekbruening self-assigned this Feb 5, 2021
@derekbruening
Contributor Author

Quick analysis of the final assert above and what happens after it:

DR needs to be passed a root dir for the dynamic loader (just like QEMU's -L). But we have a problem: we don't init DR and its options until after we map the app's interpreter. The comment says we want the app mapped, and we want dr_get_proc_address() to work. Hmm. Maybe split up the init so we only initialize heap and options first? Or maybe it would work out to do full init before mapping the interpreter.

With a hardcoded fix there to see what comes next, DR initializes but then hangs trying to take over a non-app thread: perhaps a thread added by QEMU?

<Attached to 1/2 threads in application /home/bruening/dr/git/build_a32_dbg_tests/suite/tests/bin/simple_app (4167127)>

@derekbruening
Contributor Author

After hardcoding DR to ignore the other thread, we then hit a problem where the app's loader can't find libc. But code cache operation seems to work well, as it builds 727 blocks:

$ qemu-arm -L /usr/arm-linux-gnueabihf bin32/drrun -rstats_to_stderr -- suite/tests/bin/simple_app
<Starting application /home/bruening/dr/git/build_a32_dbg_tests/suite/tests/bin/simple_app (4180146)>
<Initial options = -no_dynamic_options -rstats_to_stderr -code_api -stack_size 56K -signal_stack_size 32K -max_elide_jmp 0 -max_elide_call 0 -early_inject -emulate_brk -no_inline_ignored_syscalls -native_exec_default_list '' -no_native_exec_managed_code -no_indcall2direct >
<Paste into GDB to debug DynamoRIO clients:
set confirm off
add-symbol-file '/home/bruening/dr/git/build_a32_dbg_tests/lib32/debug/libdynamorio.so' 0x71024300
>
/home/bruening/dr/git/build_a32_dbg_tests/bin32/../lib32/debug/libdynamorio.so: error while loading shared libraries: libc.so.6: cannot open shared object file: No such file or directory
<Stopping application /home/bruening/dr/git/build_a32_dbg_tests/suite/tests/bin/simple_app (4180146)>
DynamoRIO statistics:
              Peak threads under DynamoRIO control :        1
                              Threads ever created :        1
                                 System calls, pre :      139
                                System calls, post :      137
                                 Application mmaps :        1
                   Basic block fragments generated :      727
             Peak fcache combined capacity (bytes) :    61440
                    Peak fcache units on live list :        2
                    Peak fcache units on free list :        2
                Peak special heap capacity (bytes) :     4096
                      Peak heap units on live list :        8
                      Peak heap units on free list :        5
                       Peak stack capacity (bytes) :   147456
                        Peak heap capacity (bytes) :   221184
                 Peak total memory from OS (bytes) :   696320
              Peak vmm blocks for unreachable heap :       91
                         Peak vmm blocks for stack :       42
      Peak vmm blocks for unreachable special heap :        4
      Peak vmm blocks for unreachable special mmap :        7
                Peak vmm blocks for reachable heap :        1
                         Peak vmm blocks for cache :       19
        Peak vmm blocks for reachable special mmap :        7
            Peak vmm virtual memory in use (bytes) :   700416

Running a pure-static app at this point works:

$ qemu-arm -L /usr/arm-linux-gnueabihf bin32/drrun -rstats_to_stderr -- suite/tests/bin/common.allasm_arm
<Starting application /home/bruening/dr/git/build_a32_dbg_tests/suite/tests/bin/common.allasm_arm (4188999)>
<Initial options = -no_dynamic_options -rstats_to_stderr -code_api -stack_size 56K -signal_stack_size 32K -max_elide_jmp 0 -max_elide_call 0 -early_inject -emulate_brk -no_inline_ignored_syscalls -native_exec_default_list '' -no_native_exec_managed_code -no_indcall2direct >
<Paste into GDB to debug DynamoRIO clients:
set confirm off
add-symbol-file '/home/bruening/dr/git/build_a32_dbg_tests/lib32/debug/libdynamorio.so' 0x71024300
>
Hello world!
Hello world!
Hello world!
Hello world!
Hello world!
Hello world!
All done
<Stopping application /home/bruening/dr/git/build_a32_dbg_tests/suite/tests/bin/common.allasm_arm (4188999)>
DynamoRIO statistics:
              Peak threads under DynamoRIO control :        1
                              Threads ever created :        1
                                 System calls, pre :        2
                                System calls, post :        1
                   Basic block fragments generated :       22
             Peak fcache combined capacity (bytes) :     8192
                    Peak fcache units on live list :        2
                    Peak fcache units on free list :        2
                Peak special heap capacity (bytes) :     4096
                      Peak heap units on live list :        7
                      Peak heap units on free list :        5
                       Peak stack capacity (bytes) :   147456
                        Peak heap capacity (bytes) :   155648
                 Peak total memory from OS (bytes) :   638976
              Peak vmm blocks for unreachable heap :       77
                         Peak vmm blocks for stack :       42
      Peak vmm blocks for unreachable special heap :        4
      Peak vmm blocks for unreachable special mmap :        7
                Peak vmm blocks for reachable heap :        1
                         Peak vmm blocks for cache :       19
        Peak vmm blocks for reachable special mmap :        7
            Peak vmm virtual memory in use (bytes) :   643072

@derekbruening
Contributor Author

I have local fixes for the above issues and now have drreg-test passing under QEMU for AArch64. (Private libs need the path prefix as well, with precedence over native.) It fails for ARM, but that seems to be due to drreg bugs or something similar.

So this is looking quite viable: it may not take much further work to insert qemu into the test commands and set up GitHub Actions (GA) CI. The bulk of the work is going to be dealing with all the broken tests on ARM after several years of unreliable testing.

It is a little slow so we may want to pare down the longer tests but we can deal with that when we hit it.

This is all with:

$ qemu-arm --version
qemu-arm version 5.2.0 (Debian 1:5.2+dfsg-3)

The crash and signal assert listed up top are for a different version. I'll try to dig out the details there: there are layers in between.

derekbruening added a commit that referenced this issue Feb 9, 2021
Adds a maximum number of tries when waiting for a thread to be taken over at
initialization time, instead of looping forever.  When the max is hit,
a fatal error is raised, unless a newly added option
-ignore_takeover_timeout is set.

The new option is used to ignore QEMU's own thread when running DR
under QEMU.  QEMU is not fully transparent and unfortunately does not
hide its thread from DR in procfs.

Manually tested both option settings under QEMU.  Regression tests
with QEMU are planned for the test suite for ARM.

Issue: #4719
derekbruening added a commit that referenced this issue Feb 9, 2021
Adds a new option -xarch_root which sets a path that is prepended to:
+ The application executable's interpreter, if the original does not exist.
+ SYS_openat paths, if the original does not exist.
+ System paths used for loading private libraries: here the prefix is prepended
  before checking whether the original exists.

Splits dynamorio_app_init() into two pieces in order to have the
options set up at the time the loader maps the interpreter, while
avoiding ordering problems with the rest of the initialization.

The new option also auto-sets -ignore_takeover_timeout for
convenience, as that is always needed when running under QEMU.

Manually tested in cross-compile AArchXX setups on a Debian system.
Test suite integration is forthcoming.

Issue: #4719
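For illustration, the "prepend only if the original does not exist" rule for the interpreter and SYS_openat cases can be sketched as below. This is a minimal sketch, not DR's code: the helper name xarch_resolve_path() is hypothetical, and restricting the prefix to absolute paths is my assumption. The private-library case above differs in that the prefixed path is tried first.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not DynamoRIO's code): applies the
 * "prepend only if the original does not exist" rule.
 * Writes the chosen path into `out`.
 * Returns 0 on success or -1 if the result does not fit in `out_size`. */
static int
xarch_resolve_path(const char *xarch_root, const char *path, char *out,
                   size_t out_size)
{
    int len;
    /* Assumption: only absolute paths are candidates for the prefix. */
    if (xarch_root == NULL || strlen(xarch_root) == 0 || path[0] != '/' ||
        access(path, F_OK) == 0)
        len = snprintf(out, out_size, "%s", path); /* keep the original */
    else
        len = snprintf(out, out_size, "%s%s", xarch_root, path); /* add prefix */
    return (len < 0 || (size_t)len >= out_size) ? -1 : 0;
}
```

For example, on an x86 host where the ARM loader typically exists only under the cross root, resolving /lib/ld-linux-armhf.so.3 against /usr/arm-linux-gnueabihf would yield the prefixed path.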
derekbruening added a commit that referenced this issue Feb 10, 2021
Fixes a problem with the hardcoded small timeout from PR #4725 by
parameterizing the timeout in a new option -takeover_timeout_ms.  It
is set to a high value by default; the plan is to have -xarch_root set
it to a low value for the common QEMU case of running a small test,
while still overridable for large apps.

Renames -ignore_takeover_timeout to -unsafe_ignore_takeover_timeout to
indicate that it can cause problems if actual application threads are
left native.

Issue: #4719
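The bounded wait described in these two commits can be sketched as a deadline-based polling loop. The option names -takeover_timeout_ms and -unsafe_ignore_takeover_timeout come from the commits; everything else here (function and enum names, the 1ms poll interval, the example predicate) is hypothetical and only illustrates the idea, not DR's takeover code.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

typedef bool (*all_taken_over_fn)(void *ctx); /* hypothetical predicate */

typedef enum {
    TAKEOVER_OK,
    TAKEOVER_IGNORED_TIMEOUT, /* timed out, but the unsafe ignore flag was set */
    TAKEOVER_FATAL,           /* timed out: the caller should raise a fatal error */
} takeover_result_t;

static uint64_t
now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000u + (uint64_t)ts.tv_nsec / 1000000u;
}

/* Polls `done` until it reports success or `timeout_ms` elapses. */
static takeover_result_t
wait_for_takeover(all_taken_over_fn done, void *ctx, uint64_t timeout_ms,
                  bool unsafe_ignore_timeout)
{
    const uint64_t deadline = now_ms() + timeout_ms;
    const struct timespec nap = { 0, 1000000 }; /* 1ms between polls */
    while (!done(ctx)) {
        if (now_ms() >= deadline)
            return unsafe_ignore_timeout ? TAKEOVER_IGNORED_TIMEOUT : TAKEOVER_FATAL;
        nanosleep(&nap, NULL);
    }
    return TAKEOVER_OK;
}

/* Example predicate: succeeds once *ctx polls have been used up. */
static bool
done_after_n_polls(void *ctx)
{
    int *remaining = ctx;
    return (*remaining)-- <= 0;
}
```

A thread that never gets taken over (such as QEMU's own thread) ends the loop at the deadline, and only the explicit unsafe flag turns that into a non-fatal result.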
derekbruening added a commit that referenced this issue Feb 10, 2021
Adds a new option -xarch_root which sets a path that is prepended to:
+ The application executable's interpreter, if the original does not exist.
+ SYS_openat paths, if the original does not exist.
+ System paths used for loading private libraries: here the prefix is prepended
  before checking whether the original exists.

The new -xarch_root option also auto-sets -unsafe_ignore_takeover_timeout
and sets -takeover_timeout_ms to a low value for convenience, as that
is always needed when running under QEMU.  If someone runs a very
large app under QEMU the timeout can be overridden.

Splits dynamorio_app_init() into two pieces in order to have the
options set up at the time the loader maps the interpreter, while
avoiding ordering problems with the rest of the initialization.

Fixes a standalone-mode bug revealed by the init split: a set of
dynamo_heap_initialized to false was missing in standalone_exit.

Adds a new module load flag MODLOAD_IS_APP and file mapping flag
MAP_FILE_APP to avoid MAP_32BIT on the app interpreter when
-heap_in_lower_4GB is set, now that the options are parsed before we
map the interpreter.

Manually tested in cross-compile AArchXX setups on a Debian system.
Test suite integration is forthcoming.

Issue: #4719
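The MAP_FILE_APP change amounts to "add MAP_32BIT for DR's own mappings under -heap_in_lower_4GB, but never for the app's interpreter". A minimal sketch of that flag selection follows; SKETCH_MAP_FILE_APP and sketch_mmap_flags() are hypothetical stand-ins rather than DR's definitions, and MAP_32BIT is only available on some Linux targets (x86-64), hence the guard.

```c
#include <stdbool.h>
#include <sys/mman.h>

/* Hypothetical stand-in for DR's MAP_FILE_APP request flag. */
#define SKETCH_MAP_FILE_APP 0x1u

/* Returns mmap() flags for a file mapping: DR-internal mappings go low
 * when heap_in_lower_4GB is set, app mappings (e.g., the interpreter) never do. */
static int
sketch_mmap_flags(unsigned int dr_map_flags, bool heap_in_lower_4GB)
{
    int flags = MAP_PRIVATE;
#ifdef MAP_32BIT
    if (heap_in_lower_4GB && (dr_map_flags & SKETCH_MAP_FILE_APP) == 0)
        flags |= MAP_32BIT;
#else
    /* No MAP_32BIT on this target: nothing to add. */
    (void)dr_map_flags;
    (void)heap_in_lower_4GB;
#endif
    return flags;
}
```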
derekbruening added a commit that referenced this issue Feb 11, 2021
Avoids printing of an internal warning during early initialization for
single-bitwidth setups regardless of -stderr_mask by moving options
init even earlier.

To avoid DR heap init messing up the app's brk setup, moves heap init
out of the options init and into the later half.  This undoes the
early heap init from PR #4726, which is worked around by switching to
a stack buffer for -arch_init.  This seems safer in any case, delaying
heap init and client lib loads until after the app's interpreter is
loaded.

Issue: #4719
derekbruening added a commit that referenced this issue Feb 11, 2021
Avoids printing of an internal warning during early initialization for
single-bitwidth setups regardless of -stderr_mask by moving options
init even earlier.

To avoid DR heap init messing up the app's brk setup, moves heap init
out of the options init and into the later half.  This undoes the
early heap init from PR #4726, which is worked around by switching to
a stack buffer for -arch_init.  This seems safer in any case, delaying
heap init and client lib loads until after the app's interpreter is
loaded.

Moves the 1config file deletion from d_r_config_init() to config_heap_init(),
after any potential reload_dynamorio().

Issue: #4719
derekbruening added a commit that referenced this issue Feb 19, 2021
Fixes a bug introduced by PR #4729 which swapped a heap buffer for a
stack buffer but placed the buffer in a too-deep scope.

Manually tested via:
$ qemu-aarch64 -L /usr/aarch64-linux-gnu bin64/drrun -xarch_root /usr/aarch64-linux-gnu -- suite/tests/bin/simple_app
$ qemu-arm -L /usr/arm-linux-gnueabihf bin32/drrun -xarch_root /usr/arm-linux-gnueabihf -- suite/tests/bin/simple_app

Forthcoming test suite support for running under qemu will add CI
tests that will avoid such regressions in the future.

Issue: #4719
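The class of bug fixed here, a stack buffer declared in an inner scope whose address is used after that scope ends, looks roughly like the commented-out shape below; the fix is to hoist the buffer to the scope where it is still in use. The function and names are hypothetical and this is not the actual code from PR #4729.

```c
#include <stdio.h>
#include <string.h>

/* Buggy shape (undefined behavior), shown only as a comment:
 *
 *     const char *name = override;
 *     if (use_default) {
 *         char buf[64];              // lifetime ends at the closing brace
 *         snprintf(buf, sizeof(buf), "%s", "default");
 *         name = buf;
 *     }
 *     use(name);                     // buf no longer exists here
 */

/* Fixed shape: the buffer lives at function scope, so `name` stays valid. */
static size_t
format_arch_name(int use_default, const char *override, char *out, size_t out_size)
{
    char buf[64];
    const char *name = override;
    if (use_default) {
        snprintf(buf, sizeof(buf), "%s", "default");
        name = buf;
    }
    snprintf(out, out_size, "%s", name);
    return strlen(out);
}
```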
@derekbruening
Contributor Author

With a local tree that auto-adds qemu to test command lines, we have:

AArch64: 34% tests passed, 156 tests failed out of 235
arm: 32% tests passed, 142 tests failed out of 210

So about 1/3 pass. A big portion of the failures are drcachesim and other tool tests where the command line setup is not yet there to insert qemu everywhere. Some failures are timeouts where QEMU is just too slow even with a 4x increase in the time allowed. There are also a bunch of what look like internal QEMU crashes: qemu: unhandled CPU exception 0x10001 - aborting. And the rest may be DR bugs or QEMU bugs or who knows: it will take some effort.

I think the next step should be to set up GA CI under QEMU with a list of those 1/3 tests, to get some regression tests in place. Going through the rest of the tests can then be done incrementally over time.

derekbruening added a commit that referenced this issue Feb 19, 2021
When cross-compiling, inserts QEMU commands into each test command line.
Increases the test timeouts by 4x to account for emulation overhead.

Adds GA CI support by installing QEMU and enabling running tests for
the AArchXX cross-compilation jobs.  For now, limits the tests to
those marked with a new label RUNS_ON_QEMU, which starts out added to
the ~1/3 of tests that currently pass.  As we get more tests to work
we may want to separate the jobs if they take too much time.

Issue: #4719
derekbruening added a commit that referenced this issue Feb 23, 2021
When cross-compiling, inserts QEMU commands into each test command line.
Increases the test timeouts by 4x to account for emulation overhead.

Adds GA CI support by installing QEMU and enabling running tests for
the AArchXX cross-compilation jobs.  For now, limits the tests to
those marked with a new label RUNS_ON_QEMU, which starts out added to
the ~1/3 of tests that currently pass.

Splits the aarchxx-cross-compile job into two: aarch64-cross-compile and
arm-cross-compile, each now taking ~10 minutes.

Issue: #4719
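The command-line transformation described in this commit can be sketched in C as "prepend qemu and its -L root, then scale the timeout by 4x". The real change lives in the test-suite build configuration, not in C; prefix_with_qemu() and emulated_timeout_s() are hypothetical, and the argument layout simply mirrors the manual commands shown earlier in this thread.

```c
#include <stddef.h>

#define SKETCH_TIMEOUT_MULTIPLIER 4 /* the 4x emulation allowance described above */

/* Fills `out` with { qemu, "-L", root, cmd..., NULL }.
 * Returns the number of non-NULL entries, or -1 if `out_cap` is too small. */
static int
prefix_with_qemu(const char *qemu, const char *root, const char *const *cmd,
                 const char **out, size_t out_cap)
{
    size_t count = 0, n = 0, i;
    while (cmd[count] != NULL)
        count++;
    if (out_cap < count + 4) /* 3 prefix entries + command + NULL terminator */
        return -1;
    out[n++] = qemu;
    out[n++] = "-L";
    out[n++] = root;
    for (i = 0; i < count; i++)
        out[n++] = cmd[i];
    out[n] = NULL;
    return (int)n;
}

static unsigned int
emulated_timeout_s(unsigned int native_timeout_s)
{
    return native_timeout_s * SKETCH_TIMEOUT_MULTIPLIER;
}
```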
@derekbruening
Contributor Author

derekbruening commented Feb 23, 2021

I added instructions for running under QEMU at https://github.com/DynamoRIO/dynamorio/wiki/Test-Suite#testing-aarchxx

That link on the new site is: https://dynamorio.org/page_test_suite.html#autotoc_md263

@derekbruening
Contributor Author

derekbruening commented Mar 18, 2021

Summarizing the status:

  • We are able to successfully run 1/3 of our tests under QEMU for both AArch64 and ARM and we are doing so on GA CI.

  • For the failing tests: as noted above, some are missing infrastructure to set up library paths. But a number are hitting what look like internal QEMU bugs, of two varieties:

  1. QEMU fails with unhandled CPU exception 0x10001
  2. QEMU fails to handle signals 63 and 64, which results in many of our signal tests failing

We may want to file these in the QEMU tracker.

derekbruening added a commit that referenced this issue Mar 21, 2021
Adds a new section on Running Under QEMU.
Adds a documented option entry for -xarch_root.
Adds a release note on the new support.

Issue: #4719
derekbruening added a commit that referenced this issue Mar 22, 2021
Adds a new section on Running Under QEMU.
Adds a documented option entry for -xarch_root.
Adds a release note on the new support.

Issue: #4719
sapostolakis pushed a commit that referenced this issue Mar 22, 2021
Adds a new section on Running Under QEMU.
Adds a documented option entry for -xarch_root.
Adds a release note on the new support.

Issue: #4719
@derekbruening
Contributor Author

  1. QEMU fails with unhandled CPU exception 0x10001

This one appears to be caused by QEMU not handling the WFI opcode (?!), which DR uses in spinlocks. Passing -spinlock_count_on_SMP 0 is a workaround. We could try adding that to some tests when running under QEMU to enable more tests.

derekbruening added a commit that referenced this issue Jun 18, 2021
QEMU crashes when executing the WFI AArchXX instruction.
DR uses WFI in its spinlock loops.
We replace WFI with WFE here, which is roughly a superset
of WFI and should work similarly for us while not
breaking QEMU.

This fixes over a dozen tests that previously failed under QEMU with
"unhandled CPU exception 0x10001".  Here we add 17 tests on AArch64
and 14 tests on AArch32 to the RUNS_ON_QEMU label.

Issue: #4719
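A minimal sketch of a spin loop that uses WFE as its wait hint, assuming C11 atomics and GCC/Clang inline asm; it is not DR's spinlock. WFE sleeps until an event (such as SEV, a cleared exclusive monitor, or the kernel's periodic event stream), so the unlock path here issues SEV to wake any waiters explicitly. On non-ARM hosts the hint compiles to nothing.

```c
#include <stdatomic.h>

/* Hypothetical test-and-test-and-set spinlock, not DynamoRIO's code. */
typedef struct {
    atomic_int locked;
} sketch_spinlock_t;

#if defined(__aarch64__) || (defined(__arm__) && __ARM_ARCH >= 7)
#    define SKETCH_HAVE_WFE 1
#endif

static inline void
spin_wait_hint(void)
{
#ifdef SKETCH_HAVE_WFE
    /* WFE instead of WFI: waits for an event rather than an interrupt. */
    __asm__ __volatile__("wfe" ::: "memory");
#endif
}

static void
sketch_spin_lock(sketch_spinlock_t *lock)
{
    while (atomic_exchange_explicit(&lock->locked, 1, memory_order_acquire) != 0) {
        /* Wait on plain loads so the cache line is not hammered with writes. */
        while (atomic_load_explicit(&lock->locked, memory_order_relaxed) != 0)
            spin_wait_hint();
    }
}

static void
sketch_spin_unlock(sketch_spinlock_t *lock)
{
    atomic_store_explicit(&lock->locked, 0, memory_order_release);
#ifdef SKETCH_HAVE_WFE
    /* Signal an event so cores parked in WFE re-check the lock. */
    __asm__ __volatile__("sev" ::: "memory");
#endif
}
```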
@derekbruening
Contributor Author

With the 0x10001 WFI fix, over a dozen ARM tests not on the suite list pass locally, but fail on the GA CI. They fail mostly with asserts, most frequently a memcache.c assert as shown below. If I could easily reproduce locally I would try to fix it to get more ARM testing. For GA CI: we're on Ubuntu 20. Maybe with an Ubuntu 20 VM I can repro.

$ egrep '\*Failed|Internal E' 'arm-cross-compile/6_Run Suite.txt'
2021-06-18T21:52:47.7467537Z 21: <Application /home/runner/work/dynamorio/dynamorio/build_arm-debug-internal/lib32/debug/libdynamorio.so (13622).  Internal Error: DynamoRIO debug check failure: /home/runner/work/dynamorio/dynamorio/core/unix/os.c:9368 iter->vm_start == executable_start
2021-06-18T21:52:47.7475281Z 10/83 Test  #21: code_api|linux.execve-rec .......................***Failed  Required regular expression not found. Regex=[^under DynamoRIO
2021-06-18T21:52:47.7490146Z <Application /home/runner/work/dynamorio/dynamorio/build_arm-debug-internal/lib32/debug/libdynamorio.so (13622).  Internal Error: DynamoRIO debug check failure: /home/runner/work/dynamorio/dynamorio/core/unix/os.c:9368 iter->vm_start == executable_start
2021-06-18T21:52:47.7539416Z 16/83 Test  #27: code_api|linux.prctl ............................***Failed  Required regular expression not found. Regex=[^basename argv\[0\]: linux\.prctl
2021-06-18T21:54:04.7387119Z 65: <Application /home/runner/work/dynamorio/dynamorio/build_arm-debug-internal/suite/tests/bin/linux.eintr (13785).  Internal Error: DynamoRIO debug check failure: /home/runner/work/dynamorio/dynamorio/core/unix/memcache.c:382 found
2021-06-18T21:54:04.7393515Z 32/83 Test  #65: code_api|linux.eintr ............................***Failed  Required regular expression not found. Regex=[^sending SIGURG
2021-06-18T21:54:04.7397267Z <Application /home/runner/work/dynamorio/dynamorio/build_arm-debug-internal/suite/tests/bin/linux.eintr (13785).  Internal Error: DynamoRIO debug check failure: /home/runner/work/dynamorio/dynamorio/core/unix/memcache.c:382 found
2021-06-18T21:54:04.7409629Z 66: <Application /home/runner/work/dynamorio/dynamorio/build_arm-debug-internal/suite/tests/bin/linux.eintr (13791).  Internal Error: DynamoRIO debug check failure: /home/runner/work/dynamorio/dynamorio/core/unix/memcache.c:382 found
2021-06-18T21:54:04.7418416Z 33/83 Test  #66: code_api|linux.eintr-noinline ...................***Failed  Required regular expression not found. Regex=[^sending SIGURG
2021-06-18T21:54:04.7422933Z <Application /home/runner/work/dynamorio/dynamorio/build_arm-debug-internal/suite/tests/bin/linux.eintr (13791).  Internal Error: DynamoRIO debug check failure: /home/runner/work/dynamorio/dynamorio/core/unix/memcache.c:382 found
2021-06-18T21:54:04.7435522Z 67: <Application /home/runner/work/dynamorio/dynamorio/build_arm-debug-internal/suite/tests/bin/linux.sigsuspend (13799).  Internal Error: DynamoRIO debug check failure: /home/runner/work/dynamorio/dynamorio/core/unix/memcache.c:382 found
2021-06-18T21:54:04.7483065Z 34/83 Test  #67: code_api|linux.sigsuspend .......................***Failed  Required regular expression not found. Regex=[^sending SIGUSR1
2021-06-18T21:54:04.7487322Z <Application /home/runner/work/dynamorio/dynamorio/build_arm-debug-internal/suite/tests/bin/linux.sigsuspend (13799).  Internal Error: DynamoRIO debug check failure: /home/runner/work/dynamorio/dynamorio/core/unix/memcache.c:382 found
2021-06-18T21:54:20.9363858Z 68: <Application /home/runner/work/dynamorio/dynamorio/build_arm-debug-internal/suite/tests/bin/linux.signest (13806).  Internal Error: DynamoRIO debug check failure: /home/runner/work/dynamorio/dynamorio/core/unix/memcache.c:382 found
2021-06-18T21:54:20.9371414Z 35/83 Test  #68: code_api|linux.signest ..........................***Failed  Required regular expression not found. Regex=[^sending 2 signals
2021-06-18T21:54:20.9375904Z <Application /home/runner/work/dynamorio/dynamorio/build_arm-debug-internal/suite/tests/bin/linux.signest (13806).  Internal Error: DynamoRIO debug check failure: /home/runner/work/dynamorio/dynamorio/core/unix/memcache.c:382 found
2021-06-18T21:54:31.3834114Z 79: <Application /home/runner/work/dynamorio/dynamorio/build_arm-debug-internal/suite/tests/bin/pthreads.pthreads (13852).  Internal Error: DynamoRIO debug check failure: /home/runner/work/dynamorio/dynamorio/core/unix/memcache.c:382 found
2021-06-18T21:54:31.3845335Z 42/83 Test  #79: code_api|pthreads.pthreads ......................***Failed  Required regular expression not found. Regex=[^Estimation of pi is 3\.142425985001098
2021-06-18T21:54:31.3848514Z <Application /home/runner/work/dynamorio/dynamorio/build_arm-debug-internal/suite/tests/bin/pthreads.pthreads (13852).  Internal Error: DynamoRIO debug check failure: /home/runner/work/dynamorio/dynamorio/core/unix/memcache.c:382 found
2021-06-18T21:54:31.3863909Z 80: <Application /home/runner/work/dynamorio/dynamorio/build_arm-debug-internal/suite/tests/bin/pthreads.pthreads_exit (13857).  Internal Error: DynamoRIO debug check failure: /home/runner/work/dynamorio/dynamorio/core/unix/memcache.c:382 found
2021-06-18T21:54:31.3875897Z 43/83 Test  #80: code_api|pthreads.pthreads_exit .................***Failed  Required regular expression not found. Regex=[^$
2021-06-18T21:54:31.3879396Z <Application /home/runner/work/dynamorio/dynamorio/build_arm-debug-internal/suite/tests/bin/pthreads.pthreads_exit (13857).  Internal Error: DynamoRIO debug check failure: /home/runner/work/dynamorio/dynamorio/core/unix/memcache.c:382 found
2021-06-18T21:54:31.3895152Z 81: <Application /home/runner/work/dynamorio/dynamorio/build_arm-debug-internal/suite/tests/bin/pthreads.ptsig (13866).  Internal Error: DynamoRIO debug check failure: /home/runner/work/dynamorio/dynamorio/core/unix/memcache.c:382 found
2021-06-18T21:54:31.3903603Z 44/83 Test  #81: code_api|pthreads.ptsig .........................***Failed  Required regular expression not found. Regex=[^Estimation of pi is 3\.142425985001098
2021-06-18T21:54:31.3906091Z <Application /home/runner/work/dynamorio/dynamorio/build_arm-debug-internal/suite/tests/bin/pthreads.ptsig (13866).  Internal Error: DynamoRIO debug check failure: /home/runner/work/dynamorio/dynamorio/core/unix/memcache.c:382 found
2021-06-18T21:55:11.0091145Z 109: <Application /home/runner/work/dynamorio/dynamorio/build_arm-debug-internal/suite/tests/bin/client.process-id (13966).  Internal Error: DynamoRIO debug check failure: /home/runner/work/dynamorio/dynamorio/core/unix/memcache.c:382 found
2021-06-18T21:55:11.0102756Z 61/83 Test #109: code_api|client.process-id ......................***Failed  Required regular expression not found. Regex=[^thread exit: different process id
2021-06-18T21:55:11.0108001Z <Application /home/runner/work/dynamorio/dynamorio/build_arm-debug-internal/suite/tests/bin/client.process-id (13966).  Internal Error: DynamoRIO debug check failure: /home/runner/work/dynamorio/dynamorio/core/unix/memcache.c:382 found
2021-06-18T21:55:11.0163180Z 62/83 Test #112: code_api|client.drreg-flow ......................***Failed  Required regular expression not found. Regex=[^Hello, world!
2021-06-18T21:55:11.0176672Z 63/83 Test #113: code_api|client.drreg-cross .....................***Failed  Required regular expression not found. Regex=[^Hello, world!
2021-06-18T21:55:29.8577245Z 123: <Application /home/runner/work/dynamorio/dynamorio/build_arm-debug-internal/suite/tests/bin/client.stolen-reg (14020).  Internal Error: DynamoRIO debug check failure: /home/runner/work/dynamorio/dynamorio/core/unix/memcache.c:382 found
2021-06-18T21:55:29.8587509Z 69/83 Test #123: code_api|client.stolen-reg ......................***Failed  Required regular expression not found. Regex=[^Got SIGSEGV
2021-06-18T21:55:29.8598791Z <Application /home/runner/work/dynamorio/dynamorio/build_arm-debug-internal/suite/tests/bin/client.stolen-reg (14020).  Internal Error: DynamoRIO debug check failure: /home/runner/work/dynamorio/dynamorio/core/unix/memcache.c:382 found

derekbruening added a commit that referenced this issue Oct 14, 2021
Adds missing required-1 bits in the ARM encoding table entries for
OP_blx, OP_bx, and OP_bxj.  Without the bits, some hardware still
accepts the instructions (which is why we did not notice the problem
before), but they are technically unsound, and QEMU thinks they are
invalid, breaking some of our tests under QEMU.

Tested on QEMU with the forthcoming #2414 drwrap-drreg-test,
and directly with several other decoders:
  Prior encoding for "blx r11":
    <stdin>:1:1: warning: invalid instruction encoding
    0x3b 0x00 0x20 0xe1
    ^
    llvm-mc:   e120003b
    capstone:  e120003b <INVALID: errcode 0>
    bfd:       e120003b ; <UNDEFINED> instruction: 0xe120003b
  New encoding:
    $ disasm_a32 e12fff3b
    llvm-mc:   e12fff3b blx r11
    capstone:  e12fff3b blx r11
    bfd:       e12fff3b blx fp

Setting up more external-decoder testing is beyond the scope of this
fix: #1686 covers that.

Issue: #4719, #1686, #2414
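The two encodings above differ only in bits 19:8, which the A32 register-branch forms (BX, BXJ, BLX) require to be all ones. The sketch below reconstructs the fixed encoding from that field layout with cond = AL, and shows that the old word e120003b is stored as the bytes 0x3b 0x00 0x20 0xe1 seen in the llvm-mc warning. The helper names are hypothetical; this is not DR's encoder table.

```c
#include <stdint.h>

/* A32 register branches: cond | 0001 0010 | 1111 1111 1111 | op | Rm */
#define A32_COND_AL 0xEu
#define A32_BX_REG_ONE_BITS 0x000FFF00u /* bits 19:8, required to be 1 */
#define A32_OP_BX 0x1u
#define A32_OP_BXJ 0x2u
#define A32_OP_BLX 0x3u

static uint32_t
a32_encode_bx_family(uint32_t op, uint32_t rm)
{
    return (A32_COND_AL << 28) | 0x01200000u | A32_BX_REG_ONE_BITS | (op << 4) |
        (rm & 0xFu);
}

/* A32 instructions are stored little-endian in memory. */
static void
a32_to_le_bytes(uint32_t insn, uint8_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(insn >> (8 * i));
}
```

With this layout, a32_encode_bx_family(A32_OP_BLX, 11) yields 0xe12fff3b ("blx r11"), and OR-ing the missing bits into the old word gives the same value.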
derekbruening added a commit that referenced this issue Oct 15, 2021
Adds missing required-1 bits in the ARM encoding table entries for
OP_blx, OP_bx, and OP_bxj.  Without the bits, some hardware still
accepts the instructions (which is why we did not notice the problem
before), but they are technically unsound, and QEMU thinks they are
invalid, breaking some of our tests under QEMU.

Tested on QEMU with the forthcoming #2414 drwrap-drreg-test,
and directly with several other decoders:
  Prior encoding for "blx r11":
    <stdin>:1:1: warning: invalid instruction encoding
    0x3b 0x00 0x20 0xe1
    ^
    llvm-mc:   e120003b
    capstone:  e120003b <INVALID: errcode 0>
    bfd:       e120003b ; <UNDEFINED> instruction: 0xe120003b
  New encoding:
    $ disasm_a32 e12fff3b
    llvm-mc:   e12fff3b blx r11
    capstone:  e12fff3b blx r11
    bfd:       e12fff3b blx fp

Setting up more external-decoder testing is beyond the scope of this
fix: #1686 covers that.

Issue: #4719, #1686, #2414
derekbruening added a commit that referenced this issue Dec 6, 2021
Removes a too-early-and-thus-incorrect call to set_pc_mode_in_cpsr()
in execute_handler_from_cache()
(transfer_from_sig_handler_to_fcache_return() does this for us at the
right time).

Removes an incorrect call to dr_set_isa_mode from the cpsr in
transfer_from_sig_handler_to_fcache_return(): we want to only set the
mode from the target, not the interruption point.

Works around QEMU bugs with signals 63 and 64 by using 62 instead in
the linux.signalNNNN tests.  This allows adding them to the list of
tests that work under QEMU.

Augments the linux.signalNNNN tests to vary whether the main code and
the signal handler are arm or thumb, helping to catch and test signal
transition issues.

Issue: #4719, #5145
Fixes #5145
derekbruening added a commit that referenced this issue Dec 6, 2021
Removes a too-early-and-thus-incorrect call to set_pc_mode_in_cpsr()
in execute_handler_from_cache()
(transfer_from_sig_handler_to_fcache_return() does this for us at the
right time).

Removes an incorrect call to dr_set_isa_mode from the cpsr in
transfer_from_sig_handler_to_fcache_return(): we want to only set the
mode from the target, not the interruption point.

Works around QEMU bugs with signals 63 and 64 by using 62 instead in
the linux.signalNNNN tests.  This allows adding them to the list of
tests that work under QEMU.

Augments the linux.signalNNNN tests to vary whether the main code and
the signal handler are arm or thumb, helping to catch and test signal
transition issues.

Issue: #4719, #5233
Fixes #5233
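The 63/64 workaround boils down to choosing a real-time signal below the top two. A minimal sketch of such a test, assuming Linux with glibc, where SIGRTMAX is normally 64 so SIGRTMAX - 2 is 62; the function names are hypothetical and this is not the linux.signalNNNN test source.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t sketch_sig_count;

static void
sketch_handler(int sig)
{
    (void)sig;
    sketch_sig_count++;
}

/* Avoids SIGRTMAX - 1 and SIGRTMAX (63 and 64 on typical Linux). */
static int
sketch_test_signal(void)
{
    return SIGRTMAX - 2;
}

/* Installs the handler, raises `sig`, and returns how many times the
 * handler has run so far, or -1 on error. raise() does not return until
 * the handler has returned. */
static int
sketch_send_and_count(int sig)
{
    struct sigaction act;
    memset(&act, 0, sizeof(act));
    act.sa_handler = sketch_handler;
    sigemptyset(&act.sa_mask);
    if (sigaction(sig, &act, NULL) != 0)
        return -1;
    if (raise(sig) != 0)
        return -1;
    return (int)sketch_sig_count;
}
```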
derekbruening added a commit that referenced this issue Jan 18, 2022
QEMU crashes when executing the WFI AArchXX instruction.
DR uses WFI in its spinlock loops.
We replace WFI with WFE here, which is roughly a superset
of WFI and should work similarly for us while not
breaking QEMU.

This fixes over a dozen tests that previously failed under QEMU with
"unhandled CPU exception 0x10001".  Here we add 12 tests on AArch64
and 2 tests on AArch32 to the RUNS_ON_QEMU label.  More tests
could be added in AArch32 if a memcache assert were fixed.

Issue: #4719
@derekbruening
Contributor Author

For the memcache assert, pasting from #4956 (comment)

You can try using tmate to ssh into the runner and get more logs out: https://github.com/marketplace/actions/debugging-with-tmate

derekbruening added a commit that referenced this issue Apr 13, 2022
Implements missing functionality for ptrace attach on AArch64 and
AArch32: generated code sequences were previously x86-only;
-skip_syscall handling only supported x86; and AArch64 does not
support PTRACE_POKEUSER or PTRACE_PEEKUSER.

For AArch32, handling Thumb vs Arm mode requires multiple steps: clearing the
LSB to point at the path used as data via a call; switching to Arm mode for
DR's _start; and setting the LSB of the initial app PC.

For AArch32, additionally fixes an encoder error where the opcode is
queried before copying a needs-no-encoding instruction.  This is
required for the instruction used to hold data for injection.

Tweaks the disassembler to leave a level 0 instr alone, again to
better handle the data-only instruction used for injection.

Enables the client.attach test on AArch64 and AArch32.  For AArch32,
it needs -skip_syscall.  Long-term we want that on by default
everywhere but we want explicit tests that hit it on all platforms
first.

Tested manually on an AArch32 machine. Unfortunately the client.attach test
is not trivial to set up under QEMU with its multiple command lines and
background processes so that is left as beyond the scope of this PR
and is instead considered part of #4719.

Issue: #38
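The three AArch32 mode steps in this commit rely on the Arm/Thumb interworking convention that bit 0 of a branch target selects Thumb, plus the CPSR T bit (bit 5). A minimal sketch of those bit operations follows; the helpers are hypothetical, and reading "clearing LSB" as stripping bit 0 from a Thumb-mode LR is my interpretation of the commit text.

```c
#include <stdbool.h>
#include <stdint.h>

#define A32_CPSR_T_BIT (1u << 5) /* 1 = executing Thumb */

/* Interworking: bit 0 of a BX/BLX target selects Thumb (1) or Arm (0). */
static bool
a32_is_thumb_target(uint32_t addr)
{
    return (addr & 1u) != 0;
}

/* Step 1: a Thumb-mode BL leaves LR with bit 0 set; clear it to get the
 * real address of data placed right after the call. */
static uint32_t
a32_data_addr_from_lr(uint32_t lr)
{
    return lr & ~1u;
}

/* Step 2: clear the T bit so execution continues in Arm mode. */
static uint32_t
a32_cpsr_arm_mode(uint32_t cpsr)
{
    return cpsr & ~A32_CPSR_T_BIT;
}

/* Step 3: set bit 0 so a branch to the app's initial PC enters Thumb. */
static uint32_t
a32_thumb_pc(uint32_t pc)
{
    return pc | 1u;
}
```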