Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARM suite test regressions: api.disA32, api.disT32, many more #3699

Open
derekbruening opened this issue Jun 24, 2019 · 9 comments
Open

ARM suite test regressions: api.disA32, api.disT32, many more #3699

derekbruening opened this issue Jun 24, 2019 · 9 comments

Comments

@derekbruening
Copy link
Contributor

We did not have ARM tests being run for a while. Now that we have a CDash machine back up, I compared its debug run yesterday to the most recent on that same machine name from April 2018:

http://dynamorio.org/CDash/viewTest.php?buildid=35255

Testing started on 2018-04-18 23:37:13
Name	Status	Time	Details	Labels
code_api|client.drmgr-test	Failed	5s 810ms	Completed (Failed)	
code_api|linux.signal_race	Failed	1m 30s 690ms	Completed (Failed)	
code_api|linux.eintr-noinline	Failed	1m 30s 220ms	Completed (Failed)	
code_api|tool.drcachesim.invariants	Failed	1m 30s 410ms	Completed (Failed)	
code_api|tool.histogram.offline	Failed	1m 31s 760ms	Completed (Failed)	
code_api|tool.drcachesim.delay-simple	Failed	8s 340ms	Completed (Failed)

http://dynamorio.org/CDash/viewTest.php?buildid=44194

Testing started on 2019-06-21 23:30:26
Name	Status	Time	Details	Labels
code_api|linux.signal_race	Failed	1m 30s 20ms	Completed (Failed)	
code_api|linux.eintr-noinline	Failed	1m 30s 180ms	Completed (Failed)	
 code_api|linux.sigsuspend	Failed	1m 30s 100ms	Completed (Failed)	
 code_api|tool.drcacheoff.simple	Failed	6s 770ms	Completed (Failed)	
 code_api|tool.drcacheoff.filter	Failed	8s 820ms	Completed (Failed)	
 code_api|tool.drcacheoff.opcode_mix	Failed	7s 620ms	Completed (Failed)	
 code_api|linux.syscall_pwait	Failed	10s 890ms	Completed (Failed)	
 code_api|client.predicate-test	Failed	1s 110ms	Completed (Failed)	
 code_api|client.drreg-test	Failed	1s 520ms	Completed (Failed)	
code_api|client.drmgr-test	Failed	3s 410ms	Completed (Failed)	
 code_api|client.drutil-test	Failed	2s 420ms	Completed (Failed)	
 code_api|sample.opcodes	Failed	510ms	Completed (Failed)	
code_api|tool.drcachesim.delay-simple	Failed	4s 110ms	Completed (Failed)	
 code_api|tool.drcacheoff.multiproc	Failed	12s 390ms	Completed (Failed)	
 code_api|api.disA32	Failed	1s 360ms	Completed (Failed)	
 code_api|api.disT32	Failed	1s 430ms	Completed (Failed)
@AssadHashmi
Copy link
Contributor

I'm assuming these regressions have happened because we don't have AArch32 precommit testing in the same way we have for AArch64?

@Carrotman42
Copy link
Contributor

Carrotman42 commented Jun 25, 2019

Note: in a recent debug run code_api|linux.signal_race did not fail, so it is at least flaky.

In a recent release run we observe code_api|linux.reset failing as well. In addition, code_api|linux.sigsuspend and code_api|linux.signal_race passed in that run.

@derekbruening
Copy link
Contributor Author

I'm assuming these regressions have happened because we don't have AArch32 precommit testing in the same way we have for AArch64?

That's my assumption, but I wouldn't have expected the ARM encoder/decoder tests to have broken. Hopefully it's something simple.

@egrimley-arm
Copy link
Contributor

The log message for c76c485 (Oct 2021) says:

Adds missing required-1 bits in the ARM encoding table entries for
OP_blx, OP_bx, and OP_bxj.  Without the bits, some hardware still
accepts the instructions (which is why we did not notice the problem
before), but they are technically unsound, and QEMU thinks they are
invalid, breaking some of our tests under QEMU.

That change is responsible for most of the api.disA32 failures, and perhaps that change should be reversed. It's my understanding that all hardware will accept those instructions: bits 8-19, which should be 1, will be ignored. Since these are branch instructions DynamoRIO really needs to decode and intercept them, I think. The problem with QEMU is perhaps a bug in QEMU. Perhaps it has even been fixed in the meantime.

The other api.dis failures are due to vadd.f32 and I have a patch for them coming soon, I hope.

@egrimley-arm
Copy link
Contributor

I did a bisection on common.segfault: it seems to have been broken by a8c4f6f, and I can make it pass again on the head by changing RETCODE_SIZE in core/unix/signal_private.h from 16 to 8. However, I've seen unsigned long retcode[4] in a genuine kernel header file so maybe there is something else wrong that is somehow compensated for by the previous definition of RETCODE_SIZE.

@egrimley-arm
Copy link
Contributor

Yes, I did see unsigned long retcode[4] in a genuine kernel header file, but it was part of sigframe, not part of rt_sigframe. There seems to be no retcode in an AArch32 rt_sigframe. I will draft a PR to remove it.

(A spurious retcode of length 8 bytes happens to be harmless because there are a couple of items on the stack beyond the rt_sigframe so copying 8 bytes from after the rt_sigframe does not segfault. Copying any more does segfault, which is particularly confusing in this case as we're already in the process of attempting to handle a segfault.)

@egrimley-arm
Copy link
Contributor

In an arm32v7 Debian 10 Docker container under a 64-bit 5.4.47 kernel on Cortex-A72 hardware with DynamoRIO slightly modified (retcode removed from rt_sigframe) I got 114 out of 298 tests failing. One of the failing tests was drcacheoff.simple so I looked at that one under GDB. The stack trace looked like nonsense but the SIGBUS reported from an instance of ldrexd r0, r1, [r2] looked plausible: r2 contained an odd multiple of 4 but that instruction requires 8-byte alignment.

Perhaps it's obvious in hindsight was was happening but I had to do stuff like insert home-made asm volatile("b.w .\n\t.inst 65") breakpoints into the source to discover that the problem seemed to be one of the two instances (probably the first one) in tracer.cpp of

        placement = dr_global_alloc(MAX_INSTRU_SIZE);
        instru = new (placement) offline_instru_t(

dr_global_alloc returns a value with HEAP_ALIGNMENT, which is just 4 on ARM, but offline_instru_t uses some library lock function that assumes the standard 8-byte alignment and uses ldrexd.

So I changed HEAP_ALIGNMENT to 8, added a couple of ALIGN_FORWARDs in the definition of BLOCK_SIZES, also changed HEADER_SIZE to 8 and replaced STANDARD_HEAP_ALIGNMENT - HEAP_ALIGNMENT with 4 in redirect_malloc and commented out a few ASSERTs, and then I got only 72 out of 298 tests failing, so 42 tests were fixed by that hackery.

(In order to build the tests I also had to add a bogus definition of gettid to burst_syscall_inject.cpp.)

Question for @derekbruening: Is changing HEAP_ALIGNMENT to 8 on ARM the right approach here?

(Since DynamoRIO has been broken on 32-bit ARM hardware since 2019 there's perhaps not a huge amount of interest in it so we probably can't invest a huge amount of effort into it so an easy fix would be preferred!)

egrimley-arm added a commit that referenced this issue Dec 13, 2024
The definition did not agree with the Linux kernel's definition.
There is a retcode field in sigframe, but not in rt_sigframe.

Issue: #3699

Change-Id: Icbdd204a021778e329d5f5831933debda4bd9dd4
egrimley-arm added a commit that referenced this issue Dec 13, 2024
The definition did not agree with the Linux kernel's definition.
There is a retcode field in sigframe, but not in rt_sigframe.

This change made the previously failing common.segfault test pass
in a Debian 10 container running under a 64-bit 5.4.47 kernel
on Cortex-A72 hardware.

Issue: #3699
@derekbruening
Copy link
Contributor Author

Question for @derekbruening: Is changing HEAP_ALIGNMENT to 8 on ARM the right approach here?

Maybe that is the simplest rather than having to find all C++ uses and replacing with __wrap_malloc(). Hopefully the DR allocator code just works if the alignment define is changed...

egrimley-arm added a commit that referenced this issue Dec 16, 2024
Add previously missing asm for 32-bit ARM clone3 system call.

Issue: #3699
egrimley-arm added a commit that referenced this issue Dec 17, 2024
Otherwise "memory leak detected" in api.it test,
since dr_standalone_exit was added by 76def08.

Issue: #3699

Change-Id: Ie7339038e0d3986f92156a9554fb87c819b23678
egrimley-arm added a commit that referenced this issue Dec 17, 2024
Otherwise "memory leak detected" in api.it test,
since dr_standalone_exit was added by 76def08.

Issue: #3699
egrimley-arm added a commit that referenced this issue Dec 17, 2024
dd1589c modified dstack_offs in insert_push_all_registers without
generating an instruction that modifies SP. So add a SUB SP, SP, #4
there, and a corresponding ADD SP, SP, #4 in insert_pop_all_registers.

This made the test client.cleancallparams pass.

Issue: #3699

Change-Id: I25d2a22e7feb7bc1227aeec74847c80754eae254
egrimley-arm added a commit that referenced this issue Dec 18, 2024
…7149)

dd1589c modified dstack_offs in insert_push_all_registers without
generating an instruction that modifies SP. So add a SUB SP, SP, #4
there, and a corresponding ADD SP, SP, #4 in insert_pop_all_registers.

This made the test client.cleancallparams pass.

Issue: #3699
@egrimley-arm
Copy link
Contributor

With 73b1df1 plus a few changes that have not yet been merged, namely:

in an AArch32 Debian 10 container under a 64-bit 5.4.47 Linux kernel running on Cortex-A72 I got 47 tests failed out of 298:

          4 - tool.drcacheoff.raw2trace_unit_tests (Failed)
          5 - tool.scheduler.unit_tests (Child aborted)
          8 - tool.drcacheoff.trace_interval_analysis_unit_tests (Failed)
          9 - tool.drcacheoff.analysis_unit_tests (Timeout)
         20 - code_api|linux.sigaction.native (Failed)
         85 - code_api|linux.sigmask (Failed)
         89 - code_api|linux.fib-conflict (Failed)
         90 - code_api|linux.fib-conflict-early (Failed)
         93 - code_api|linux.vfork (Failed)
         94 - code_api|linux.thread-reset (Failed)
        100 - code_api|pthreads.pthreads_fork_FLAKY (Failed)
        116 - code_api|client.detach_test (Timeout)
        120 - code_api|client.file_io (Failed)
        129 - code_api|client.drmgr-test (Failed)
        136 - code_api|client.drbbdup-drwrap-test (Failed)
        140 - code_api|client.drreg-test (Failed)
        141 - code_api|client.drreg-end-restore (Failed)
        154 - code_api|client.drutil-test (Failed)
        172 - code_api|sample.statecmp (Failed)
        194 - code_api|tool.drcachesim.delay-global (Failed)
        213 - code_api|tool.drcacheoff.raw-zlib (Failed)
        217 - code_api|tool.drcacheoff.multiproc (Timeout)
        225 - code_api|tool.drcacheoff.sysnums (Failed)
        229 - code_api|tool.drcacheoff.max-global (Failed)
        235 - code_api|tool.drcacheoff.windows-invar (Failed)
        253 - code_api|tool.drcacheoff.invariant_checker (Failed)
        254 - code_api|tool.drcacheoff.invariant_checker_pthreads (Failed)
        255 - code_api|tool.histogram.offline (Failed)
        256 - code_api|tool.record_filter (Failed)
        257 - code_api|tool.record_filter_bycore_uni (Failed)
        258 - code_api|tool.record_filter_bycore_multi (Failed)
        259 - code_api|tool.drcachesim.drstatecmp-delay-simple (Failed)
        260 - code_api|tool.drcacheoff.drstatecmp-delay-simple (Failed)
        263 - code_api|tool.drcacheoff.getretaddr_record_replace_retaddr (Failed)
        272 - code_api|client.drwrap-test-detach (Failed)
        274 - code_api|api.disA32 (Failed)
        276 - code_api|api.startstop (Failed)
        277 - code_api|api.detach (Failed)
        278 - code_api|api.detach_spawn_FLAKY (Failed)
        279 - code_api|api.detach_spawn_stress_FLAKY (Failed)
        289 - code_api|api.static_sideline_FLAKY (Failed)
        293 - code_api|api.thread_churn (Failed)
        294 - code_api|tool.drcacheoff.legacy (Failed)
        295 - code_api|tool.drcacheoff.func_view_noret (Failed)
        296 - code_api|tool.drcacheoff.altbindir (Failed)
        297 - code_api|tool.drcacheoff.legacy-int-offs (Failed)
        298 - no_code_api,no_intercept_all_signals|linux.sigaction (Failed)

egrimley-arm added a commit that referenced this issue Jan 7, 2025
Some C++ library functions require 8-byte alignment.
This made the test drcacheoff.simple pass on 32-bit ARM.

Issue: #3699

Change-Id: I507833f3367508fd512ee2d1fa272b04203bfff1
egrimley-arm added a commit that referenced this issue Jan 8, 2025
Some C++ library functions require 8-byte alignment. This made the test
drcacheoff.simple pass on 32-bit ARM.

Issue: #3699
egrimley-arm added a commit that referenced this issue Jan 8, 2025
On ARM/AArch64 not all MRS/MSR instructions operate on aflags,
and on ARM the register source operand of MSR, if there is one,
is the second operand.

This change made the test sample.memval_simple pass on 32-bit ARM.

Issue: #3699

Change-Id: I9a04273ed0507c0f8cccd636fdbc0aff1b5bf36d
egrimley-arm added a commit that referenced this issue Jan 9, 2025
On ARM/AArch64 not all MRS/MSR instructions operate on aflags, and on
ARM the register source operand of MSR, if there is one, is the second
operand.

This change made the test sample.memval_simple pass on 32-bit ARM. (But
client.drreg-test does not yet pass.)

Issue: #3699
egrimley-arm added a commit that referenced this issue Jan 16, 2025
ef4482e caused a few additional failures. In particular, the
child of a clone3 system call started with the register values
shifted: R2 got the value that should be in R1 and so on.
That caused the test linux.clone to fail.

It turns out that some callers of insert_{push,pop}_all_registers
expect that an unpadded priv_mcontext_t will be pushed/popped.
Moving the 4-byte padding to above the struct instead of below it
makes some tests pass but breaks others. So we add some padding
to priv_mcontext_t itself, changing its size from 324 to 328. It
will probably run slightly faster with the struct 8-byte aligned.

Update insert_{push,pop}_all_registers and arm.asm for the new
struct layout, with some additional changes to arm.asm:

+ Delete irrelevant X64 macros.

+ Document the macros more clearly (they can be found with grep).

+ Change SP only at start and end of function (standard practice,
makes debugging easier).

+ Avoid writing below SP (compatible with signal handlers).

Issue: #3699

Change-Id: I4977c5bf42e349d1085dbdecf3428e83a4a2399c
egrimley-arm added a commit that referenced this issue Jan 17, 2025
ef4482e caused a few additional failures. In particular, the child of
a clone3 system call started with the register values shifted: R2 got
the value that should be in R1 and so on. That caused the test
linux.clone to fail.

It turns out that some callers of insert_{push,pop}_all_registers expect
that an unpadded priv_mcontext_t will be pushed/popped. Moving the
4-byte padding to above the struct instead of below it makes some tests
pass but breaks others. So we add some padding in the middle of
priv_mcontext_t, changing its size from 324 to 328. It will probably run
slightly faster with the struct (and particularly the field "simd")
8-byte aligned.

Update insert_{push,pop}_all_registers and arm.asm for the new struct
layout, with some additional changes to arm.asm:

+ Delete irrelevant X64 macros.

+ Document the macros more clearly (they can be found with grep).

+ Change SP only at start and end of function (standard practice, makes
debugging easier).

+ Avoid writing below SP (compatible with signal handlers).

This change breaks compatibility so the version is increased to 11.90.

Issue: #3699
egrimley-arm added a commit that referenced this issue Jan 22, 2025
This makes the test code_api|linux.signal_pre_syscall pass.

Issue: #3699

Change-Id: I3c6f51b36ee68f2e32b931e528c064a859aa52b3
egrimley-arm added a commit that referenced this issue Jan 22, 2025
This makes the test code_api|linux.signal_pre_syscall pass.

Issue: #3699
egrimley-arm added a commit that referenced this issue Jan 23, 2025
Small changes to make these tests pass on 32-bit Arm:

+ code_api|tool.drcacheoff.altbindir

+ code_api|tool.drcacheoff.legacy-int-offs

Issue: #3699

Change-Id: I89035440654c536ecb6d63d64a1c2a7999f34068
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants