Use the actual written address range in DC ZVA's memory operand #4400

Open · Tracked by #5145
derekbruening opened this issue Aug 11, 2020 · 5 comments

Comments

@derekbruening (Contributor)

Unlike the other DC cache operations, DC ZVA is not really a cache management
operation but behaves more like a regular store instruction. Clients analyzing
it need its actual start address (back-aligned) and size (the cache line size).
The easiest thing for a client would be to have that information in the IR
operand instead of the encoded address, though that raises issues of
re-encoding fidelity (xref #3339).
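
For concreteness, a minimal sketch (plain C, not DynamoRIO API) of the computation a client currently has to do by hand, assuming it has already obtained the zero-block / cache line size for the traced machine:

#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: xt_value is the address register value used by DC ZVA
 * and block_size is the zeroed block size in bytes (a power of two, e.g. 64).
 * The range actually written is the enclosing block, back-aligned. */
static inline void
dc_zva_written_range(uintptr_t xt_value, size_t block_size,
                     uintptr_t *start_out, size_t *size_out)
{
    *start_out = xt_value & ~((uintptr_t)block_size - 1);
    *size_out = block_size;
}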

@abhinav92003 (Contributor) commented Sep 2, 2020

Another complexity with the existing handling of DC ZVA:
it is currently counted as a load operation, as specified by the codec here:

1101010100001xxxxxxxxxxxxxxxxxxx sys : sysops memx0

I tried adding another codec rule, specific to DC ZVA, that treats memx0 as a destination instead, something like
110101010000101101110100001xxxxx sys memx0 : sysops
but that pattern overlaps with the existing pattern for sys.

We might have to either:

  1. improve the codec to allow overlapping patterns: in case of ambiguity, give precedence to the first matching one.
  2. split the pattern for sys into multiple mutually exclusive patterns for its variants (127 variants: https://developer.arm.com/docs/ddi0595/h/aarch64-system-instructions).
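
As a side note on the load categorization above, here is a minimal client-side sketch (assuming DynamoRIO's existing instr query API) showing how that categorization is visible; with the current sys pattern, a decoded DC ZVA reports itself as a memory read rather than a write:

#include "dr_api.h"

/* Sketch only: report how the decoder categorizes an instruction's memory
 * access; for DC ZVA this currently reports a read and no write. */
static void
report_mem_category(instr_t *instr)
{
    dr_printf("reads memory: %d, writes memory: %d\n",
              instr_reads_memory(instr), instr_writes_memory(instr));
}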

@derekbruening (Contributor, Author)

We might have to either:

  1. improve the codec to allow overlapping patterns: in case of ambiguity, give precedence to the first matching one.
  2. split the pattern for sys into multiple mutually exclusive patterns for its variants (127 variants: https://developer.arm.com/docs/ddi0595/h/aarch64-system-instructions).

Improving the codec is #4393, and splitting up SYS is a key motivator for doing so.

@abhinav92003 (Contributor)

The easiest thing for a client would be to have that information in the IR
operand instead of the encoded address

Just checking my understanding: this IR change would require adding a new
member to the value union in _opnd_t to denote back-alignment, and then using
it in drutil_insert_get_mem_addr_arm() during the address calculation, right?

Also, the aarch64 codec does not need to do this back-alignment for all SYS operations; currently it's needed only for DC ZVA and maybe the flush operations. Therefore, it seems #4393 is a blocker for this issue, right? If so, I can try working on #4393.

@derekbruening (Contributor, Author) commented Sep 3, 2020

No, rather than inventing a new kind of operand, we would simply discard the original address and store a base+disp equal to what is actually referenced, treating the encoded address as just an encoding detail. That is what I wrote: instead of the encoded address. And that is why I mentioned the (minor) downside of re-encoding fidelity (xref #3339) b/c of the lost info on a modified instruction.

Any scheme using the actual linear address range affected in the IR also has the problem that the decoder now has to know the cache line size, bringing up cross-machine and cross-arch decoding consistency issues.
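
For illustration, a minimal sketch of the client side once the memory operand carries the real referenced range (assuming DynamoRIO's existing instr/opnd routines, with DC ZVA decoded as a store whose memory operand has the cache line size):

#include "dr_api.h"

/* Sketch only: walk the destination operands and report any memory write's
 * size. With DC ZVA decoded as a cache-line-sized store, this reports the
 * full zeroed block instead of a pointer-sized access. */
static void
report_store_sizes(instr_t *instr)
{
    int i;
    for (i = 0; i < instr_num_dsts(instr); i++) {
        opnd_t dst = instr_get_dst(instr, i);
        if (opnd_is_memory_reference(dst)) {
            dr_printf("store of %u bytes\n",
                      opnd_size_in_bytes(opnd_get_size(dst)));
        }
    }
}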

@derekbruening (Contributor, Author)

Any scheme using the actual linear address range affected in the IR also has the problem that the decoder now has to know the cache line size, bringing up cross-machine and cross-arch decoding consistency issues.

I guess we would make this a decoder setting like the ISA mode (dr_get_isa_mode()) and add routines for setting it? The alternative is to leave the original address there as part of some new type of operand, which would force updates to all operand-handling code to handle the new type, in the hope of deferring the cache-line-size evaluation to instrumentation time, where we are less likely to support cross-machine operation and where the underlying line size being used is reasonable.
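
A sketch of what such a decoder setting could look like, modeled on the ISA-mode routines; these declarations are hypothetical and do not exist in DynamoRIO today:

#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, by analogy with dr_get_isa_mode()/dr_set_isa_mode(): make the
 * cache line size used for decoding a per-context setting. */
size_t dr_get_decode_cache_line_size(void *drcontext);
bool dr_set_decode_cache_line_size(void *drcontext, size_t line_size,
                                   size_t *old_line_size_out);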

abhinav92003 added a commit that referenced this issue Sep 11, 2020
Add traced target processor's cache line size to offline trace header.

Back-align memory address in DC ZVA during offline trace post-processing, using the cache line size added in header.

Add trace analyzer to existing test for AArch64 flush instrs, and re-purpose it for AArch64 SYS operations in general.

Issue: #4400
abhinav92003 added a commit that referenced this issue Sep 16, 2020
…aces (#4444)

Fixes alignment, size and memref type for address range zeroed using DC ZVA operation on AArch64.

Adds the traced processor's cache line size to the header of offline raw traces. Using this, offline trace post-processing back-aligns the memory address in DC ZVA and marks the operation as a cache-line-sized store. This is a workaround for offline traces and doesn't help with the correctness of the DC ZVA representation in online traces.

Adds the cache line size as a new marker type in final offline and online traces. For old raw traces without a cache line size in the header, the current processor's cache line size is used as the default during raw2trace.

Repurposes burst_flush_aarch64 to test AArch64 SYS operations in general, and renames it to burst_aarch64_sys. Adds an offline trace analyser test to verify DC ZVA fixes and new marker.

Adds new check to trace invariant test that verifies presence of cache line size marker in online traces.

Updates offline trace for tool.drcacheoff.altbindir test to add cache line size in header.

Issue: #4400
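
A minimal sketch of how an analysis tool could consume the new cache line size marker; the record layout and field names below are simplified assumptions based on the description above, not the exact drmemtrace types:

#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified trace record: either a marker (here, the traced
 * processor's cache line size) or a data access with address and size. */
typedef struct {
    int is_marker;         /* nonzero for marker records */
    int marker_type;       /* which marker this is */
    uint64_t marker_value; /* for the cache line size marker: size in bytes */
    uint64_t addr;         /* data records: access address */
    uint64_t size;         /* data records: access size */
} trace_record_t;

#define MARKER_CACHE_LINE_SIZE 1 /* placeholder marker id for this sketch */

static size_t traced_line_size; /* 0 until the marker has been seen */

/* Remember the traced machine's cache line size; after the DC ZVA fix, a
 * DC ZVA data record should be a write of traced_line_size bytes with an
 * address aligned to traced_line_size. */
static void
process_record(const trace_record_t *rec)
{
    if (rec->is_marker && rec->marker_type == MARKER_CACHE_LINE_SIZE)
        traced_line_size = (size_t)rec->marker_value;
}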
derekbruening pushed a commit that referenced this issue Sep 29, 2020
…aces (#4444)

AssadHashmi added a commit that referenced this issue Dec 1, 2021
This patch splits out DC and IC instruction aliases from
the SYS instruction and makes DC ZVA a store instruction.

Issues: #4400, #4382, #4329, #2626
AssadHashmi added a commit that referenced this issue Dec 16, 2021
This patch splits out DC and IC instruction aliases from
the SYS instruction:
DC <dc_op>, <Xt>
IC <ic_op>{, <Xt>}

Where <dc_op> is one of:
IVAC ISW CSW CISW ZVA CVAC CVAU CIVAC
and <ic_op> is one of:
IALLUIS IALLU IVAU

Issues: #4400, #2626
AssadHashmi added a commit that referenced this issue Dec 22, 2021
Data cache operations like DC ZVA should handle
addresses the way regular store instructions do,
so that clients can analyse runs based on the
actual start address and cache line size.

Issues: #4400, #2626