Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use static AArch64 decoder tables. #633

Merged
merged 4 commits into from
Sep 4, 2019
Merged

Conversation

blue42u
Copy link
Contributor

@blue42u blue42u commented Aug 14, 2019

Fixes #630, reduces compile time (enough to notice), ~7.8MiB smaller binary, ~10MiB less heap usage. Performance effects not well known, but for HPCStruct ~1.2s is saved on a smaller input.

The previous implementation used a lazy initialization scheme to generate the various tables used for arm64 decoding. This PR replaces all that with simple const tables and switch statements, which the compiler will usually reduce to a blob in .data (or .text).

For the small(-ish) lookup tables for the state machine in main_decoder_table, a linear scan is employed. Any performance drop is likely caused by this in some way.

Apologies for the unreadable diffs, the only change in aarch_opcode_tables.C that isn't noted somewhere else is the removal of duplicate keys in sysRegMap (switch wouldn't compile otherwise). I used the last one of every duplicate, hopefully that reflects what the std::map would have done.

Other improvements include faster compile time (for affected files),
~1.2s less lazy init time, ~7.8MiB smaller binary, and ~10MiB less heapusage.
@hainest hainest self-assigned this Aug 26, 2019
@hainest hainest added the bug label Aug 26, 2019
@hainest
Copy link
Contributor

hainest commented Aug 26, 2019

There are a few places where we can clean the code up even yet more without relying on the compiler to convert the switch statements into indexed lookups. I'll post a code review tomorrow.

Apologies for the unreadable diffs, the only change in aarch_opcode_tables.C that isn't noted somewhere else is the removal of duplicate keys in sysRegMap (switch wouldn't compile otherwise). I used the last one of every duplicate, hopefully that reflects what the std::map would have done.

Having duplicate instruction entries in the original code is very bizarre. I think we should take a deeper look at that before committing here- if even just to assure ourselves we didn't break anything. @mxz297 @sashanicolas could you take a look?

@mxz297
Copy link
Member

mxz297 commented Aug 28, 2019

Many duplicated entries were introduced in this commit (https://github.com/dyninst/dyninst/pull/633/files#diff-0c178a3fd0c3ed4506eda2e5532af3c0).

I believe some of the duplicated entries are wrong. For example, in the existing code

sysRegMap[0x8000] = aarch64::dbgwvr0_el1;
sysRegMap[0x8000] = aarch64::dbgbvr0_el1;
sysRegMap[0x8000] = aarch64::dbgbcr0_el1;
sysRegMap[0x8000] = aarch64::dbgwcr0_el1;

These are four different system registers, so at least three of the entries were wrong. It is not directly clear to me that where 0x8000 comes from based on the ARM manual.

@blue42u
Copy link
Contributor Author

blue42u commented Aug 28, 2019

The key for sysRegMap is a clever bitmask composed at

unsigned int systemRegEncoding =
(op0Field << 14) | (op1Field << 11) | (crnField << 7) | (crmField << 3) | op2Field;

So 0x8000 would have op0Field set to 2 and all others set to 0. I don't know what that means for ARM, but hopefully it helps a little.

@mxz297
Copy link
Member

mxz297 commented Aug 28, 2019

Still, regardless of how the key is used, multiple values associated with the same key in std::map means only one value will ever be used.

While duplicated entries are bad, I doubt they will cause actual issues because system registers are rarely used. And we will switch to use Capstone for Power and ARM in the near future. I don't think we should spent too much time to fix this piece of code.

@hainest
Copy link
Contributor

hainest commented Aug 28, 2019

@mxz297 I am perfectly fine with leaving this alone since it will all be replaced with Capstone soon. I just wanted to make sure there wasn't a ticking timebomb waiting to get us.

Copy link
Contributor

@hainest hainest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing serious, mostly just a few places where I think we can constify things.

instructionAPI/src/InstructionDecoder-aarch64.C Outdated Show resolved Hide resolved
instructionAPI/src/InstructionDecoder-aarch64.C Outdated Show resolved Hide resolved
instructionAPI/src/InstructionDecoder-aarch64.C Outdated Show resolved Hide resolved
instructionAPI/src/InstructionDecoder-aarch64.C Outdated Show resolved Hide resolved
instructionAPI/src/InstructionDecoder-aarch64.C Outdated Show resolved Hide resolved
instructionAPI/src/InstructionDecoder-aarch64.C Outdated Show resolved Hide resolved
instructionAPI/src/InstructionDecoder-aarch64.C Outdated Show resolved Hide resolved
instructionAPI/src/InstructionDecoder-aarch64.C Outdated Show resolved Hide resolved
instructionAPI/src/InstructionDecoder-aarch64.C Outdated Show resolved Hide resolved
@blue42u
Copy link
Contributor Author

blue42u commented Sep 2, 2019

Ready for another review, if its worth doing before the Capstone stuff is merged.

@hainest
Copy link
Contributor

hainest commented Sep 3, 2019

https://bottle.cs.wisc.edu/branch/PR633

Testing for this PR is a bit all of the place. I'm not sure if it's the test suite being it's usual buggy self or if there is a real problem. I will re-run.

@hainest
Copy link
Contributor

hainest commented Sep 4, 2019

It's passing on ARM and PPC, and the regressions on x86 are spurious.

@hainest hainest merged commit a7d4121 into dyninst:master Sep 4, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Static variables in ARM instruction decoder cause crashes
3 participants