Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

i#2626: AArch64 v8.0 decode: Reorder instrs in codec.txt #5115

Merged
merged 4 commits into from
Sep 23, 2021

Conversation

joshua-warburton
Copy link
Collaborator

This patch changes the instrs in codec.txt to be alphabetically
ordered rather than grouped by semi-random categories. It
also introduces codecsort.py which can manage and enforce this
ordering.

Issue: #2626

This patch changes the instrs in codec.txt to be alphabetically
ordered rather than grouped by semi-random categories. It
also introduces codecsort.py which can manage and enforce this
ordering.

Issue: #2626

Change-Id: I2d8769602d837d2e02acad820bf78e1b83d10622
Change-Id: If9a2684b2c16ea510a008b6ace8d72c28a75759d
@joshua-warburton joshua-warburton merged commit 545f320 into master Sep 23, 2021
@joshua-warburton joshua-warburton deleted the i2626-instr-reorder branch September 23, 2021 10:03
@abhinav92003
Copy link
Contributor

I'm curious why we need to keep it ordered. I rather liked the previous structure where we grouped the opcodes like "Advanced SIMD", "Floating-point" etc. I was searching for advanced SIMD opcodes and found that those convenient comments were no longer present.

@AssadHashmi
Copy link
Contributor

The existing groupings were not consistent and had become confused. The alphanumeric ordering is to make it easier to look for and add specific instructions during high rates of codec.txt churn, rather than browse groups of similar instructions in a stable codec.txt. This file is going to get bigger as we complete v8.0 support and for the purposes of quicker completion we decided to mirror the ordering in the reference manual. Once v8.0 is complete we can re-organise a stable codec.txt along different more browsable criteria.

What would be the preferred layout? Possible options are:

  1. Leave it in alphanumeric order mirroring the reference manual.

  2. Leave it in alphanumeric order and have group comment tags at the end of each defintion, e.g.

# COND
# INT GPR
# INT MEM GPR
# FP MEM SCALAR
# FP MEM VECTOR
. . .
  1. Rearrange by the above groupings, each group having its own alphanumeric ordering.

How coarse grained do we want the groupings? The coarsest is INT, FP and MEM, or something like the above with higher resolution.

@derekbruening
Copy link
Contributor

Does changing the ordering of codec.txt change the exported interface enum of the opcodes? If so, that's a compatibility break, and we would want to be careful about that. A compatibility message needs to be added to the changelog in release.dox vs the 8.0 release 4/21/20, and between releases it's still best to avoid such changes (adds pain to user experience if every cronbuild changes the enum order).

@AssadHashmi
Copy link
Contributor

Does changing the ordering of codec.txt change the exported interface enum of the opcodes?

No, the exported enum of opcodes has always been and will remain numbering in alphabetical order of opcode name, independent of ordering in codec.txt.

@derekbruening
Copy link
Contributor

No, the exported enum of opcodes has always been and will remain numbering in alphabetical order of opcode name, independent of ordering in codec.txt.

So every opcode addition to codec.txt changes the interface: this may cause problems. Presumably binary compatibility has already been broken with the 8.0 release, and if we put out another release, it will immediately break on the next change. I think this alphabetizing may need to be reconsidered especially long-term.

@AssadHashmi
Copy link
Contributor

How did x86 handle a fixed opcode<->enum mapping during development?
Presumably, the mapping was changing until the full decoder/encoder had been implemented?

@derekbruening
Copy link
Contributor

How did x86 handle a fixed opcode<->enum mapping during development?
Presumably, the mapping was changing until the full decoder/encoder had been implemented?

IIRC the initial set of pre-SSE insructions was in place before any public interface. Every set of additional opcodes added has appended to the end to avoid breaking compatibility: mostly as sets grouped by ISA feature (SSE2, AVX, etc.) but even some from prior sets that were accidentally missed were appended such as at https://github.com/DynamoRIO/dynamorio/blob/master/core/ir/x86/decode_table.c#L1091

@derekbruening
Copy link
Contributor

derekbruening commented Sep 30, 2021

Given a number of recent users hitting problems that might have been avoided with a more recent build, and with the last official release 8.0 from a full 18 months ago (8.0 was Apr 21, 2020), we're thinking we should put out a new official release. IMHO it would be best to not change the opcode ordering across that release -- so the proposal is to change the codec to freeze the current ordering and append new opcodes after the freeze. Does that sound reasonable, and is that something that could be done in the next week or two and then we could put out a new release once that's in place?

@AssadHashmi
Copy link
Contributor

Given a number of recent users hitting problems that might have been avoided with a more recent build, and with the last official release 8.0 from a full 18 months ago (8.0 was Apr 21, 2020), we're thinking we should put out a new official release. IMHO it would be best to not change the opcode ordering across that release -- so the proposal is to change the codec to freeze the current ordering and append new opcodes after the freeze. Does that sound reasonable, and is that something that could be done in the next week or two and then we could put out a new release once that's in place?

Do you mean we should not make any changes, freezing the current codec.txt until the new release is done?
Or revert this change to the previous ordering and freeze until the new release is done?

@derekbruening
Copy link
Contributor

Given a number of recent users hitting problems that might have been avoided with a more recent build, and with the last official release 8.0 from a full 18 months ago (8.0 was Apr 21, 2020), we're thinking we should put out a new official release. IMHO it would be best to not change the opcode ordering across that release -- so the proposal is to change the codec to freeze the current ordering and append new opcodes after the freeze. Does that sound reasonable, and is that something that could be done in the next week or two and then we could put out a new release once that's in place?

Do you mean we should not make any changes, freezing the current codec.txt until the new release is done? Or revert this change to the previous ordering and freeze until the new release is done?

I think we're ok breaking compatibility with 8.0 given the other changes we also have there.

The proposal is to freeze the OP_ enum ordering at the point of the new release forever and only append to it afterward.
This could be done by adding all the known opcodes in the Armv8.N target you're focusing on right now but w/o full decoding (even leaving as OP_xx; simply getting them into the OP_ array); or by somehow marking in codec.txt which ones are in this release and ordering them first or something.

A complication might be the opcode splitting in #4388, #4386 (comment), #4393. But if those are implemented after the new release I think we would just live with any newly split opcodes being appended.

@AssadHashmi
Copy link
Contributor

The proposal is to freeze the OP_ enum ordering at the point of the new release forever and only append to it afterward. This could be done by adding all the known opcodes in the Armv8.N target you're focusing on right now but w/o full decoding (even leaving as OP_xx; simply getting them into the OP_ array); or by somehow marking in codec.txt which ones are in this release and ordering them first or something.

Understood. We're thinking of adding an index number to encoding definitions in codec.txt which is the same as the OP_ enum and will not change once set. We may define all v8.0 encodings, in order to fix the enum set, even those not yet implemented.

@AssadHashmi
Copy link
Contributor

@derekbruening @abhinav92003 Is there any reason why we can't use OP_UNDECODED rather than OP_xx on AArch64?
I don't know the history of OP_xx and if we can we should replace it.

@AssadHashmi
Copy link
Contributor

@derekbruening @abhinav92003 What are the (example) use-cases which require that opcodes' enums are the same across DynamoRIO releases?

I'm thinking of opcodes which are augmented as new versions of the ISA are released, e.g. for MMX, SSE and AVX on Intel x86, and for v8.0, v8.2 and SVE on AArch64.

On AArch64 we have the same opcode name for integer, fixed-width vector and scalable vector versions, e.g. ADD :

v8.0      ADD <Xd|SP>, <Xn|SP>, <R><m>{, <extend> {#<amount>}}
v8.0,v8.2 ADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>
SVE       ADD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

Do we want the enum to be the same all the way through?

@AssadHashmi
Copy link
Contributor

I should have posted these questions on the relevant issue #5144
Let's continue any further discussion there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants