Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[X86] Ignore REX prefixes not immediately before opcode #117299

Merged
merged 2 commits into from
Nov 22, 2024

Conversation

boomanaiden154
Copy link
Contributor

The Intel X86 Architecture Manual says the following:

A REX prefix is ignored, as are its individual bits, when it is not needed
for an instruction or when it does not immediately precede the opcode byte or
the escape opcode byte (0FH) of an instruction for which it is needed. This
has the implication that only one REX prefix, properly located, can affect an
instruction.

We currently do not handle these cases in the disassembler, leading to incorrect disassembly. This patch rectifies the situation by treating REX prefixes as standard prefixes rather than only expecting them before the Opcode.

The motivating test case added as a test was fuzzer generated.

The Intel X86 Architecture Manual says the following:

> A REX prefix is ignored, as are its individual bits, when it is not needed
> for an instruction or when it does not immediately precede the opcode byte or
> the escape opcode byte (0FH) of an instruction for which it is needed. This
> has the implication that only one REX prefix, properly located, can affect an
> instruction.

We currently do not handle these cases in the disassembler, leading to
incorrect disassembly. This patch rectifies the situation by treating
REX prefixes as standard prefixes rather than only expecting them before
the Opcode.

The motivating test case added as a test was fuzzer generated.
@llvmbot
Copy link
Member

llvmbot commented Nov 22, 2024

@llvm/pr-subscribers-mc

@llvm/pr-subscribers-backend-x86

Author: Aiden Grossman (boomanaiden154)

Changes

The Intel X86 Architecture Manual says the following:

> A REX prefix is ignored, as are its individual bits, when it is not needed
> for an instruction or when it does not immediately precede the opcode byte or
> the escape opcode byte (0FH) of an instruction for which it is needed. This
> has the implication that only one REX prefix, properly located, can affect an
> instruction.

We currently do not handle these cases in the disassembler, leading to incorrect disassembly. This patch rectifies the situation by treating REX prefixes as standard prefixes rather than only expecting them before the Opcode.

The motivating test case added as a test was fuzzer generated.


Full diff: https://github.com/llvm/llvm-project/pull/117299.diff

2 Files Affected:

  • (modified) llvm/lib/Target/X86/Disassembler/X86Disassembler.cpp (+8-5)
  • (modified) llvm/test/MC/Disassembler/X86/x86-64.txt (+5)
diff --git a/llvm/lib/Target/X86/Disassembler/X86Disassembler.cpp b/llvm/lib/Target/X86/Disassembler/X86Disassembler.cpp
index c3eae294919f3c..c27177484f55a4 100644
--- a/llvm/lib/Target/X86/Disassembler/X86Disassembler.cpp
+++ b/llvm/lib/Target/X86/Disassembler/X86Disassembler.cpp
@@ -329,6 +329,14 @@ static int readPrefixes(struct InternalInstruction *insn) {
       break;
     }
 
+    if (isREX(insn, byte)) {
+      insn->rexPrefix = byte;
+      isPrefix = true;
+      LLVM_DEBUG(dbgs() << format("Found REX prefix 0x%hhx", byte));
+    } else if (isPrefix) {
+      insn->rexPrefix = 0;
+    }
+
     if (isPrefix)
       LLVM_DEBUG(dbgs() << format("Found prefix 0x%hhx", byte));
   }
@@ -506,11 +514,6 @@ static int readPrefixes(struct InternalInstruction *insn) {
     LLVM_DEBUG(dbgs() << format("Found REX2 prefix 0x%hhx 0x%hhx",
                                 insn->rex2ExtensionPrefix[0],
                                 insn->rex2ExtensionPrefix[1]));
-  } else if (isREX(insn, byte)) {
-    if (peek(insn, nextByte))
-      return -1;
-    insn->rexPrefix = byte;
-    LLVM_DEBUG(dbgs() << format("Found REX prefix 0x%hhx", byte));
   } else
     --insn->readerCursor;
 
diff --git a/llvm/test/MC/Disassembler/X86/x86-64.txt b/llvm/test/MC/Disassembler/X86/x86-64.txt
index 8d6564dd098990..b41226766a194e 100644
--- a/llvm/test/MC/Disassembler/X86/x86-64.txt
+++ b/llvm/test/MC/Disassembler/X86/x86-64.txt
@@ -770,3 +770,8 @@
 
 # CHECK: prefetchit1 (%rip)
 0x0f,0x18,0x35,0x00,0x00,0x00,0x00
+
+# Check that we correctly ignore a REX prefix that is not immediately before
+# the opcode.
+# CHECK: orw $25659, %ax
+0x66 0x4c 0x64 0x0d 0x3b 0x64

Copy link
Contributor

@phoebewang phoebewang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Copy link
Contributor

@KanRobert KanRobert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@boomanaiden154 boomanaiden154 merged commit 7530e70 into llvm:main Nov 22, 2024
5 of 8 checks passed
@boomanaiden154 boomanaiden154 deleted the x86-disasm-issue-11-21-24 branch November 22, 2024 18:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:X86 mc Machine (object) code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants