[FastISel][AArch64] Compare Instruction Miscompilation Fix #75993
Conversation
When shl is folded into a compare instruction, a miscompilation occurs when the CMP instruction is also sign-extended. This fix avoids folding shl into the cmp instruction when the second operand is extended, and also turns equality operations into unsigned comparisons.
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. If you have received no comments on your PR for a week, you can request a review. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.
@llvm/pr-subscribers-backend-aarch64

Author: None (brendaso1)

Changes

When shl is folded into a compare instruction, a miscompilation occurs when the CMP instruction is also sign-extended. For the following IR:

%op3 = shl i8 %op2, 3

It used to generate

cmp w8, w9, sxtb #3

which means sign-extend w9, shift left by 3, and then compare with the value in w8. However, the original intention of the IR is "left shift w9 by 3 bits, zero-extend the value, then compare with the value in w8". This PR creates a fix for the issue, more specifically to not fold the left shift into the CMP instruction, and to create a zero-extended value rather than a sign-extended value.

Full diff: https://github.com/llvm/llvm-project/pull/75993.diff

2 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64FastISel.cpp b/llvm/lib/Target/AArch64/AArch64FastISel.cpp
index 9b8162ce8dd4d0..ddd2c4eebe491c 100644
--- a/llvm/lib/Target/AArch64/AArch64FastISel.cpp
+++ b/llvm/lib/Target/AArch64/AArch64FastISel.cpp
@@ -1233,7 +1233,11 @@ unsigned AArch64FastISel::emitAddSub(bool UseAdd, MVT RetVT, const Value *LHS,
isValueAvailable(RHS)) {
if (const auto *SI = dyn_cast<BinaryOperator>(RHS))
if (const auto *C = dyn_cast<ConstantInt>(SI->getOperand(1)))
- if ((SI->getOpcode() == Instruction::Shl) && (C->getZExtValue() < 4)) {
+ if ((SI->getOpcode() == Instruction::Shl) && (C->getZExtValue() < 4) &&
+ !NeedExtend) {
+          // We can only fold instructions that don't need extension because
+          // in an add/sub instruction the operand is first extended, then
+          // shifted
Register RHSReg = getRegForValue(SI->getOperand(0));
if (!RHSReg)
return 0;
@@ -2426,7 +2430,7 @@ bool AArch64FastISel::selectBranch(const Instruction *I) {
}
// Emit the cmp.
- if (!emitCmp(CI->getOperand(0), CI->getOperand(1), CI->isUnsigned()))
+ if (!emitCmp(CI->getOperand(0), CI->getOperand(1), !CI->isSigned()))
return false;
// FCMP_UEQ and FCMP_ONE cannot be checked with a single branch
diff --git a/llvm/test/CodeGen/AArch64/fastisel-shift-and-cmp.ll b/llvm/test/CodeGen/AArch64/fastisel-shift-and-cmp.ll
new file mode 100644
index 00000000000000..5dad290134bbd7
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/fastisel-shift-and-cmp.ll
@@ -0,0 +1,33 @@
+; RUN: llc --global-isel=false -fast-isel -O0 -mtriple=aarch64-none-none < %s | FileCheck %s
+
+; Check that the shl instruction did not get folded in together with
+; the cmp instruction. It would create a miscompilation
+
+@A = dso_local global [5 x i8] c"\C8\AA\C8\AA\AA"
+@.str = private unnamed_addr constant [13 x i8] c"TRUE BRANCH\0A\00"
+@.str.1 = private unnamed_addr constant [14 x i8] c"FALSE BRANCH\0A\00"
+
+define dso_local i32 @main() {
+
+ %tmp = load i8, ptr getelementptr inbounds ([5 x i8], ptr @A, i64 0, i64 1)
+ %tmp2 = load i8, ptr getelementptr inbounds ([5 x i8], ptr @A, i64 0, i64 2)
+ %op1 = xor i8 %tmp, -49
+ %op2 = mul i8 %op1, %op1
+; CHECK-NOT: cmp [[REGS:.*]] #[[SHIFT_VAL:[0-9]+]]
+ %op3 = shl i8 %op2, 3
+ %tmp3 = icmp eq i8 %tmp2, %op3
+ br i1 %tmp3, label %_true, label %_false
+
+_true:
+ %res = call i32 (ptr, ...) @printf(ptr noundef @.str)
+ br label %_ret
+
+_false:
+ %res2 = call i32 (ptr, ...) @printf(ptr noundef @.str.1)
+ br label %_ret
+
+_ret:
+ ret i32 0
+}
+
+declare i32 @printf(ptr noundef, ...)
@.str = private unnamed_addr constant [13 x i8] c"TRUE BRANCH\0A\00"
@.str.1 = private unnamed_addr constant [14 x i8] c"FALSE BRANCH\0A\00"

define dso_local i32 @main() {
Is it possible to simplify the test? I think it could:
- remove dso_local
- remove the globals
- remove the loads
- remove the calls
- remove --global-isel=false and -O0?
There are some existing tests in llvm/test/CodeGen/AArch64/arm64-fast-isel-icmp.ll, but they don't seem to include any with shift amounts at the moment.
Hmm, in that case do you think it's more beneficial if I put this test in llvm/test/CodeGen/AArch64/arm64-fast-isel-icmp.ll?
Yeah that sounds like it might be a good place for it. Feel free to use the update_llc_test_checks script too to generate the check lines.
if ((SI->getOpcode() == Instruction::Shl) && (C->getZExtValue() < 4)) {
if ((SI->getOpcode() == Instruction::Shl) && (C->getZExtValue() < 4) &&
    !NeedExtend) {
  // We can only fold instructions that doesn't need extension because
Is it possible to reach this point now, if NeedExtend==false and ExtendType != AArch64_AM::InvalidShiftExtend?
I didn't add ExtendType != AArch64_AM::InvalidShiftExtend because there is already a check before this nested-if statement in line 1232. Do you think it's still better to add the check?
Yeah - I meant that if ExtendType != InvalidShift, then does that mean that NeedExtend is always false? Is it ever possible to get to the emitAddSub_rx(..) line now?
Ah gotcha. You're right, if ExtendType != AArch64_AM::InvalidShiftExtend, NeedExtend is always true. So the logic would never be reached. Maybe it's a better idea to remove that part of the logic altogether?
- remove dso_local
- remove the globals
- remove the loads
- remove the calls
- remove --global-isel=false and -O0
I have a check that is failing but I am not sure what the cause is. It looks like the last tests that failed are on MLIR: https://buildkite.com/llvm-project/github-pull-requests/builds/25125 , but my fix has nothing to do with that part of the code, and all the automated checks were passing a few days ago. Is it because this branch does not have the most up-to-date commits from main? Should I merge / rebase my branch onto main in this case?
Yep I think we can ignore those errors, they look unrelated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use the update_llc_test_checks.py script to update the test? Then this LGTM, thanks.
Added the generated CHECK lines to the unit tests. Let's see how the builders look. Also, when I did more testing, it seems like the sample test wouldn't fail even if the Cmp is sign-extended rather than zero-extended in assembly. So I removed the related changes as well.
Thanks. This LGTM. Are you happy for it to be squashed and merged?
Yes let's do it! 😁
Thanks for the fix @brendaso1. I find the "squash and merge" button satisfying so I hit it for you. If you contribute 2/3 more patches you can request commit access yourself.
When shl is folded into a compare instruction, a miscompilation occurs when the CMP instruction is also sign-extended. For the following IR:

%op3 = shl i8 %op2, 3
%tmp3 = icmp eq i8 %tmp2, %op3

It used to generate

cmp w8, w9, sxtb #3

which means sign-extend w9, shift left by 3, and then compare with the value in w8. However, the original intention of the IR would require %op2 to first shift left before extending the operands in the comparison operation. Moreover, if sign extension is used instead of zero extension, the sample test would miscompile. This PR creates a fix for the issue, more specifically to not fold the left shift into the CMP instruction, and to create a zero-extended value rather than a sign-extended value.