-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: Fix EvalHWIntrinsicFunBinary
for ARM MultiplyByScalar
#94413
Conversation
MultiplyByScalar on ARM/ARM64 can have differently typed operands, so we cannot fold multiplications by one/zero without ensuring that we get the proper result type. Fix dotnet#93876
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue Details
Fix #93876
|
<Optimize>True</Optimize> | ||
</PropertyGroup> | ||
<ItemGroup> | ||
<CLRTestEnvironmentVariable Include="DOTNET_JitStressModeNames" Value="STRESS_MAKE_CSE STRESS_SPLIT_TREES_REMOVE_COMMAS" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In other tests there are comments that you need process isolation to use CLRTestEnvironmentVariable
, is that no longer true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's still true, thanks for noticing. Fixed this and another test.
} | ||
|
||
[MethodImpl(MethodImplOptions.NoInlining)] | ||
private static Vector4 Mul(float a, float b) => Vector4.Multiply(a + b, Vector4.One); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the type of a + b
here that causes the issue? Does it become TYP_DOUBLE
or something or is the issue that it's TYP_FLOAT
and the other operand is TYP_SIMD16
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gtNewSimdBinOpNode
creates the following IR on ARM64:
[000005] ----------- ▌ HWINTRINSIC simd16 float MultiplyByScalar
[000003] ----------- ├──▌ CNS_VEC simd16<0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000>
[000004] ----------- └──▌ HWINTRINSIC simd8 float CreateScalarUnsafe
[000002] ----------- └──▌ ADD float
[000000] ----------- ├──▌ LCL_VAR float V00 arg0
[000001] ----------- └──▌ LCL_VAR float V01 arg1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I think MultiplyByScalar
shouldn't be handled by this path. E.g. Vector64.Multiply(Vector64<float>.One, x)
will produce <x, 0>
instead of <x, x>
because of that. So this needs a bit more work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed a change that handles MultiplyByScalar
separately, can you take another look?
The test fails without any stress mode due to bad codegen.
src/coreclr/jit/valuenum.cpp
Outdated
else | ||
{ | ||
oneVN = VNOneForType(baseType); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this actually be hit given that we expect TYP_SIMD8
for the second operand?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question... let me add some strong asserts and simplify this a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the same path down in the regular multiply handling is also "dead". It was likely intended to handle the vector * scalar
case, but we don't actually ever produce a GT_HWINTRINSIC(TYP_SIMD, TYP_SCALAR)
node anymore, afaict.
We appear to always normalize to Broadcast
on x86/x64. We similarly normalize to Broadcast
on Arm64
except for when a MultiplyByScalar
variant exists, in which case we always insert CreateScalarUnsafe
.
I then don't see any overloads of the instructions that would take a proper scalar register, so I don't think we'd ever hit it in practice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense... probably something to clean up separately at some point in the future.
/backport to release/8.0-staging |
Started backporting to release/8.0-staging: https://github.com/dotnet/runtime/actions/runs/7827886101 |
@jakobbotsch backporting to release/8.0-staging failed, the patch most likely resulted in conflicts: $ git am --3way --ignore-whitespace --keep-non-patch changes.patch
Applying: JIT: Fix EvalHWIntrinsicFunBinary for ARM MultiplyByScalar
Applying: Add RequiresProcessIsolation
Using index info to reconstruct a base tree...
A src/tests/JIT/Regression/JitBlue/Runtime_92201/Runtime_92201.csproj
Falling back to patching base and 3-way merge...
Auto-merging src/tests/JIT/Regression/JitBlue/Runtime_63942/Runtime_63942.csproj
CONFLICT (content): Merge conflict in src/tests/JIT/Regression/JitBlue/Runtime_63942/Runtime_63942.csproj
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0002 Add RequiresProcessIsolation
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
Error: The process '/usr/bin/git' failed with exit code 128 Please backport manually! |
@jakobbotsch an error occurred while backporting to release/8.0-staging, please check the run log for details! Error: git am failed, most likely due to a merge conflict. |
MultiplyByScalar
on ARM/ARM64 can have differently typed operands, so we cannot fold multiplications by one/zero without ensuring that we get the proper result type.Fix #93876