-
Notifications
You must be signed in to change notification settings - Fork 5.3k
JIT: Skip redundant AND masking in NarrowWithSaturation codegen #122898
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR optimizes the codegen for Vector128/256.NarrowWithSaturation by eliminating redundant AND masking instructions on x86/x64. The optimization recognizes that when inputs are already clamped to the target range by preceding Min/Max operations, the subsequent AND operations are unnecessary.
Key Changes:
- Added an optional
inputsAlreadyClampedparameter togtNewSimdNarrowNodeto skip AND masking when inputs are guaranteed to be within target range - Updated the
NarrowWithSaturationcodegen path to passinputsAlreadyClamped=trueafter explicit clamping operations - Added regression test with disasm checks to verify the optimization
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| src/coreclr/jit/compiler.h | Added optional inputsAlreadyClamped parameter (default false) to gtNewSimdNarrowNode function signature |
| src/coreclr/jit/gentree.cpp | Implemented conditional logic to skip AND masking operations when inputsAlreadyClamped is true for four specific narrowing scenarios (TYP_UBYTE and TYP_USHORT paths in both AVX2 and SSE2) |
| src/coreclr/jit/hwintrinsicxarch.cpp | Modified NarrowWithSaturation implementation to pass inputsAlreadyClamped=true when calling gtNewSimdNarrowNode after explicit Min/Max clamping |
| src/tests/JIT/Regression/JitBlue/Runtime_116526/Runtime_116526.csproj | Added test project configuration with disasm checking enabled and JIT optimization settings |
| src/tests/JIT/Regression/JitBlue/Runtime_116526/Runtime_116526.cs | Added regression test with disasm assertions to verify vpand is not generated and functional tests to verify correctness for UInt16→Byte and UInt32→UShort narrowing |
bd48091 to
ea91488
Compare
src/tests/JIT/Regression/JitBlue/Runtime_116526/Runtime_116526.cs
Outdated
Show resolved
Hide resolved
Vector128/256.NarrowWithSaturation was generating redundant vpand instructions because gtNewSimdNarrowNode didn't know the inputs were already clamped to the target range by the preceding Min() operations. This change adds an optional parameter to gtNewSimdNarrowNode to skip the AND masking when the caller knows inputs are already in range. The fix applies to x86/x64 only. ARM64 uses native AdvSimd instructions that don't have this issue. Before: vpminuw xmm1, xmm0, xmmword ptr [...] vpand xmm1, xmm1, xmm0 ; redundant vpminuw xmm2, xmm0, xmmword ptr [...] vpand xmm0, xmm2, xmm0 ; redundant vpackuswb xmm0, xmm1, xmm0 After: vpminuw xmm1, xmm0, xmmword ptr [...] vpminuw xmm0, xmm0, xmmword ptr [...] vpackuswb xmm0, xmm1, xmm0
ea91488 to
431213e
Compare
Fixes #116526
When
NarrowWithSaturationis used, we're already clamping viaMin()internally - so the subsequentvpandto mask the result is redundant. This adds aninputsAlreadyClampedparam togtNewSimdNarrowNodeso callers likeNarrowWithSaturationcan skip the extra AND.Approach suggested by @tannergooding in the issue.
Before:
After:
ARM64 isn't affected - it uses AdvSimd intrinsics that handle this natively.
Tested with System.Runtime.Intrinsics and System.Numerics.Vectors test suites, plus added a disasm check test.