-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Disable matching constants for vectors that needs upper half to be save/restore #74110
Conversation
…nstant (dotnet#70171)" This reverts commit 24f5de4.
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsThis reverts commit 24f5de4. #70171 reuses There were several alternatives tried:
For now, I am reverting the change and handle it in better way as part of #70182. Fixes: #70642
|
@tannergooding @BruceForstall @dotnet/jit-contrib |
src/coreclr/jit/lsra.cpp
Outdated
case GT_CNS_VEC: | ||
{ | ||
return GenTreeVecCon::Equals(refPosition->treeNode->AsVecCon(), otherTreeNode->AsVecCon()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of reverting, would it be feasible for this to be the following?
return (otherTreeNode->gtType != TYP_SIMD32) && GenTreeVecCon::Equals(refPosition->treeNode->AsVecCon(), otherTreeNode->AsVecCon());
Asking as the original change had some measured perf wins and since it looks like the bug only impacts TYP_SIMD32
due to the need to save/restore the upper halves (meaning I think it shouldn't impact TYP_SIMD8/12/16
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to scope it, we only need to do it for platforms where FEATURE_PARTIAL_SIMD_CALLEE_SAVE
is defined (win-amd64, arm64, LoongArch64). Presumably there is no issue on x86?
varTypeNeedsPartialCalleeSave
would need to be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, I still think it's probably "too late" to construct a fix to ensure correctness, on all platforms, when it's trivial to revert the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
varTypeNeedsPartialCalleeSave
check seems reasonable to include, but I agree with @BruceForstall that the simplest thing would be to revert the change at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍, just thought I'd throw it out as a suggestion given this will have negative impact on Arm64.
I expect we'll see this one dotnet/perf-autofiling-issues#5976 and a few of the other code paths that use vector constants regress a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at the spmi-diffs and there are few in benchmarks. I tried the change to just use varTypeNeedsPartialCalleeSave
and didn't see as many spmi-diffs in benchmarks, asp locally. I will let CI run and check other diffs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no change in spmi diff on Arm64 if we just disable isMatchingConstant
for varTypeNeedsPartialCalleeSave
. On x64, there is slight difference (earlier in benchmark collection, 12 methods regressed and now 5).
it looks like the bug only impacts TYP_SIMD32 due to the need to save/restore the upper halves (meaning I think it shouldn't impact TYP_SIMD8/12/16).
@tannergooding , even Arm64 needs save/restore upper half as per this comment, so it could be problematic there too, right?
runtime/src/coreclr/jit/compiler.h
Lines 7439 to 7445 in 519d1e3
static bool varTypeNeedsPartialCalleeSave(var_types type) | |
{ | |
assert(type != TYP_STRUCT); | |
// ARM64 ABI FP Callee save registers only require Callee to save lower 8 Bytes | |
// For SIMD types longer than 8 bytes Caller is responsible for saving and restoring Upper bytes. | |
return ((type == TYP_SIMD16) || (type == TYP_SIMD12)); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, so it is (https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#simd-and-floating-point-registers):
Registers v8-v15 must be preserved by a callee across subroutine calls; the remaining registers (v0-v7, v16-v31) do not need to be preserved (or should be preserved by the caller). Additionally, only the bottom 64 bits of each value stored in v8-v15 need to be preserved [7]; it is the responsibility of the caller to preserve larger values.
I thought the entire register needed to be saved, but apparently not, that's unfortunate.
There is no change in spmi diff on Arm64 if we just disable isMatchingConstant for varTypeNeedsPartialCalleeSave. On x64, there is slight difference (earlier in benchmark collection, 12 methods regressed and now 5).
What was the regression on Arm64 if we revert the entire PR?
Getting the regression down to just 5 on x64 sounds 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was the regression on Arm64 if we revert the entire PR?
There is no change between total revert and limiting it to just varTypeNeedsPartialCalleeSave
on Arm64.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this incredible resource. I didn't know about it.
…tchingConstant (dotnet#70171)"" This reverts commit 984120f.
CC @jeffschwMSFT, this is the bug fix for #70642, and we are reverting it. |
If wanted in 7, we need to backport to 7 RC1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Please make sure you change the title and text of this PR to reflect the current change (it's no longer "revert". Same for the commit message when you merge.
/backport to release/7.0-rc1 |
Started backporting to release/7.0-rc1: https://github.com/dotnet/runtime/actions/runs/2885813879 |
The test no longer fails in https://dev.azure.com/dnceng/public/_build/results?buildId=1952719&view=results . |
This reverts commit 24f5de4.#70171 reuses
Vector constants
in register allocation but LSRA doesn't handle it very well. It doesn't have a semantics of looking into constant registers and save/restore the upper-half of them during the call. Because of this, the constant register that we want to reuse after the call gets garbage data in the upper-half and leads to silent functional differences.There were several alternatives tried:
buildUpperVectorSave()
also check if there are any vector registers holding constants and addRefPosition
to save/restore, but we don't have that information until the register allocation.RefTypeKill
refposition and clear the constant vector registers mask during allocation, but we don't have thetreeNode
information forRefTypeKill
to accurately do this only for Calls and not for anything else.For now, I am reverting the change and handle it in better way as part of #70182.For now, I am disabling the
IsMatchingConstant()
optimization for registers whose upper half need save/restore.Fixes: #70642