-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[JIT] Fix - Do not remove CAST
nodes on store if the LCL_VAR
is address exposed
#85734
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsDescriptionFixes:
On both ARM and XARCH, I've done many experiments, and at the moment, I think the less risky change is to simply not remove the
|
CAST
nodes on assignment if the LCL_VAR
is a parameter.CAST
nodes on assignment if the LCL_VAR
is a parameter.
src/coreclr/jit/morph.cpp
Outdated
{ | ||
LclVarDsc* varDsc = lvaGetDesc(effectiveOp1->AsLclVarCommon()->GetLclNum()); | ||
|
||
// It is not safe to remove the cast for non-NormalizeOnLoad variables or parameters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should mention the "why" - i. e. that this is a quirk/workaround for VN's handling of NOL and that we know this is in fact not a complete fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't that these issues are with VN's handling of NOLs; I could be wrong - it's very hard to determine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this issue isn't with VN's handling, then what is it with? We should understand the problem before we work around it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well - we don't know about other bugs related to this, but if there are, surely we would want to find out what they are about as well.
FWIW, example from #85382 looks very much like the VN issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should understand the problem before we work around it.
I tried to describe the issue in the description regarding the mov
elision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems likely that a peephole in the emitter would be able to get a large portion of the benefit of that subrange assertion check without having to complicate the NOL semantics with how stores are handled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did experiment with adding a peephole to look at redundant movzx
instructions that would help certain cases.
@jakobbotsch so that opt, I disabled it and #85382 did pass, but not #85081.
Here is what I am using to test:
DOTNET_TieredCompilation=0
DOTNET_JitStressRegs=8
using System.Runtime.CompilerServices;
public class Program
{
public static long s_15;
public static sbyte s_17;
public static ushort s_21 = 36659;
public static void Main()
{
s_15 = ~1;
var x = (ushort)0;
bool vr1 = M40(x);
}
public static bool M40(ushort arg0)
{
for (int var0 = 0; var0 < 2; var0++)
{
arg0 = 65535;
arg0 &= (ushort)(s_15++ >> s_17);
arg0 %= s_21;
}
System.Console.WriteLine(arg0);
Test.MainTest();
return false;
}
}
public class Test
{
public class Program
{
public static short s_2;
[MethodImpl(MethodImplOptions.NoInlining)]
public static void Consume(int x) { }
[MethodImpl(MethodImplOptions.NoInlining)]
public static int M8(byte arg0)
{
s_2 = 1;
arg0 = (byte)(-s_2);
var vr1 = arg0 & arg0;
Consume(vr1);
return vr1;
}
}
[MethodImpl(MethodImplOptions.NoInlining)]
public static int MainTest()
{
var x = (byte)1;
var result = Test.Program.M8(x);
if (result != 255)
{
return 0;
}
Console.WriteLine("Success");
return 100;
}
}
A successful run will print:
28876
Success
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I too took a look at the dump for #85382 and the problem (in my interpretation) is that remorphing invalidates previous assumption as made by and acted on in assertion prop (@jakobbotsch - did you mean this too?):
- We have
NOL = CAST<short>(...)
. - Local assertion prop takes a dependency on this, does not add cast for a downstream use.
- Remorph invalidates the assumption by removing the cast (or, rather making it into a
CAST<int>
one).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(@jakobbotsch - did you mean this too?):
Yes, except I think the optimization made by local assertion prop makes the model of NOL locals much more complex than it should be (i.e. it means your explanation of NOL locals above is really not the fully story).
Local assertion prop here does not only rely on what it proves by seeing the subrange assertion. It also relies on normalization done by genUnspillRegIfNeeded
, GT_LCL_VAR
codegen and args homing to make this work.
In other words: the assertion only meaningfully describes the upper bits if it just so happens that the local ended up being in a 4-byte home for the time between the assignment that created it and the use.
It's very subtle and hard to reason about this in the face of interacting optimizations like this one (another issue I dug up with an interacting optimization: #66624). So if we could get most of the benefits of this optimization through an emitter peephole, and simplify morph to always normalize, then I think that would be much preferable to the current state of things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, restricting the fix to parameters is not enough, it also needs to pessimize struct fields. The following example fails in the same way:
public static bool M40()
{
S s = default;
for (int var0 = 0; var0 < 2; var0++)
{
s.U = 65535;
s.U &= (ushort)(s_15++ >> s_17);
s.U %= s_21;
}
System.Console.WriteLine(s.U);
return false;
}
public struct S { public ushort U; }
…f why we cannot remove CAST nodes from parameters.
/azp run runtime-coreclr jitstress2-jitstressregs |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run Fuzzlyn |
Azure Pipelines successfully started running 1 pipeline(s). |
I removed the NormalizeOnLoad optimization in On x64, regressions are: When combined with #85780, the regressions are (not including wins): We don't have the same optimizations like #85780 for ARM64 though. |
/azp run runtime-coreclr jitstress2-jitstressregs |
/azp run Fuzzlyn |
Azure Pipelines successfully started running 1 pipeline(s). |
1 similar comment
Azure Pipelines successfully started running 1 pipeline(s). |
@jakobbotsch - actually, let's wait on this. I think we need to have more discussions on what we are going to do here. And it makes me want to be careful how we go about this, on the other hand, it's not good that we have a correctness problem. |
Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it. |
CAST
nodes on assignment if the LCL_VAR
is a parameter.CAST
nodes on store if the LCL_VAR
is a parameter or struct field
/azp run runtime-coreclr jitstress2-jitstressregs |
@dotnet/jit-contrib @jakobbotsch this is ready. Failures are unrelated. Diffs - There are regressions, which I don't think can be avoided safely since this is a correctness fix. However, I think we could expand peephole optimizations to cover some of these regressions in the future. |
@TIHan Was this an existing issue that was just found or fallout from recent CAST optimizations? If the latter, then is the regression here (substantially?) less than the original win. I assume so - just a check on the scope of the changes. Thanks. |
Co-authored-by: Jakob Botsch Nielsen <Jakob.botsch.nielsen@gmail.com>
@markples It's not directly caused by any recent CAST optimizations, but those CAST optimizations may have allowed this particular issue around removing CAST around NOLs on STOREs(used to be ASG) to manifest itself. |
/azp run runtime-coreclr jitstress2-jitstressregs |
@dotnet/jit-contrib @jakobbotsch This is ready again, less regressions than before and CI seems to be passing :) |
// We can make this transformation only under the assumption that NOL locals are always normalized before they | ||
// are used, | ||
// however this is not always the case: the JIT will utilize subrange assertions for NOL locals to make | ||
// normalization | ||
// assumptions -- see fgMorphLeafLocal. Thus we can only do this for cases where we know for sure that | ||
// subsequent uses | ||
// will normalize, which we can only guarantee when the local is address exposed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The formatting here is a bit strange, maybe reformat the comment locally?
// NormalizeOnLoad Rules: | ||
// 1. All small locals are actually TYP_INT locals. | ||
// 2. NOL locals are such that not all definitions can be controlled by the compiler and so the upper bits can | ||
// be undefined.For parameters this is the case because of ABI.For struct fields - because of padding.For | ||
// address - exposed locals - because not all stores are direct. | ||
// 3. Hence, all NOL uses(unless proven otherwise) are assumed in morph to have undefined upper bits and | ||
// explicit casts have be inserted to "normalize" them back to conform to IL semantics. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, this comment is formatted strangely, missing some spaces and it should be "address-exposed", not "address - exposed".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Feel free to address the comment suggestions as part of a follow-up PR to avoid rerunning CI.
CAST
nodes on store if the LCL_VAR
is a parameter or struct fieldCAST
nodes on store if the LCL_VAR
is address exposed
Description
Fixes:
35514
instead of28876
#85382On both ARM and XARCH,
mov
instructions get elided forGT_STORE_LCL_VAR
when the destination and source registers are the same. This poses a problem whenGT_STORE_LCL_VAR
is for a NormalizedOnLoad parameter and struct field variable and itsop
had itsCAST
node removed - which means sign/zero-extended instructions never get emitted when they are supposed to.