-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid VN bugs with struct reinterpretation. #57076
Conversation
/azp list |
This comment has been minimized.
This comment has been minimized.
Do we need a similar fix for hardware intrinsics (#35620)? CC. @briansull |
/azp run runtime-coreclr libraries-jitstress, runtime-coreclr jitstress |
Azure Pipelines successfully started running 2 pipeline(s). |
85b99f4
to
9da6c62
Compare
|
||
// Generated by Fuzzlyn v1.2 on 2021-08-06 19:31:41 | ||
// Run on .NET 6.0.0-dev on Arm Linux | ||
// Seed: 5500224797583134883 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you hand modify this program?
And will it be generated using this Seed value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, all fuzzling repos are hand modified because they usually don't return 100 on success.
S0 vr0 = default(S0); | ||
long vr1 = vr0.F0++; | ||
s_1[0, 0] = vr0; | ||
if (s_1[0, 0].F3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Field F3 looks to me like it will always be false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue is with the release build the condition satisfies and console writeline expression runs (1 line printed in release). Debug build is fine as no line is printed.
a0bb06a
to
6a96055
Compare
/azp run runtime-coreclr libraries-jitstress, runtime-coreclr jitstress, runtime-coreclr outerloop |
Azure Pipelines successfully started running 3 pipeline(s). |
6a96055
to
2af4408
Compare
/azp run runtime-coreclr libraries-jitstress, runtime-coreclr jitstress, runtime-coreclr outerloop |
Azure Pipelines successfully started running 3 pipeline(s). |
/azp run runtime-coreclr libraries-jitstress, runtime-coreclr jitstress, runtime-coreclr outerloop |
Azure Pipelines successfully started running 3 pipeline(s). |
ClassLayout::AreCompatible(structLayout, varDsc->GetLayout())) | ||
{ | ||
indir->ChangeOper(GT_LCL_VAR); | ||
indir->AsLclVar()->SetLclNum(val.LclNum()); | ||
|
||
if (structLayout->GetClassHandle() != varDsc->GetLayout()->GetClassHandle()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment here, about the Class handle mismatch, and how/why we handle it
@@ -1065,6 +1087,14 @@ class LocalAddressVisitor final : public GenTreeVisitor<LocalAddressVisitor> | |||
|
|||
indir->gtFlags = flags; | |||
|
|||
if (cantDoVNOnTHisTree) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment here as well
tree->gtVNPair = vnStore->VNPairApplySelectors(lclVNPair, lclFld->GetFieldSeq(), indType); | ||
if (lclVNPair.GetLiberal() == ValueNumStore::NoVN) | ||
{ | ||
// We didn't not assign a correct VN to the local, probably it was written using a different |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@briansull I am trying to pass the ci, could you please help me with it? The failures look like https://dev.azure.com/dnceng/public/_build/results?buildId=1289712&view=ms.vss-test-web.build-test-results-tab&runId=38073760&paneView=dotnet-dnceng.dnceng-build-release-tasks.helix-test-information-tab&resultId=107788
on arm32 windows/linux only with JitStess and only on one test. We assert when try to ApplySelector
from NoVN
.
I have added this code for one cases but we have more failures, what I don't understand is why we don't hit these asserts without my changes, do I understand correctly that:
- LCL_VAR can have a unique VN before the changes;
- other code parts expect it and do the right things with it;
if both are true why do we hit these asserts and what is the best way to fix it? Should I add similar checks if (lclVNPair.GetLiberal() == ValueNumStore::NoVN)
to all failing cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am trying to understand your issue.
It is difficult to answer your questions without a full understanding of what is going on here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally it is an error to have read with a ValueNumber of NoVN
Check if line 7500 (above) applies here:
// There are a couple of cases where we haven't assigned a valid value number to 'lcl'
//
if (lcl->gtVNPair.GetLiberal() == ValueNumStore::NoVN)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but I don't understand where here do we update lvaTable[lclNum].GetPerSsaData(lclDefSsaNum)->m_vnPair.SetBoth(lclVarVN);
?
And when I do lvaTable[lclNum].GetPerSsaData(lclDefSsaNum)->m_vnPair.SetBoth(ValueNumStore::NoVN);
it starts to produce errors, so what value should I write to GetPerSsaData
when the local has invalid VN and why don't we write it there in the example on line 7500?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sandreenko
I can look at a Jit Dump if you have one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps you need to mark the local variable as unsafe to ValueNumber
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another fix would be to set
lvaTable[V09].GetPerSsaData() to $500
when processing [000365]
LCL_VAR long V15 tmp9 u:2 (last use) $500
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly VN that we print on LHS of an assignment does not matter at all, the real value is saved in lvaTable[V09].GetPerSsaData()
.
At least it is how I understand this:
runtime/src/coreclr/jit/valuenum.cpp
Line 7549 in 7b0a27e
lcl->gtVNPair = ValueNumPair(); // Avoid confusion -- we don't set the VN of a lcl being defined. |
but in our case it is a Def, because it is a use:
N005 ( 13, 11) [000365] -A------R---- * ASG long $VN.Void
N004 ( 6, 5) [000363] *------N----- +--* IND long $500
N003 ( 3, 3) [000321] ------------- | \--* ADDR byref Zero Fseq[_00(0xab45adc)] $345
N002 ( 3, 2) [000322] $U------N----- | \--* LCL_VAR struct<System.Runtime.Intrinsics.Vector128`1[Double], 16> V09 tmp3 ud:2->3 $482 <- marked as use.
N001 ( 3, 2) [000364] ------------- \--* LCL_VAR long V15 tmp9 u:2 (last use) $500
but, obviously, we don't care about its value, we won't read it anywhere.
What we care about is value of lvaTable[V09].GetPerSsaData()
we want it be NoVN
and we want the next tree that uses it not to try to optimize it or "see" the values of any of the fields inside this lclVar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What you are saying above does sound correct to me. I haven't made many (or any) changes in this area SSA and VN, so it is also my understanding from reading the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But with this approach you will have to check for this value for every read (use) site, because the mismatch can occur either at the def or the use. In the above case we have a mismatch on the def site, which I guess already does write a NoVN, but later at the use site we pull out the NoVN and try to use it in VNPairApplySelectors and the use site won't have the GTF_VAR_DONT_VN
flag set as the isn't a mismatch on the use site.
ValueNumPair lhsVNPair; | ||
|
||
if (!lcl->DontVN()) | ||
{ | ||
lhsVNPair = rhsVNPair; | ||
} | ||
else | ||
{ | ||
ValueNum uniqVN = vnStore->VNForExpr(compCurBB, lcl->TypeGet()); | ||
lhsVNPair.SetBoth(uniqVN); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It appears lhsVNPair
is unused?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interesting, thanks for the catch. I need to investigate why it fixes one of the test cases that I was targeting with this change.
This JIT only change is too risky, I am closing this in favor of #57282 that has less VN changes. |
What issue are we trying to solve?
VN optimizations are using a unique aliasing model that depends on correct FieldSeq that are not respected by at least 3 places:
Unsafe.As<struct1, struct2>(ref struct1).fieldOfStruct2
will be imported asFIELD(ADDR byref(*))
and JIT does not keep pointer type soADDR byref( LCL_VAR struct1)
andADDR byref( LCL_VAR struct2)
are the same for us.More details about this case
Later if lclmorph sees
FIELD fieldOfStruct2(ADDR byref(LCL_VAR struct1)
it will produceLCL_FLD struct1.fieldOfStruct2
and VN will do incorrect things with it, for example, if fieldOfStruct2 has offset 0 and type INT and fieldOfStruct1 has the same offset and the same type and we have:Question: why can't we catch and reject
LCL_FLD
creation when we seeFIELD fieldOfStruct2(ADDR byref(LCL_VAR struct1)
andCORINFO_FIELD_HANDLE
of the field does not belong toCORINFO_CLASS_HANDLE
of the struct?Answer: Because we don't have a JitEE method to check that
CORINFO_FIELD_HANDLE
belongs toCORINFO_CLASS_HANDLE
. It can be added but maybe too late for 6.0.ASG(LCL_VAR struct1, LCL_VAR struct2)
. For the rest of the JIT these are identical memory chunks.More details about this case
this happens for structs like:
Question: why can't we see during VN that
CORINFO_CLASS_HANDLE
of LHS and RHS are different?Answer: it is possible, but it would require similar changes in VN to what is done in this PR and it won't solve the first issue. Also, there could be issue with
CORINFO_CLASS_HANDLE of Generic<Canon> != CORINFO_CLASS_HANDLE of Generic<Concrete>
that looks fine until we check
[000374] -------N---- \--* FIELD struct _pointer
and see that its class name isSystem.Span
1[System.TimeZoneInfo+AdjustmentRule]`, so if we use both FldSeq for canon and non-canon classes VN will make wrong transformations.How does this PR solve it?
It marks
GT_LCL_VAR
after such transformations asGTF_VAR_DONT_VN
andGT_LCL_FLD
are created withNoField
seq. Then we try to keep these values until VN and respect them there. If all works correct we generate a unique VN for the LCL_VAR and we don't try to determine its value or values of its fields.There are 3 potential issues with this approach:
The first is low-risk, because the places where we do such things are well-known - inlining of
Unsafe.As
in import andMorphLocalIndir
in lclmorph.The second is worse, but from my analysis, we keep the flag because we copy it from old tree to a new one or because we don't change trees (for example, assertion propagation replaces LCL_NUM and keeps the flag intact).
The third is the biggest risk, in my opinion, because our understanding and control over VN is vague.
What were the alternatives?
MorphLocalIndir
transformationit causes pretty bad regressions, like
and does not solve the first issue.
varDsc->lvOverlappingFields = true;
for such structs as was done before.This approach does not work as #42517 has shown.
lvOverlappingFields
in the jit is 'type' information, so JIT expects two LCL_VAR that have the same struct type to have the same value oflvOverlappingFields
.Because of that assumption, we don't check the flag during assertion prop and we had cases like:
and assertion propagation was replacing LCL_VAR1 with LCL_VAR2 and ended up with
where both don't forget about
lvOverlappingFields
onLCL_VAR1
.Also, the flag set by itself was not blocking struct promotion, so we often replaced local with the fields without that flag set.
Set
lclAddressExposed
in both cases, this will kill all optimizations and end up with awful code generated in scenarios where people use Unsafe to get performance.a hybrid approach, like rely on other phases not to maintain correct
FldSeq
(with a new JitEEInterface method) but force VN to check LHS CORINFO_CLASS_HANDLE == RHS CORINFO_CLASS_HANDLE for assignments.This one could actually work, I need to revisit what it will take to add JitEE method.
I am from the future, how do I fix it if I have time?
1 Change VN to a physical model, get rid of FldSeq;
or
2 Add JitEEInterface methods to check that FldSeq is correct during VN, like I was trying in #49504 and do all these checks inside VN so other phrases don't depend on what is necessary only for VN.
and in addition:
3 Support
LCL_FLD struct
types;Fixes
Fixes #42517, fixes #49954, fixes #54102, fixes #56980.
Diffs
The diffs are small, all regressions.
Diffs.
Regression analysis.
The regressions are in cases like:
and in the past we were setting
V20.lvOverlappingFields = true
but we were forgetting it after copy block field by field transformation. Now we keep the flag and it blocks VN.Such transformation is correct because we replace structs with primitive types (because there are no struct field support) and we check field types here:
runtime/src/coreclr/jit/morphblock.cpp
Line 871 in e0f6071
so these regressions don't fix any issues but avoiding them would require more changes and it is RC1 so probably it is better to keep it conservative.