-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for field layout verification across version bubble boundary #50364
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it. We don't need a new encoding as we've never shipped anything that should depend on this.
@davidwrighton - CI testing uncovered a pre-existing deficiency of CG2 field layout, we weren't properly aligning base class sizes on ARM32. Could you please take another look when you have a chance? |
OK, I found out there were actually three different places in Crossgen2 - auto field layout calculation, sequential field layout calculation and the field offset verification fixup - that were using three different methods to calculate the field base offset. I have unified all of them to use the same method |
Every time anyone looks at this stuff we find more cases to consolidate. Hopefully this is it. I like the current state of the change, but of course, we need to look at the results of the outerloop testing that you are doing now. |
@@ -775,13 +762,14 @@ private static bool IsByValueClass(TypeDesc type) | |||
return type.IsValueType && !type.IsPrimitive && !type.IsEnum; | |||
} | |||
|
|||
private static LayoutInt ComputeBytesUsedInParentType(DefType type) | |||
public LayoutInt CalculateFieldBaseOffset(TypeDesc type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This API will crash with an InvalidCast if e.g. an array is passed. It would be better to ensure valid inputs at compile time.
public LayoutInt CalculateFieldBaseOffset(TypeDesc type) | |
public LayoutInt CalculateFieldBaseOffset(MetdataType type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, thanks, fixed in 8th commit.
8d19e62
to
eac8901
Compare
I have rebased the change, I reverted a failed attempt at ARM offset bias refactoring; for the sake of expediency I have also included my two other pending fixes, and For now I assume that we'll first merge in the two fixes and I'll then rebase this change against them and rerun the testing but I'm open to merging in this combination of the three fixes for the sake of expediency. |
merging together should be fine, assuming tests are passing with all of them together. |
Down to three remaining test failures (ExplicitClass in crossgen2smoke on ARM, LayoutClassTest, PlatformDefaultMemberFunctionTest); some unrelated failures in composite CG2 builds that should be fixed or at least change their behavior after the fix for MVID checks (#50474) is merged in; I'm looking at them now but the crossgen2smoke failure on arm[32] is interesting and I would love to hear @davidwrighton's opinion: Verify_FieldOffset 'ExplicitlySizedClass.A' Field offset 0!=-4(actual) || baseOffset 0!=0(actual) I think this raises two interesting questions:
Thanks Tomas |
Is the base validation actually useful?
|
I believe it's extremely useful. Please remember that these checks are not supposed to improve our code in any manner, they are just sanity checks helping us chase down subtle differences in the two type systems and class loaders that have been plaguing us throughout the entire CG2 development - each longer-term member of the CG2 v-team has their stories to tell; I side with David on this one, I believe the validation has been an awesome boost in nailing down hard issues, I recall times before that change and having to do everything manually. And this logic already bears unexpected fruits - discoveries of tiny misalignments between CG2 and the runtime regarding field layout, not only on ARM (even within a version bubble i.e. with fixed offsets). While this PR is difficult, I would be very sad to have to trim this down by reducing the quality of the checks. |
Hmm, I guess the expression I used is somewhat ambiguous, sorry about that, I think I meant to say something like "the checks are not supposed to improve the runtime native code in any manner, ...", the way I phrased that it sounds like a contradiction in terms to some extent. Thanks for your patience :-). |
I understand that the field offset check is useful. I am questioning the usefulness of the base check. |
The particular value of the runtime base check isn't to validate them in cases where there is a cross-module boundary. Its to validate the data that would be used in a cross module boundary to ensure that the general type layout processing is correct. It turns out that while not exactly matching up to a useful value, it has been quite valuable at finding all kinds of edge cases that do affect the within version bubble cases. In particular, it is good at finding cases where the alignment calculations were subtly wrong in a way that didn't cause a field offset to be wrong in that particular test case, but could have caused a problem for other fields on the type. Now, if it turns out the only incorrectness is caused by this incorrect conflation of cross boundary and non-deriving, we could use a different sentinel value like 0xFFFFFFFF or something or simply not allow a sentinel value for types that derive directly from object. @trylek I assume that you have an arm32 test environment where you can test this locally? Relying on the CI for this sort of testing is a giant time waster. Also, what I've found in the past is that when a layout issue is particularly nasty is that I need to build a standalone repro of the bug, and debug in parallel through the native CLR type layout logic and the managed type layout logic and find where they differ. I worry that when you consolidated the 3 different ways in which we calculated base offset that you actually removed a way in which they are required to be very subtly different. |
Hmm, I guess you're probably right that in the local case (within version bubble) the verification of absolute field offset is sufficient. On the other hand, the base class - derived class boundary turns out to be a super productive bug farm, I have been poring over the code for more than a week by now and I still cannot claim I exactly know what's happening in each situation, especially as there are multiple code paths involved for explicit / sequential / auto layouts for reference types vs. value types with side fun regarding the ARM32 offset bias. As I believe that "usefulness" of this fixup is basically preventively diagnostic (we detect upfront that we got it wrong), I tend to believe that the base field offset calculation does constitute a value worth validating (in the intra-bubble scenario) as getting that wrong indicates a potentially critical bug in our implementation. |
In fact, this is exactly what crashes the LayoutClassTest - in the test, there's a class named
The proximate problem is that, on x64, we don't properly replicate runtime behavior w.r.t. the base class |
cf657fe
to
b5d2731
Compare
When verifying field offset consistency, we shouldn't be checking base class size when the base class doesn't belong to the current version bubble. I have fixed this by adding a special case for the existing fixup encoding indicated by base class size being zero; please let me know if you find it preferable to create a new fixup type for this purpose instead. I have also created a simple regression test that was previously failing when run against the framework compiled with CG2 assembly by assembly. Thanks Tomas
1cf876a
to
943cd4d
Compare
Merging in as the PR is green and the remaining CG2 failures are known. |
* upstream/main: (568 commits) [wasm] Set __DistroRid on Windows to browser-wasm (dotnet#50842) [wasm] Fix order of include paths, to have the obj dir first (dotnet#50303) [wasm] Fix debug build of AOT cross compiler (dotnet#50418) Fix outdated comment (dotnet#50834) [wasm][tests] Add properties to allow passing args to xharness (dotnet#50678) Vectorized common String.Split() paths (dotnet#38001) Fix binplacing symbol files. (dotnet#50819) Move type check to after the null ref branch in out marshalling of blittable classes. (dotnet#50735) Remove extraneous CMake version requirement. (dotnet#50805) [wasm] Remove unncessary condition for EMSDK (dotnet#50810) Add loop alignment stats to JitLogCsv (dotnet#50624) Resolve ILLink warnings in System.Diagnostics.DiagnosticSource (dotnet#50265) Avoid unnecessary closures/delegates in Process (dotnet#50496) Fix for field layout verification across version bubble boundary (dotnet#50364) JIT: Enable CSE for VectorX.Create (dotnet#50644) [main] Update dependencies from mono/linker (dotnet#50779) [mono] More domain cleanup (dotnet#50771) Race condition in Mock reference tracker runtime with GC. (dotnet#50804) Remove IAssemblyName (and various fusion remnants) (dotnet#50755) Disable failing test for GCStress. (dotnet#50828) ...
When verifying field offset consistency, we shouldn't be checking
base class size when the base class doesn't belong to the current
version bubble. I have fixed this by adding a special case for
the existing fixup encoding indicated by base class size being
zero; please let me know if you find it preferable to create
a new fixup type for this purpose instead. I have also created
a simple regression test that was previously failing when run
against the framework compiled with CG2 assembly by assembly.
Thanks
Tomas
Fixes: #49982
/cc @dotnet/crossgen-contrib