-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CI failure: System.Globalization.Tests.InvariantModeTests.TestLastIndexOf, object reference not set to an instance of an object #74179
Comments
Tagging subscribers to this area: @dotnet/area-system-globalization Issue DetailsDidn't find any open issues reporting this. Found in this RC1 PR: #74045 Callstacks:
|
Tagging subscribers to this area: @dotnet/area-system-memory Issue DetailsDidn't find any open issues reporting this. Found in this RC1 PR: #74045 Callstacks:
|
This looks a regression from #73768? |
Just mono though? @lambdageek @vargaz |
This other test in the same CI result seem to have this bug as root cause:
|
I assume this is failing on |
@lateralusX do you own next action? I assume this means IndexOf itself is a bit broken on Mono. |
I take a look and see what needs to be done, it's probably related to x64 Windows value type ABI implementation on Mono. |
Have found the underlying issue causing the AV. It might not be Windows specific, but due to the way the test uses BoundedMemory.MakeReadonly, it will add guard pages around the string on Windows. The generated code IndexOfAnyValueType for char/int16, is incorrectly optimized on release builds to do a cmp dword ptr [rax], 0, so in the case it access the last char in the buffer, reading a dword will bump in to the no access guard page, causing the AV. The following code reads the correct amount of data, movsx rax, word ptr[rax] and it appears that the cmp is not really needed but a left over, probably due to inlining optimization, since this only happens on release builds, but work on debug. The issue is most likely not reproducible on regular code, but it will read 2 bytes pass end of buffer so could in theory cause random crashes, the result of the cmp is not really used, so it won't affect the logic, meaning that it can go undetected. I need to dig more into this to figure out why and where we add this additional incorrect cmp instruction and correct it. Until I'm able to get to that point, #74433 will disable the test for now. |
cc: @GrabYourPitchforks (a win for BoundedMemory) |
…ests (#74454) * [mono] Disable failing Globalization and Transactions tests The System.Transactions.Tests.OleTxTests aren't supported by Mono on Windows: #74187 The System.Globalization test failure is a real issue that is being investigated: #74179 * Disable InvariantMode tests as well Co-authored-by: Alexander Köplinger <alex.koeplinger@outlook.com>
The crash happens in Mono when we execute this code, part of while (length > 0)
{
length -= 1;
if (TNegator.NegateIfNeeded(Unsafe.Add(ref searchSpace, offset).Equals(value))) return (int)offset;
offset -= 1;
}
Looking into a fix where we change the implementation of null checks on platforms reading more than 1 byte, into an implementation that only read 1 byte, making sure we won't read passed allocated buffer. On x64 we would then change it from:
to:
Switching to a reg instead of imm 0 on x64 will also reduce the encoded instruction with 1 byte (already done this way on x86). LLVM also reads to much in the null check:
will look into changing that into:
Another thing worth pointing out, most of the code in
that will trigger a call to
to:
and eliminate the complete virtual call. We should still fix the null check implementation, but optimizing the equal check also makes sense. It is implemented like that in a couple of places in SpanHelpers.T.cs that would suffer similar issues. One last comment/note around managed pointers, reading up on ECMA - 335 it says: "Managed pointers cannot be null, and they shall be reported to the garbage collector even if they do not point to managed memory" @lambdageek, when we discussed that yesterday, you mentioned that they actual can be null meaning that we should emit the needed null check as we currently do, but looking at the spec, maybe that is too defensive and we waste instructions? Thoughts? I have fix for x86/x64 done, needs to run more on the LLVM fix before submitting a PR. |
"Managed pointers cannot be null" sentence is removed in https://github.com/dotnet/runtime/blob/main/docs/design/specs/Ecma-335-Augments.md |
And most likely using the equality check would introduce lower risk compared to fixing the codegen? Which would make it easier to backport the fix to 7? |
Indeed, but I think we might need the codegen fix as well since that could potentially hit us in other scenarios, virtual call on managed pointers, pointing to something < 4 bytes. It will also trigger unaligned reads, so there is a small hidden perf cost. Having that said, the codegen issue has been around for sometime and just recently exposed by changes to SpanHelpers.T.cs and use of BoundedMemory testing it, so it could be argueed to split the fix into a PR fixing the SpanHelpers.T.cs and one handling the codegen fixes. I can work on both since I'm in the process working on the codegen fixes at the moment. |
I'd prefer that so we can backport to 7.0 and reenable the tests. |
Sounds good, I will add a PR that fix SpanHelpers.T.cs and re-enable the test that could be backported and a separate PR with the codegen fix that we can start out in main and then decide if we should backport or not. |
Workaround crash hit by dotnet#74179 making sure we avoid hitting codepath emitting this null pointer checks. The full fix includes codegen fixes as well, but will be performed in separate PR. There are still locations in SpanHelper.T.cs that uses Equal virtual call on value types that could be managed pointers to value types, but that code has remained the same for the last 4 years to 15 months and have not hit this issue in the past.
Tests are disabled, removing blocking-clean-ci label |
Workaround crash hit by #74179 making sure we avoid hitting codepath emitting this null pointer checks. The full fix includes codegen fixes as well, but will be performed in separate PR. There are still locations in SpanHelper.T.cs that uses Equal virtual call on value types that could be managed pointers to value types, but that code has remained the same for the last 4 years to 15 months and have not hit this issue in the past.
… where possible. (#74567) * Use System.Numerics.IEqualityOperators.op_Equality in SpanHelper.T.cs. Workaround crash hit by #74179 making sure we avoid hitting codepath emitting this null pointer checks. The full fix includes codegen fixes as well, but will be performed in separate PR. There are still locations in SpanHelper.T.cs that uses Equal virtual call on value types that could be managed pointers to value types, but that code has remained the same for the last 4 years to 15 months and have not hit this issue in the past. * Re-enable globalization tests disabled in #74433.
Current implementation of OP_CHECK_THIS on x86/x64 and LLVM does a memory read of at least 4 bytes. This creates an issue when the target is a managed pointer, since that could point to the interior of a type, meaning it can read pass the allocated memory causing a crash. Fix change the size of the read to one byte since the only reason doing the read is to validate that the reference, managed pointer is not NULL. Reading only one byte is also inline with how it is implemented on arm/arm64, and it will reduce potential unaligned reads on x86/x64. Full fix for, #74179.
Reopening to track backport to 7.0 in PR #74738 |
…panHelper.T.cs where possible. (#74738) * Use System.Numerics.IEqualityOperators.op_Equality in SpanHelper.T.cs. Workaround crash hit by #74179 making sure we avoid hitting codepath emitting this null pointer checks. The full fix includes codegen fixes as well, but will be performed in separate PR. There are still locations in SpanHelper.T.cs that uses Equal virtual call on value types that could be managed pointers to value types, but that code has remained the same for the last 4 years to 15 months and have not hit this issue in the past. * Re-enable globalization tests disabled in #74433. Co-authored-by: lateralusX <lateralusx.github@gmail.com>
merged |
Current implementation of OP_CHECK_THIS on x86/x64 and LLVM does a memory read of at least 4 bytes. This creates an issue when the target is a managed pointer, since that could point to the interior of a type, meaning it can read pass the allocated memory causing a crash. Fix change the size of the read to one byte since the only reason doing the read is to validate that the reference, managed pointer is not NULL. Reading only one byte is also inline with how it is implemented on arm/arm64, and it will reduce potential unaligned reads on x86/x64. Full fix for, dotnet#74179.
) Current implementation of OP_CHECK_THIS on x86/x64 and LLVM does a memory read of at least 4 bytes. This creates an issue when the target is a managed pointer, since that could point to the interior of a type, meaning it can read pass the allocated memory causing a crash. Fix change the size of the read to one byte since the only reason doing the read is to validate that the reference, managed pointer is not NULL. Reading only one byte is also inline with how it is implemented on arm/arm64, and it will reduce potential unaligned reads on x86/x64. Full fix for, #74179.
) Current implementation of OP_CHECK_THIS on x86/x64 and LLVM does a memory read of at least 4 bytes. This creates an issue when the target is a managed pointer, since that could point to the interior of a type, meaning it can read pass the allocated memory causing a crash. Fix change the size of the read to one byte since the only reason doing the read is to validate that the reference, managed pointer is not NULL. Reading only one byte is also inline with how it is implemented on arm/arm64, and it will reduce potential unaligned reads on x86/x64. Full fix for, #74179. Co-authored-by: lateralusX <lateralusx.github@gmail.com>
Affected tests:
Total hits per run: 291x entries with 279x failures per run
Didn't find any open issues reporting this.
Please help determine if this needs a fix for RC1.
Found in this RC1 PR: #74045
Occurred in the
runtime-extra-platforms
run.Leg:
Build windows x64 Release AllSubsets_Mono
Job: https://dev.azure.com/dnceng/public/_build/results?buildId=1951942&view=logs&j=585f1246-0618-5d47-ff3e-08c83309e2d2&t=5e10daf8-cc46-5bd4-2ab7-d28eafd857b5&l=89
Callstacks:
The text was updated successfully, but these errors were encountered: