-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Test failure JIT\\opt\\ObjectStackAllocation\\ObjectStackAllocationTests\\ObjectStackAllocationTests.cmd #81103
Comments
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak Issue DetailsFailed in Run: runtime-coreclr outerloop 20230124.1 and many ohter outerloop runs Failed tests:
Error message:
Stack trace:
|
@AndyAyersMS, it is blocking outerloop for many days. Please take a look. |
Failure history above seems to implicate this commit range but nothing there looks all that likely: [Note we have since determined the issue is with the crossgen 2 clr host, which is using older bits] 2038514 Fix Configuration to ensure calling the property setters. (#80438) |
Can't repro this locally or with runfo bits. From a recent core dump it looks like it is a crash in the dotnet used to run crossgen2, which is 8.0.0-alpha.1.23058.2\coreclr.dll from hash 5da4a9e I can't find symbols for this yet... |
Ok, found them (thanks
@mangod9 @janvorli does this seem at all familiar? I can look at a few more dumps to see if they're all consistent. |
I am looking into it. |
Added more stack trace to the above -- this is happening right at startup:
|
There is something weird going on in this loop: runtime/src/coreclr/vm/precode.cpp Lines 479 to 488 in 9b76c28
The temporaryEntryPoints is 0x00007ffd86116000, the count should be 0xaa (that I have dug out in the coreclr!MethodTableBuilder::SetupMethodTable2 frame from the pChunk ) and the oneSize is 0x18. In the FixupPrecode::Init , we are accessing the data of those entry points that are one page after that address, so they start at 0x00007ffd86117000. That means that at the end of the loop, we should reach 0x00007ffd86117ff0. But we are crashing since we went to 0x00007ffd86118000, which is the next page and it is not mapped (the fact that it is not mapped is expected).
|
The issue is not in the source code tree state that is being tested but rather in the source code tree state of the runtime used by crossgen2, which is the 8.0.0-alpha.1.23058.2\coreclr.dll from hash 5da4a9e as Andy has mentioned. |
Not sure if this is what you meant -- in the dump I'm looking at the faulting address is mapped/reserved but not committed:
From what I can see all the failures are on windows, mostly x64 but some on arm64. Maybe some kind of off by one setting up initial allocations that (for reasons unknown) only hits this particular test? Seems quite odd because crossgen2 has to run hundreds of times during this test pass, and only this invocation fails, and (I suspect) doesn't always fail. |
Yes, the page should be reserved, but not committed. When we allocate something from the stub heap, we always commit two pages. One RX followed by one RW. The RX holds the stubs code, the RW their data. The data is always at the address of the code + page_size. Here we go over the end of those two pages while we should not. It is also not clear to my why it fails with just one test yet, especially at the init time where nothing app specific is loaded yet. |
@AndyAyersMS this specific test run actually differs from others - the |
Although, that doesn't make sense, as the option would be processed only after the crossgen2 code was executed and the crash happens during coreclr initialization. |
The test sets several DOTNET_ environment variables: runtime/src/tests/JIT/opt/ObjectStackAllocation/ObjectStackAllocationTests.csproj Lines 12 to 18 in aa5e313
Maybe these are being set for the crossgen2 invocation as part of the test wrapper? |
Ah, so these would likely be set for the crossgen2 execution too. |
Ah right -- thanks Jakob. With those set I can repro. Problem seems to be with the pair
|
Thanks @AndyAyersMS and @jakobbotsch, let me try to debug it live. |
With just those two set you can't launch crossgen2 at all (fails with no command line args)
|
I understand what's going on. The |
When the new stub heaps were implemented, the chunk size needed to be limited so that all precodes for a chunk fit into a single memory page. That was done in `MethodDescChunk::CreateChunk`. But it was discovered now that there is another place where it needs to be limited, the `MethodTableBuilder::AllocAndInitMethodDescChunk`. The JIT\opt\ObjectStackAllocation\ObjectStackAllocationTests started to fail in the outerloop due to too long chunk. The failure happens in crossgen2 as it is using a separate build of runtime in release and only that uncovers the problem. And only when DOTNET_TieredCompilation=0 and DOTNET_ProfApi_RejitOnAttach=0. This change fixes the problem. With this change applied to the dotnet runtime version used by the crossgen2 and patching it in place in the .dotnet/shared/.... directory, the issue doesn't occur. Close dotnet#81103
@janvorli thanks for getting to the bottom of this one. |
Seems like we'll also need to disable this test and wait for a dotnet update to re-enable, right...? Fixing this in main won't completely resolve the issue as crossgen2 will still be using an older runtime. |
Ah, that's right. |
Disable the test until a dotnet with a fix of the issue dotnet#81103 is used in the runtime repo to build / run tests.
Alternative fix to just disabling the test would be to add the problematic environment variables to the adhoc list crossgen2 testing already suppresses: runtime/src/tests/Common/CLRTest.CrossGen.targets Lines 273 to 278 in eaf0bb2
|
That's something that we should probably do too, it doesn't seem to make sense to run crossgen with the settings that are meant for the actual test run. But let's keep it as is for now, it will allow us to double check that the issue is gone after my fix propagates to the dotnet runtime used for build / crossgen execution. |
I don't know for how long we're going to run crossgen2 on top of the JIT-based CoreCLR in testing. There have been thoughts to use the shipping configuration, which is crossgen2 compiled with NativeAOT. So we're going to lose this as a "regression test" eventually. |
When the new stub heaps were implemented, the chunk size needed to be limited so that all precodes for a chunk fit into a single memory page. That was done in `MethodDescChunk::CreateChunk`. But it was discovered now that there is another place where it needs to be limited, the `MethodTableBuilder::AllocAndInitMethodDescChunk`. The JIT\opt\ObjectStackAllocation\ObjectStackAllocationTests started to fail in the outerloop due to too long chunk. The failure happens in crossgen2 as it is using a separate build of runtime in release and only that uncovers the problem. And only when DOTNET_TieredCompilation=0 and DOTNET_ProfApi_RejitOnAttach=0. This change fixes the problem. With this change applied to the dotnet runtime version used by the crossgen2 and patching it in place in the .dotnet/shared/.... directory, the issue doesn't occur. Close #81103
When the new stub heaps were implemented, the chunk size needed to be limited so that all precodes for a chunk fit into a single memory page. That was done in `MethodDescChunk::CreateChunk`. But it was discovered now that there is another place where it needs to be limited, the `MethodTableBuilder::AllocAndInitMethodDescChunk`. The JIT\opt\ObjectStackAllocation\ObjectStackAllocationTests started to fail in the outerloop due to too long chunk. The failure happens in crossgen2 as it is using a separate build of runtime in release and only that uncovers the problem. And only when DOTNET_TieredCompilation=0 and DOTNET_ProfApi_RejitOnAttach=0. This change fixes the problem. With this change applied to the dotnet runtime version used by the crossgen2 and patching it in place in the .dotnet/shared/.... directory, the issue doesn't occur. Close #81103
When the new stub heaps were implemented, the chunk size needed to be limited so that all precodes for a chunk fit into a single memory page. That was done in `MethodDescChunk::CreateChunk`. But it was discovered now that there is another place where it needs to be limited, the `MethodTableBuilder::AllocAndInitMethodDescChunk`. The JIT\opt\ObjectStackAllocation\ObjectStackAllocationTests started to fail in the outerloop due to too long chunk. The failure happens in crossgen2 as it is using a separate build of runtime in release and only that uncovers the problem. And only when DOTNET_TieredCompilation=0 and DOTNET_ProfApi_RejitOnAttach=0. This change fixes the problem. With this change applied to the dotnet runtime version used by the crossgen2 and patching it in place in the .dotnet/shared/.... directory, the issue doesn't occur. Close #81103
When the new stub heaps were implemented, the chunk size needed to be limited so that all precodes for a chunk fit into a single memory page. That was done in `MethodDescChunk::CreateChunk`. But it was discovered now that there is another place where it needs to be limited, the `MethodTableBuilder::AllocAndInitMethodDescChunk`. The JIT\opt\ObjectStackAllocation\ObjectStackAllocationTests started to fail in the outerloop due to too long chunk. The failure happens in crossgen2 as it is using a separate build of runtime in release and only that uncovers the problem. And only when DOTNET_TieredCompilation=0 and DOTNET_ProfApi_RejitOnAttach=0. This change fixes the problem. With this change applied to the dotnet runtime version used by the crossgen2 and patching it in place in the .dotnet/shared/.... directory, the issue doesn't occur. Close #81103 Co-authored-by: Jan Vorlicek <janvorli@microsoft.com>
Failed in Run: runtime-coreclr outerloop 20230124.1 and many ohter outerloop runs
Failed tests:
Error message:
Stack trace:
The text was updated successfully, but these errors were encountered: