Assertion failed: (GetComponentSize() <= 2) || IsArray() #86273
Comments
Assert hit in #98744 - Libraries Test Run checked coreclr linux_musl arm Release, System.Runtime.Tests |
Any update on this one? Another hit in https://dev.azure.com/dnceng-public/public/_build/results?buildId=578408&view=ms.vss-test-web.build-test-results-tab&runId=13879038&resultId=197184&paneView=dotnet-dnceng.dnceng-build-release-tasks.helix-test-information-tab (net9.0-linux-Release-arm-CoreCLR_checked-jitstress_random_1) |
@mangod9 Milestone Future is not appropriate for this continuing failure. I marked it as 9.0. |
Ok, will take a look. Looks like it has recently started occurring in regular CI and not just JITStress |
From what I see, the dumps collected don't seem to be very useful. @tommcdon FYI. |
It looks like the compiler generates broken unwind info for
|
Contributes to dotnet#86273
* Disable NORETURN annotation on PROCAbort for Arm
  Contributes to #86273
* Fix build break
Another hit:
net9.0-linux-Release-arm-jitstressregs1
@jkotas apparently your fix didn't cover all cases? |
(net9.0-linux-Release-arm-jitstress2_jitstressregs8) |
My fix was meant to address #86273 (comment) and make the dumps diagnosable. The fix works as expected as far as I can tell. The debugger produces good stacktraces now. This should unblock further investigation. Crash stacktrace from https://dev.azure.com/dnceng-public/public/_build/results?buildId=607034&view=ms.vss-test-web.build-test-results-tab
|
This crashed while stackwalking aborted
|
Thanks for looking into it, Jan. @VSadov, since this appears to be a heap corruption and you were recently trying to repro an arm32 issue, could you please check whether this repros with these settings: #86273 (comment). Also adding @janvorli in case you have seen these during your exceptions GC-hole investigations. |
I will take a look. It looks ThreadAbort-specific though, and not related to GCStress, so it may be quite different from the things I was looking at recently. |
This is not arm32-specific. There are failures on x64 as well. (in Work Item System.Threading.Channels.Tests)
|
This assert is a general symptom of a GC hole, bad GCInfo or corrupted GC heap. It can have many different root causes. I think it is very likely that the root cause for the arm32 failure in System.Runtime tests is different from the root cause for the System.Threading.Channels.Tests failure. |
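To make the point above concrete, here is a toy Python model of why this assert behaves as a general corruption detector. It is only a sketch: `FakeMethodTable` and its fields are hypothetical stand-ins, not the CoreCLR layout, and the only thing taken from the issue is the asserted condition itself.

```python
from dataclasses import dataclass


@dataclass
class FakeMethodTable:
    """Hypothetical stand-in for a type descriptor; not the CoreCLR layout."""
    component_size: int
    is_array: bool


def passes_check(mt: FakeMethodTable) -> bool:
    # Mirrors the asserted condition: (GetComponentSize() <= 2) || IsArray()
    return mt.component_size <= 2 or mt.is_array


# A string-like type: small component size, not an array.
string_like = FakeMethodTable(component_size=2, is_array=False)
# An array type: any component size is acceptable.
int_array = FakeMethodTable(component_size=4, is_array=True)
# Junk memory interpreted as a type descriptor: arbitrary size, not an array.
junk = FakeMethodTable(component_size=0x8141, is_array=False)
```

Any stale or junk pointer that the GC follows can produce the `junk` case, which is why the assert alone says little about the root cause.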
I've changed the error to be Confusingly, the original report is for |
Yeah agree, the original error is possibly no longer occurring. Note that it was from over a year ago. |
I am able to reproduce this locally |
It looks like we are reporting junk to GC in register R4. The crash does not require … Is there a way to do JitDisasm for R2R methods (on arm32)? Forcing the JIT to emit code similar to R2R would work too, as then I could just do a regular JitDisasm. |
yeah R2RDump should be able to disasm them. Adding @cshung since ILSpy would also work here. |
With |
It is probably best to get a JIT dump, which you can capture if you turn on some switches and run crossgen again. Thanks to determinism, you should always get the same code. With the JIT dump, look at register allocation to see why the JIT believes R4 holds a GC reference at that point while it is junk. |
This is for the JIT code (with
G_M33701_IG06: ; bbWeight=1, gcVars=0000000100010002 {V00 V02 V05}, gcrefRegs=0030 {r4 r5}, byrefRegs=0000 {}, gcvars, byref
; gcrRegs +[r4-r5]
; GC ptr vars +{V00 V01 V02 V05 V16 V32}
000086 movw r6, 0x8141
00008A movt r6, 0xf2ac
00008E blx r6 // System.Environment:get_CurrentManagedThreadId():int
; gcr arg pop 0
000090 dmb 15
000094 str r0, [r5+0x24]
000096 movs r6, 0
000098 str r6, [sp+0x08] // [V03 loc1]
; GC ptr vars +{V03}
/=== GC happens when we are about to execute the next instruction (offset 00009A).
V if we'd come from the above, R4 would be live (although unused), but if we branched from below, R4 contains junk.
label G_M33701_IG07 seems to be killing R4, so why is it reported at 00009A?
;; size=20 bbWeight=1 PerfScore 7.00
G_M33701_IG07: ; bbWeight=8, gcVars=0000000100410002 {V00 V02 V03 V05}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref, isz
; gcrRegs -[r4-r5]
; GC ptr vars -{V01 V16 V32}
00009A ldr r5, [sp+0x0C] // [V02 loc0]
; gcrRegs +[r5]
00009C add r4, r5, 40
; byrRegs +[r4]
0000A0 cmp r4, 0
0000A2 beq SHORT G_M33701_IG20
0000A4 mov r0, r4
; byrRegs +[r0] |
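The register annotations in the listing above can be replayed to see what would be reported at a given code offset. The sketch below is a hand transcription of those annotations into a hypothetical range table; the table format and the `live_at` helper are illustrative only and are not the runtime's GC info encoding.

```python
import bisect

# (start_offset, gc_ref_regs, byref_regs), transcribed by hand from the listing.
# Each entry is assumed to hold from start_offset until the next entry.
RANGES = [
    (0x86, {"r4", "r5"}, set()),  # IG06: gcrefRegs=0030 {r4 r5}
    (0x9A, set(), set()),         # IG07 entry: gcrRegs -[r4-r5]
    (0x9C, {"r5"}, set()),        # after ldr r5: gcrRegs +[r5]
    (0xA0, {"r5"}, {"r4"}),       # after add r4: byrRegs +[r4]
]
STARTS = [start for start, _, _ in RANGES]


def live_at(offset):
    """Return (gc_ref_regs, byref_regs) for the range containing offset."""
    i = bisect.bisect_right(STARTS, offset) - 1
    if i < 0:
        raise ValueError("offset precedes the transcribed range")
    _, gcrefs, byrefs = RANGES[i]
    return gcrefs, byrefs
```

In this model, a lookup at exactly 0x9A reports no GC registers, while a lookup anywhere inside IG06 (for example 0x99) reports r4 and r5. So r4 is only reported at 0x9A if the lookup offset lands in the preceding block.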
so feels like a codegen issue? |
Not completely sure. Another confusing part is that this code pattern does not seem overly uncommon, so why do we see a failure only here? |
The issue is understood.
We are using GC info for an instruction that is different from where we are (by |
Ideally, the GC info would be contiguous around calls, at least in terms of nonvolatile registers (it is more or less the definition of nonvolatile that calls do not trash them). This is a bigger issue, though, which may be addressed in #95565. In this particular case, I think we can ignore the bigger issue for a bit longer and just pull in the piece of #95565 that addresses continuity at throwing calls. With that, we would not need to adjust in throwing cases, regardless of leaf/nonleaf or whether the method is interruptible. I am testing a fix. |
Also, it does not look ARM32-specific. |
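The continuity property described above can be phrased as a small check. The sketch below rests on stated assumptions: the range-table format is hypothetical, r4-r11 are treated as the nonvolatile set for illustration, and "just before the return address" is modeled as `ret - 1`. It is not the runtime's implementation or the change in #95565.

```python
import bisect

# Assumption for this sketch: r4-r11 are treated as nonvolatile (callee-saved).
NONVOLATILE = {f"r{i}" for i in range(4, 12)}


def live_at(ranges, offset):
    """Live GC registers for the range containing offset (empty if none)."""
    starts = [start for start, _ in ranges]
    i = bisect.bisect_right(starts, offset) - 1
    return ranges[i][1] if i >= 0 else set()


def discontinuities(ranges, return_offsets):
    """Return (offset, regs) pairs where nonvolatile liveness differs across a call."""
    found = []
    for ret in return_offsets:
        before = live_at(ranges, ret - 1) & NONVOLATILE
        after = live_at(ranges, ret) & NONVOLATILE
        if before != after:
            found.append((ret, before ^ after))
    return found


# Hypothetical data: r4 stops being reported exactly at offset 0x10.
example = [(0x00, {"r4", "r5"}), (0x10, {"r5"})]
```

If a call returned to 0x10 in this example, the check would flag r4: a lookup on either side of the return address gives a different answer for a register the call cannot have changed. A return address in the middle of a range, such as 0x08, produces no findings.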
The root cause for this issue should be fixed now. Note: we may still see If scenario does not involve |
@VSadov, this assert happened in runtime-coreclr superpmi-collect pipeline in System.Security.Cryptography.Tests Work Item. Could you please take a look? |
If this does not involve |
There are many jobs in https://dev.azure.com/dnceng/internal/_build?definitionId=977&_a=summary queue. Most jobs are failing, which is not surprising since the pipeline is triggered on PRs and PRs often have failures. I've checked a few jobs, but can't see one that failed with @JulieLeeMSFT - do you have a link to the actual job which failed this way? |
There is also a bunch of asserts like the following:
I think these are more indicative of what went wrong. I will open a separate bug for that. |
Actually, there is already a bug for that: |
Thanks. We will check what went wrong. |
Hi @JulieLeeMSFT , |
net8.0-linux-Release-arm-CoreCLR_checked-jitstress2_jitstressregs1-(Ubuntu.1804.Arm32.Open)Ubuntu.1804.Armarch.Open@mcr.microsoft.com/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7
System.Memory.Tests Work Item
https://dev.azure.com/dnceng-public/public/_build/results?buildId=272702&view=ms.vss-test-web.build-test-results-tab&runId=5384424&paneView=debug&resultId=195466
Known Issue Error Message
Fill the error message using step by step known issues guidance.
Known issue validation
Build: 🔎⚠️ Validation could not be done without an Azure DevOps build URL on the issue. Please add it to the "Build: 🔎" line.
Result validation:
Validation performed at: 3/22/2024 4:10:23 PM UTC
Report
Summary