Test failure: Regressions/coreclr/GitHub_35000/test35000/test35000.cmd #113106
Comments
Failed in: runtime-coreclr outerloop 20250304.4 Failed tests:
Error message:
Stack trace:
|
Would be nice if we had line info here. I think that means CoreCLR tests are missing symbols? cc @hoyosjs
I pulled the helix payload and ran locally - here's what I see.
Here's a dump: https://microsoft-my.sharepoint.com/:u:/p/ericstj/EZQCl-QQHKJNkLSw6caTPxAB4axXj2z-xZPPrMDwjdPPFA?e=Svx2hf
|
Correction - the test expects the NRE when it's invoking an instance method without providing the instance. The problem here is that it only null-ref'ed once instead of 10 times. @steveharter could this be related to invoke work?
|
Failed in: runtime-coreclr outerloop 20250305.2 Failed tests:
Error message:
Stack trace:
|
Failed in: runtime-coreclr jitstress 20250305.1 Failed tests:
Error message:
Stack trace:
|
Failed in: runtime-coreclr ilasm 20250308.1 Failed tests:
Error message:
Stack trace:
|
Failed in: runtime-coreclr r2r-extra 20250309.1 Failed tests:
Error message:
Stack trace:
|
This could be related to recent x86 EH related changes in the JIT. Rerunning jitstress now to see if we've fixed this via #113330. https://dev.azure.com/dnceng-public/public/_build/results?buildId=976183&view=results |
Failed in: runtime-coreclr r2r 20250310.1 Failed tests:
Error message:
Stack trace:
|
Works with JitMinOpts=1, so very likely a jit issue. |
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
Implicated commit range is 6d344b3 Use minipal_getcpufeatures to detect for AVX (#113032) |
Enabling the AMD64-specific NOP padding fix on x86 fixes this failure for me. I can open a PR for this, though I ought to check with @janvorli that this is the right thing to do. |
Failed in: runtime-coreclr jitstress 20250311.2 Failed tests:
Error message:
Stack trace:
|
I tried enabling EH logging but it's not verbose enough to spot the problem. It seems almost like the runtime thinks the method has no EH at all or is skipping the frame... at least I don't see it enumerating and rejecting the clauses.
|
Failed in: runtime-coreclr gcstress-extra 20250312.1 Failed tests:
Error message:
Stack trace:
|
@janvorli would appreciate if you could help us figure out whether the proposed fix (nop padding for a call at the end of a try with a disjoint catch) is the right fix. |
@AndyAyersMS I am aware of this issue, but since it is x86 stuff that I haven't seen for quite some time, I need to refresh my memory on that. I hope to be able to look into it later today or on Monday at the latest. |
SPMI collect pipelines failed due to this issue. 20250309.1
|
If the instruction pointer is out of a try region range, EH would not consider the clauses belonging to that try range. Do I understand it correctly that before there was a jmp after the call that was covered by the try range and now there is nothing and the try region just ends with the call? |
That's correct. It sounds like the fix can be pretty targeted: If the last block in a try region ends with a call instruction, and that block falls into the next block, we need to emit a NOP. @janvorli do we need to handle any of the other cases we handle on x64? For example, if a catch region ends with a call, or if a try region has a call instruction right before entering a nested try region, etc. |
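The condition proposed in this comment can be sketched as a small predicate. This is a hypothetical Python model, not JIT code: the `Block` type, its fields, and `needs_nop_after_call` are names invented for illustration, and the model assumes try regions are not nested.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Block:
    """Hypothetical stand-in for a JIT basic block (not a real JIT type)."""
    name: str
    try_index: Optional[int]  # id of the enclosing try region, or None
    ends_with_call: bool      # last instruction in the block is a call
    falls_through: bool       # control continues into the next block in layout


def needs_nop_after_call(block: Block, next_block: Optional[Block]) -> bool:
    """Return True when the proposed padding rule would apply to `block`."""
    if block.try_index is None or not block.ends_with_call:
        return False
    if not block.falls_through or next_block is None:
        return False
    # Assumption: with no nesting, the block is the last one in its try
    # region exactly when the next block in layout is in a different region.
    return next_block.try_index != block.try_index
```

The other cases raised in the comment (a catch region ending with a call, or a call right before a nested try) are deliberately left out of this sketch.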
Have you tried to dump the EH info for the method? Using sos, you can disass the method and include ehinfo using runtime/src/coreclr/vm/i386/excepx86.cpp Lines 2371 to 2375 in 7bffa54
So, offs that is equal to TryEndPC is considered to be in the try range unless the end_adjust is set, which happens when the current method is on top of the stack, that means it is the one currently being executed. In other words, if the throw was right in that method and not in a callee. The offs is a diff between a possibly adjusted Eip and the start address of the method. That adjustment should move the Eip back by 1 in case it is not in the topmost frame. As you can see, there are two adjustments happening here. But overall it seems to me that adding a NOP should not be needed and it should work thanks to the adjustments. I need to step through the code in the crashing test to see what's going on w.r.t. the offsets. |
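The two adjustments described in this comment can be modeled roughly as follows. This is a simplified Python sketch, not the code in excepx86.cpp: the function names are hypothetical, and treating the try start as plainly inclusive is an assumption made to keep the example short.

```python
def method_offset(eip: int, method_start: int, is_topmost_frame: bool) -> int:
    """Offset of the (possibly adjusted) Eip from the start of the method.

    For a frame that is not topmost, Eip is a return address, so it is
    moved back by 1 to land inside the call instruction.
    """
    adjusted_eip = eip if is_topmost_frame else eip - 1
    return adjusted_eip - method_start


def in_try_range(offs: int, try_start_pc: int, try_end_pc: int,
                 end_adjust: bool) -> bool:
    """An offset equal to try_end_pc counts as inside unless end_adjust is set."""
    if offs < try_start_pc or offs > try_end_pc:
        return False
    if offs == try_end_pc and end_adjust:
        return False
    return True
```

For a call that is the last instruction of a try region, the return address equals the try end. In a non-topmost frame the offset is moved back by 1 and falls inside the range; the same address is rejected only when the frame is treated as topmost and `end_adjust` is set.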
What are the DOTNET_XXX settings necessary to reliably repro the issue? I have tried without any and then with DOTNET_TieredCompilation=0 and it didn't repro for me. |
Ah, maybe I need a checked build; I've tried it with debug. |
Yeah this repros for me with checked build and no DOTNET settings. |
I was able to repro it. There is a bug, and adding the NOP would just hide it. There are two things that we decoded incorrectly and that led to us not taking the offset after the call into account:
|
Failed in: runtime-coreclr ilasm 20250315.1 Failed tests:
Error message:
Stack trace:
|
Failed in: runtime-coreclr r2r-extra 20250316.1 Failed tests:
Error message:
Stack trace:
|
Failed in: runtime-coreclr r2r 20250317.1 Failed tests:
Error message:
Stack trace:
|
Failed in: runtime-coreclr jitstress 20250318.1 Failed tests:
Error message:
Stack trace:
|
Failed in: runtime-coreclr jitstress 20250319.1 Failed tests:
Error message:
Stack trace:
|
When a NULL reference exception occurs in a JIT helper or a VSD stub, the runtime pretends the exception occurred in the managed caller. There is a bug on x86 Windows where the COMPlusThrowCallback considers that frame to be the frame where the exception actually occurred (based on the m_crawl.isFirst). In case the call to the helper is the last instruction in a try region, the exception handler lookup would reject that address and the exception may not get handled at the right place or at all. This change fixes it by ensuring that the m_crawl.isFirst is not set when the frame is not the frame of the failure. Close dotnet#113106
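A toy Python model of the behavior described above, before and after the change. It is only a sketch: `is_first_flag` and `handler_found` are hypothetical names, and the assumption that a single flag drives both the Eip adjustment and the try-end check is a simplification rather than a description of the runtime code.

```python
def is_first_flag(fault_in_this_frame: bool, fixed: bool) -> bool:
    """Whether the managed caller is marked as the frame of the failure."""
    if fixed:
        # After the change: set only when the fault really happened here.
        return fault_in_this_frame
    # Before the change: the caller was always treated as the failing frame.
    return True


def handler_found(return_offs: int, try_start: int, try_end: int,
                  is_first: bool) -> bool:
    """Simplified try-range lookup for a single EH clause."""
    # A frame that is not first holds a return address, so step back by 1.
    offs = return_offs if is_first else return_offs - 1
    if offs < try_start or offs > try_end:
        return False
    # An offset at the very end of the try is rejected for the first frame.
    return not (offs == try_end and is_first)
```

With a helper call as the last instruction of the try, the return offset equals the try end: the buggy flag makes the lookup reject the clause, while the fixed flag lets the adjusted offset fall inside the range.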
Failed in: runtime-coreclr ilasm 20250322.1 Failed tests:
Error message:
Stack trace:
|
Failed in: runtime-coreclr outerloop 20250303.5
Failed tests:
Error message:
Stack trace: