AV in WKS::gc_heap::relocate_address #34279
Crash while stackwalking |
@fadimounir This looks like a duplicate of #31809 |
Thanks @jkotas, I will take a look. Are there crash dumps in the failures since I wasn't able to repro this locally? |
I have saved the dump and binaries at \jkotas9\drops\issue-34279 so that it won't get lost. |
I took a look at the dump, and it looks like we're enumerating the GS cookie of the InlinedCallFrame as a gcref object. I'm still not sure why. |
@jkotas I've been able to trace the root cause of this issue. The problem is here: https://github.com/dotnet/runtime/blob/master/src/coreclr/src/vm/i386/PInvokeStubs.asm#L58 We're using the return address from the call to the Anyway, I tested under the debugger what the correct behavior would have been by manually patching the return address in the I think the fix is to pass the correct return address value as a parameter to the @dotnet/jit-contrib I need some help figuring out how to do that. |
BTW, the bug exists on all architectures, all platforms, and not just on Windows-x86. It's just extremely rare that it would ever fail |
That is one option. Another option is to fix the GCInfo reporting so that the GCInfo at the |
@AndyAyersMS could you please forward this issue to someone on your team for a fix? Thanks! |
@fadimounir do you still have the GC info reported for the method? Looking at the jit code I don't see offhand how the GC state could change between the PINVOKE_BEGIN and the actual pinvoke. |
This is specific to R2R codegen. If you would like to debug it under JIT, you can set The GC Info looks like this:
The problem is the |
I am trying to understand why we are doing the GC reporting for Thinking about consequences of adding yet another argument for the |
In the case I'm looking at, we currently don't emit anything for the pinvoke call because of this early out in runtime/src/coreclr/src/jit/emit.cpp, lines 6078 to 6087 in 4335c82.
|
If I disable the early bail-out, then we see reports for the pinvoke and the pinvoke end calls:
I don't understand how the early bail-out is safe. It seems like the logic should be tracking whether the call we're looking at has different GC reporting than the previous call, or something like that. |
I thought that the way partially interruptible GC reporting worked on x86 was that if nothing is recorded for a particular offset, then nothing gets reported as being alive. It should not be reporting what it saw alive at an earlier callsite. Fully interruptible does keep updating the state and reports what it has incrementally computed as being alive. But that isn't how partially interruptible works. |
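A toy C++ model of the two lookup styles described in the comment above may help. It is a sketch under simplifying assumptions, not the real x86 GC info encoder or decoder: stack slots are small integers, code offsets are plain integers, and every type and function name here is hypothetical. The partially interruptible lookup returns an empty set for any offset that has no call-site record, while the fully interruptible lookup replays birth/death transitions up to the queried offset.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Toy model only (hypothetical names): slots are small integers, offsets
// are plain integers. This is not the real x86 GC info format.
using Offset = std::uint32_t;
using SlotSet = std::set<int>;

// Partially interruptible: one explicit record per call site, keyed by the
// return-address offset. No record means "nothing is live here"; nothing is
// carried over from an earlier call site.
struct PartiallyInterruptibleInfo {
    std::map<Offset, SlotSet> callSites;

    SlotSet liveAt(Offset returnOffset) const {
        auto it = callSites.find(returnOffset);
        return it == callSites.end() ? SlotSet{} : it->second;
    }
};

// Fully interruptible: a list of birth/death transitions sorted by offset.
// The live set at an offset is computed incrementally by replaying every
// transition at or before that offset.
struct Transition {
    Offset offset;
    int slot;
    bool becomesLive;
};

struct FullyInterruptibleInfo {
    std::vector<Transition> transitions;  // assumed sorted by offset

    SlotSet liveAt(Offset offset) const {
        SlotSet live;
        for (const Transition& t : transitions) {
            if (t.offset > offset) break;
            if (t.becomesLive) {
                live.insert(t.slot);
            } else {
                live.erase(t.slot);
            }
        }
        return live;
    }
};
```

In this model an offset between two recorded call sites reports nothing under the partially interruptible scheme, even if a slot was live at the previous call site.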
This is why the pinvoke begin gc descriptor has a nonzero arg mask, yes. Maybe the issue is that the pinvoke args are byrefs and not native ints?
Seems like the IL or the jit could cast these to non-gc types. |
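What this means for the stack walker can be sketched with the same kind of toy call-site table. The sketch assumes that the return address saved for the frame points just after the PINVOKE_BEGIN helper call rather than after the native call, so the lookup lands on the descriptor with the nonzero arg mask. Offsets, slot numbers, and all names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>

// Toy model with hypothetical names; not the CoreCLR stackwalker or GC info decoder.
using Offset = std::uint32_t;
using SlotSet = std::set<int>;  // live stack / outgoing-argument slots

// Partially interruptible call-site table: return-address offset -> live slots.
// Offsets with no entry report nothing live.
using CallSiteTable = std::map<Offset, SlotSet>;

// Slots a stack walker would report, given the return offset saved for the frame.
SlotSet reportedSlots(const CallSiteTable& table, Offset savedReturnOffset) {
    auto it = table.find(savedReturnOffset);
    return it == table.end() ? SlotSet{} : it->second;
}

// Hypothetical offsets for a method that does an inlined pinvoke:
//   call <PINVOKE_BEGIN helper>  ; returns to kAfterBegin, byref arg still in slot 0
//   call <native target>         ; returns to kAfterNative, nothing GC-live
constexpr Offset kAfterBegin = 0x20;
constexpr Offset kAfterNative = 0x28;

CallSiteTable exampleTable() {
    CallSiteTable table;
    table[kAfterBegin] = {0};  // nonzero arg mask at the begin helper call
    // No entry for kAfterNative: nothing is reported there.
    return table;
}
```

Under this assumption, a thread stopped in native code still gets slot 0 reported as live, even though that stack location may by then hold an unrelated value. That is consistent with, but does not by itself prove, the GS cookie being enumerated as a GC reference in the dump.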
What is the signature of the callsite? I believe that is what determines if it reports a byref or not. |
Yes, good point. The ECMA spec says:
I guess that we can start with declaring this to be invalid IL, and fix the IL stub generation accordingly. |
But it would also be useful to add at least an assert to the JIT that fires when a GC-tracked value is passed into a PInvoke. |
Looks like the pinvoke call site sig has nativeint, but the jit doesn't do anything with this particular mismatch (see Seems like this mostly happens with pinvokes that take byrefs. The assert would be something like:
@@ -6737,9 +6737,16 @@ void Compiler::impPopArgsForUnmanagedCall(GenTree* call, CORINFO_SIG_INFO* sig)
         assert(thisPtr->TypeGet() == TYP_I_IMPL || thisPtr->TypeGet() == TYP_BYREF);
     }
-    for (GenTreeCall::Use& use : GenTreeCall::UseList(args))
+    for (GenTreeCall::Use& argUse : GenTreeCall::UseList(args))
     {
-        call->gtFlags |= use.GetNode()->gtFlags & GTF_GLOB_EFFECT;
+        // We should not be passing gc typed args to an unmanaged call.
+        GenTree* arg = argUse.GetNode();
+        if (varTypeIsGC(arg->TypeGet()))
+        {
+            assert(!"*** invalid IL: gc type passed to unmanaged call");
+        }
+
+        call->gtFlags |= arg->gtFlags & GTF_GLOB_EFFECT;
|
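The check in the diff can be modelled outside the JIT as a pass over the argument types of an unmanaged call. This is a hedged sketch: `VarType` is a simplified, hypothetical stand-in for the JIT's `var_types`, and `isGCType` only approximates `varTypeIsGC` by treating object references and byrefs as GC-tracked. Returning the offending indices instead of asserting is the same idea as turning the assert into a printf to count hits.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified, hypothetical stand-in for a few JIT var_types.
enum class VarType { Int, IImpl, Ref, Byref };

// Approximates the intent of varTypeIsGC(): object refs and byrefs are GC-tracked.
bool isGCType(VarType t) {
    return t == VarType::Ref || t == VarType::Byref;
}

// Returns the indices of arguments that would trip the proposed assert
// ("invalid IL: gc type passed to unmanaged call").
std::vector<std::size_t> findGCArgsToUnmanagedCall(const std::vector<VarType>& argTypes) {
    std::vector<std::size_t> offending;
    for (std::size_t i = 0; i < argTypes.size(); ++i) {
        if (isGCType(argTypes[i])) {
            offending.push_back(i);
        }
    }
    return offending;
}
```

Because the result is per argument, a stub with several byref parameters contributes several hits, which is why a count of offending arguments can exceed the number of distinct stubs.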
@jkoritzinsky @AaronRobinsonMSFT The interop stub generators are generating invalid IL (passing byref values as pointer arguments) that is hitting subtle issues in the JIT. Can you please take a look at fixing the stub generators and adding the assert that Andy suggested? |
I see 93 cases when crossgenning System.Private.CoreLib (SPC) for x86, with the assert changed to a printf. Some of these stubs have multiple byref args, so maybe 50 stubs in all?
|
@jkotas and @AndyAyersMS I'll take over this issue. |
Make the jit more robust in cases where the IL producer is passing a byref to an unmanaged call, by retyping the argument as native int. This allows the jit to produce self-consistent GC info and avoid the issues seen in dotnet#34279, at least for byrefs. Closes dotnet#39040.
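The mitigation described in this change, retyping a byref argument to an unmanaged call as native int, can be sketched on a simplified argument list. The `ArgNode` type, the `VarType` enum, and the function name are hypothetical, and object references are deliberately left untouched because the description only covers byrefs.

```cpp
#include <cassert>
#include <vector>

// Simplified, hypothetical stand-ins for JIT types.
enum class VarType { Int, IImpl, Ref, Byref };

struct ArgNode {
    VarType type;
};

// Retypes every byref argument of an unmanaged call as native int so the
// outgoing argument is no longer treated as GC-tracked. Object references
// (Ref) are left alone. Returns how many arguments were retyped.
int retypeByrefArgsForUnmanagedCall(std::vector<ArgNode>& args) {
    int retyped = 0;
    for (ArgNode& arg : args) {
        if (arg.type == VarType::Byref) {
            arg.type = VarType::IImpl;
            ++retyped;
        }
    }
    return retyped;
}
```

Retyping only changes how the outgoing argument is reported; it does not make the pointer stable. It is safe only if the IL producer guarantees that the referenced location does not move across a GC.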
A couple of questions: (1) Does this not impact x64 for some reason? I don't understand why not if that is the case. I'm trying to determine if I'm hitting this or not, and I don't have a good way to audit all my generated IL at runtime in a system where this is happening. |
I would guess it is x86 specific. x86 uses a different gc encoding and may rely more heavily on the pinvoke signature to make sense of the stack. I can get you a version of the jit that will (optionally) fail with "invalid program" errors instead of asserts or silent casts, if you want to look for this sort of IL in your apps by running tests. |
@AndyAyersMS I see you've made a change that ignores this assert. I suppose if I use that as a baseline build for my testing I can assume that it is not the problem. And you've also said it doesn't seem likely on x64. |
Yes, more or less, at least for byrefs, motivated by IL from C++/CLI that passes byrefs to unmanaged callees. We're hoping that the IL producer has somehow guaranteed that the byref won't change values across a GC, and we want to make sure we don't cause the program to fail for other reasons. We could just reject the IL, but since we've tolerated this for a long time, it would likely cause more trouble than it's worth. But you might want to do that if you are able/willing to fix up your IL producers.
I might be wrong... @jkotas? |
I agree with your statement: "It doesn't seem likely on x64". We have not seen an actual problem caused by it on x64. |
Hit in Common.Tests on Windows x86 Checked in #34261.