Application exception exit: segfault #32848
Comments
Hi @fadimounir, do you think this is a bug or something else? |
It looks like a bug from the callstack. I didn't get a chance to take a look yet, but does this reproduce consistently or non-deterministically? Also, what are the repro steps? |
It reproduces non-deterministically. It happens occasionally after our program has been running for hours or days. Do you need any other information? |
Can you share your program so I can run it on my end, and capture the failure under debugger? Without a repro or a crash dump, I won't be able to investigate the root cause |
It takes some configuration to run our program, and it's hard to reproduce the problem right away. Would it help if I shared the crash dump with you directly? https://drive.google.com/open?id=1BaVppnjLNba8MP9hHJmpLZOLjr6rXK2Y |
Yes, thanks for sharing the crash dump. I was able to download it, get the symbols, and I'll start taking a look to understand what happened |
@zhxymh Could you please also share all the dlls that were used by the app when this dump was produced? |
Sorry, I lost that version of the dlls. If you can't debug without them, I can only share them with you the next time this problem comes up. Thank you very much for your help! |
If you have any other version of the dlls, it could also help with the investigation at this time, until you are able to capture another crash dump the next time the problem comes up. |
Try this version of dlls, thx! |
When you captured the dump, was the application compiled into ReadyToRun images? (i.e. did you set PublishReadyToRun=true in your csproj, or on the dotnet publish command line?) I ask because the dump shows some evidence that your app assemblies were compiled into R2R, and from the stack traces and thread states, I suspect the underlying root cause might be this issue or something related to it: #608
The app assemblies you provided all seem to be IL images. Maybe you packaged the build folder and uploaded it instead of the publish folder, which would have the R2R images? (ReadyToRun is a publish-only scenario.) If you can confirm that you used ReadyToRun images while running your scenario, and provide me with those images, I should be able to quickly check whether a certain bit was missing in the images, which would confirm that the issue really is #608 |
We did not set PublishReadyToRun on the project or the command line. |
Ok, thank you for confirming. I don't think using assembly load in the code would cause this. The issue here seems to be that a particular type needed to be eagerly loaded by the TypeLoader before a certain method executes, but it was not, and the load was deferred to a later stage. Then two threads attempted to load that type at the same time, and one of them was a GC collection thread, which is not supposed to be loading any type at all. Right now I'm investigating why the type was not eagerly loaded as it should have been. |
Ok, thank you very much! |
Hi @fadimounir, the crash has come up again. We are now using .NET Core 3.1.102, and the stack information for the crash is not quite the same as last time. I'm sharing the dump file and dlls from the crash with you.
Dump file and dlls:
Exception:
Environment:
|
@janvorli This callstack on the second crash looks like something you've been looking at recently. There is a dump for it here if you need one. |
@fadimounir the call stack is different from what I was seeing when looking into the issue #32171. But it doesn't mean it cannot be caused by the same issue. Failures caused by that could hit at various places. I'll look at the dump to see if there is any sign of correlation with that fixed issue. |
@zhxymh From the dump, I am unable to confirm whether this is caused by the same issue as #32171 (and was therefore already fixed). The dump is missing some information. Could you please try to hit the issue again, but run the following from the same shell before running your app?
This makes the core dump "beefier", containing more data. I could also share a modified System.Private.CoreLib.dll with you that has a little hack fixing the issue from #32171 so that we can confirm whether the issue is caused by the same thing or not. |
@zhxymh I have identified the root cause behind the first crash dump you provided, and have a fix for it in progress. The fix will be in dotnet 5. I don't know yet whether it will be ported to an update version of 3.XYZ |
Thx, we will set it and test again. |
Thx! In case we can't get the fixed version soon: under what conditions does this happen? Can we change our own code to avoid it? |
This happens when a GC is triggered at the same time a certain virtual generic method is being invoked (in your scenario:
That might be tricky to do, given that the crash is most likely triggered by library code you are using in your app.
var dummy = typeof(Grpc.Core.DefaultCallInvoker.CallInvocationDetails<BlockWithTransactions, PeerDialException>);
The second suggestion will help load this type eagerly, but unfortunately things might still fail on another type if another race condition between execution and GC happens at a very unlucky time. |
Thank you very much for your help! |
Fixed via #33733 |
We run the application in Docker, and occasionally it crashes. I found an earlier issue, #10856, that looks similar to this one, but it has already been fixed.
I don't know why it happens.
Here are the details:
Environment:
Exception information:
coredump: