-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Release/7.0] CoreCLR initialization failures logging #81134
[Release/7.0] CoreCLR initialization failures logging #81134
Conversation
Port of dotnet#78484, dotnet#78790, dotnet#80104 and dotnet#80294 This change adds detecting and logging of failures during coreclr initialization. For logging, it uses a new host API `coreclr_set_error_writer` to register a callback to report the errors to the host. The hosts have support for optional usage of this API so that they can work with older runtime versions as well. The logging checks and reports failures with: * System.Private.CoreLib.dll * GC initialization * JIT initialization * libSystem.Native.so/dylib on Unix The logging messages should allow customers to self-diagnose the issues causing the failures. This change also adds backport of support for standalone GC back compatibility that is a prerequisite for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
approved. we will take for consideration in 7.0.x
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unfortunately I didn't know when this was checked into 8.0 and I do want to change some things there but they won't change anything functionally. so I'll just make the changes in 8.0 and those don't need to be backported.
actually we are still discussing this on our side. please don't merge yet. |
* Fix GC interfaces versioning The change that introduced GC interfaces versioning had a bug preventing it from using .NET 8 GC being used with .NET 7 runtime. The GC_VersionInfo return value should have been the minimum supported version, not the current one. But more importantly there was also some confusion on what interface the GC_INTERFACE_MAJOR_VERSION represents. While the change considered it to be a version of the IGCToCLR interface, it really means the version of the IGCHeap interface. This change creates a separate version, EE_INTERFACE_MAJOR_VERSION for versioning o the IGCToCLR interface to rectify that. @Maoni0 also didn't like the way of creating a new version of IGCToCLR interface for each major version change, so I am changing it to single IGCToCLR interface.
@Maoni0 I have added the change based on your feedback. |
@dotnet/dnceng some of the CI legs in this PR are consistently failing with
Is it a known issue? |
I think so. It's hard in a pipeline with over 100 jobs where you haven't specified which of the jobs you're talking about to deduce what you specifically mean, but if you mean logs like https://helixre8s23ayyeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-81134-merge-894df07df5fc42d8a6/Microsoft.Extensions.Configuration.UserSecrets.Tests/1/console.446ceeaf.log , this is because you're using Regardless of whether I guessed correctly (again, please include the name of a AzDO job or Helix correlation id when asking these questions), you should move to running on |
Ah. This is specifically what I was trying to point out earlier. Change your The basic reason this started happening recently because we only recently moved the ubuntu.1804.armarch* queues to Azure VMs, which necessitated a parallel change in docker images from last year. There was also a dncpartners email from approximately 9/26/2022 requesting you to make this move you can refer to. |
Approved by Tactics for 7.0.4. |
Port of #78484, #78790, #80104 and #80294
This change adds detecting and logging of failures during coreclr initialization. For logging, it uses a new host API
coreclr_set_error_writer
to register a callback to report the errors to the host. The hosts have support for optional usage of this API so that they can work with older runtime versions as well.The logging checks and reports failures with:
The logging messages should allow customers to self-diagnose the issues causing the failures.
This change also adds backport of support for standalone GC back compatibility that is a prerequisite for it.
Customer Impact
Without this change, .NET core apps can fail with a dreaded error code 0x80004005 (
E_FAIL
) in many cases like issues with System.Private.CoreLib.dll loading, GC initialization and other cases. That effectively prevents self-diagnosis of the underlying issue. Both 3rd party customers and internal teams including the Excel team have been asking for more detailed diagnostic information for this kind of failures for quite a long time.Testing
Local testing with manual injections of various failures like broken / incorrect version / missing System.Private.CoreLib.dll, JIT, libSystem.Native.so/dylib on Unix or wrong GC configuration. The Excel team also tested this port with their .NET host.
Risk
Low, the change influences only the failure code paths during coreclr initialization when the app is already on its way to be aborted with the
E_FAIL
. It has been in main for 1.5 months now without any issues other than one issue with single file exe that was discovered and fixed quickly.