Troubleshooting Intermittent Telemetry Logging in .NET 6 with OpenTelemetry: An In-depth Analysis and Temporary Solution #4926
Replies: 2 comments
Clear Indication Towards an IIS Context Issue
Thank you all for your observations and comments so far. After going through the data and instances provided, I've laid out the reasoning below, which leads to the conclusion reached in point 3.
1. Persistent Memory Instances: After manually invoking the Garbage Collector while taking the memory dump, both instances (the metrics-producing application and the collector) are still visible in memory. This suggests the issue is not caused by an irregularity in memory allocation or deallocation.
2. Network Error Elimination: The metrics producer and the collector run on the same machine, which effectively rules out network errors as a cause. The error we see on the surface is therefore probably a symptom of an underlying issue.
3. The IIS Context Culprit: When the code runs as a .NET 6 console application, the metrics are sent without problems. When the application is hosted in the IIS context (in-process), the issue reappears. The problem correlates directly with the IIS context.
Conclusion
Considering these points, the issue appears to be tied to the IIS context in which the application is running. We need to look closely at this area and explore conflicts that might surface only within this context.
Your expertise on how applications behave differently when executed within an IIS context, or any similar anomalies you have seen before, would be very valuable. The situation is critical for us, so a quick response would be greatly appreciated. Thank you in advance for your cooperation.
Open a bug for developers. Thanks to all for the attention.
Bug Report
Symptom
Describe the bug
After deployment, telemetry logging in our .NET 6 projects using OpenTelemetry only works for a limited time. The logging stops without any error or warning, and a memory dump reveals that the OpenTelemetry instances disappear until a connection pool refresh.
Expected behavior
Continuous and uninterrupted telemetry logging with OpenTelemetry instances remaining persistent without requiring a pool refresh.
Runtime environment:
Additional context
Potential issue with the Garbage Collector prematurely collecting the MeterProvider instance, which interrupts the telemetry logging. The affected applications are deployed on .NET 6 and use slightly different OpenTelemetry initialization code.
Notably, within our suite of applications, the ones experiencing this issue are the only two hosted on IIS, presenting a possible correlation that may warrant further exploration.
Reproduce
Steps to reproduce the behavior:
Additional Steps Taken:
Enabled logging with the command:
Yet, no logs are recorded in %ProgramData%\OpenTelemetry .NET AutoInstrumentation\logs.
Implemented a static field, Provider, to retain the MeterProvider instance and utilized GC.KeepAlive to prevent early collection by the GC:
I added the GC.KeepAlive call
that replaces the previous logic:
This temporary solution initially seems to work, but it still needs to be validated in the production environment. Better or alternative strategies are welcome, in particular to avoid potential memory issues in long-running applications.
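The workaround described above could look roughly like the following. This is only a minimal sketch, not the code from the report: the class name `TelemetryHolder`, the methods `Initialize` and `Shutdown`, the meter name `MyCompany.MyApp`, and the choice of the OTLP exporter are all assumptions made for illustration. Only the static `Provider` field and the `GC.KeepAlive` call come from the description. It assumes the `OpenTelemetry` and `OpenTelemetry.Exporter.OpenTelemetryProtocol` NuGet packages are referenced.

```csharp
using System;
using OpenTelemetry;
using OpenTelemetry.Metrics;

// Hypothetical holder class; only the static "Provider" field
// is taken from the bug report.
public static class TelemetryHolder
{
    // A static field is a GC root, so the MeterProvider stays reachable
    // for as long as the application domain is loaded.
    public static MeterProvider? Provider;

    public static void Initialize()
    {
        Provider = Sdk.CreateMeterProviderBuilder()
            .AddMeter("MyCompany.MyApp") // assumed meter name
            .AddOtlpExporter()           // assumed exporter, default endpoint
            .Build();

        // Mirrors the workaround from the report. GC.KeepAlive only keeps
        // the object reachable up to this call; the static field above is
        // what keeps it alive afterwards.
        GC.KeepAlive(Provider);
    }

    // Dispose on shutdown so pending metrics are flushed.
    public static void Shutdown() => Provider?.Dispose();
}
```

In this sketch `TelemetryHolder.Initialize()` would be called once at startup and `TelemetryHolder.Shutdown()` when the application stops. Because the static field already roots the provider, the `GC.KeepAlive` call should add no extra memory cost; the provider is a single long-lived object either way.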
Any insights, alternatives, or optimizations that might offer a more stable solution to this issue would be highly appreciated. Also, any additional steps or locations to explore for troubleshooting, considering the missing logs, would be beneficial.
Configuration class code
Here is the code of the configuration class before applying the changes described above:
The class is referenced in *Program.cs*, which is written using the Minimal Hosting / Minimal APIs approach:
Log Results:
With logging enabled, the following error appears, but I'm fairly sure we don't have a network issue: