Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add AddOpenTelemetrySelfDiagnosticsLogging to OpenTelemetry.Extensions.Hosting #1867

Closed
wants to merge 14 commits into from

Conversation

ejsmith
Copy link
Contributor

@ejsmith ejsmith commented Mar 4, 2021

It's currently hard to see what is going on with the OpenTelemetry system when you are trying to troubleshoot issues. OpenTelemetry has a self diagnostics feature, but it writes the events to a log file. This PR makes it easy to output the self diagnostics events to the .NET logging system instead and be able to easily see what is going on right in your applications logs.

@ejsmith ejsmith requested a review from a team March 4, 2021 05:34
@codecov
Copy link

codecov bot commented Mar 4, 2021

Codecov Report

Merging #1867 (59774f8) into main (0ecb317) will decrease coverage by 0.14%.
The diff coverage is 73.14%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1867      +/-   ##
==========================================
- Coverage   83.75%   83.61%   -0.15%     
==========================================
  Files         251      255       +4     
  Lines        8864     9038     +174     
==========================================
+ Hits         7424     7557     +133     
- Misses       1440     1481      +41     
Impacted Files Coverage Δ
...osting/Implementation/EventSourceEventFormatter.cs 32.35% <32.35%> (ø)
src/OpenTelemetry/Internal/StringBuilderPool.cs 79.16% <79.16%> (ø)
...Implementation/SelfDiagnosticsEventLogForwarder.cs 82.69% <82.69%> (ø)
...lementation/SelfDiagnosticsLoggingHostedService.cs 88.88% <88.88%> (ø)
...ensions.Hosting/OpenTelemetryServicesExtensions.cs 69.23% <100.00%> (ø)
...s.Hosting/Trace/TracerProviderBuilderExtensions.cs 76.92% <100.00%> (+3.01%) ⬆️
...nTelemetry/Internal/OpenTelemetrySdkEventSource.cs 74.52% <0.00%> (+2.83%) ⬆️
src/OpenTelemetry/BaseProcessor.cs 72.00% <0.00%> (+12.00%) ⬆️

@CodeBlanch
Copy link
Member

Funny I did the same thing. Here's mine: https://github.com/Macross-Software/core/tree/develop/ClassLibraries/Macross.OpenTelemetry.Extensions

@ejsmith
Copy link
Contributor Author

ejsmith commented Mar 5, 2021

Nice! I think it's really important to be able to easily enable these to go to the normal logging system and see why something isn't working or even just how it's working.

/// SelfDiagnosticsEventListener class enables the events from OpenTelemetry event sources
/// and write the events to an ILoggerFactory.
/// </summary>
internal class SelfDiagnosticsEventLogForwarder : EventListener
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intended for local debugging only, or for production use?
I'll be a bit concerned if it is for production:

  1. The pattern matching "OpenTelemetry-*" and putting things in the loggers dictionary could be an attack surface, causing unbounded memory usage.
  2. If folks consumes the logs in OpenTelemetry processor/exporters, it might dead loop (e.g. self-diagnostics go to logs, logs go to processor, processor triggered self-diagnostics).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would think it would just be for troubleshooting setups. Don't really think you'd ever want to have it on in production. Maybe briefly to figure out a telementry issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#1258 has captured some of the requirements for a production-ready self-diagnostics solution.

If this feature is intended for non-production troubleshooting, is there a way we can clearly indicate that so people won't misuse it (e.g. a separate package, similar like the ConsoleExporter)? (e.g. in C++ folks normally do debug/release bits, doesn't seem to be a common practice in C# although technically it is supported)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, it's just listening to the event sources and writing to the logging system. Both of those things happen in production. I guess you might leave it on with a high minimum level like you do normal logging. So you can see errors and such.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, @reyang we wouldn't need any of this if the whole project would just make use of MS.Ext.Logging like I was trying to argue for in that issue. That hasn't happened and the system currently uses event sources. So this is just forwarding those events to the standard logging system that every modern .NET app uses.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My biggest concern here is that we could easily end up with dead loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cijothomas that would require moving away from event sources being the primary thing that OTel is writing to internally.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a user has enabled OpenTelemetryLoggerProvider (eg: https://github.com/open-telemetry/opentelemetry-dotnet/blob/main/docs/logs/getting-started/Program.cs#L33), then this would be a never-ending-loop.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ejsmith Could you check this please?
(Sorry I haven't yet reviewed the recent change, but this part was an open issue even before, and not sure if this was addressed).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cijothomas I just tool a look at this and the Examples.AspNetCore project already uses the OTel logging provider and it works fine. It does not result in a never-ending-loop.

@ejsmith ejsmith force-pushed the AddSelfDiagnosticsLogging branch from 0888d47 to 5089419 Compare March 9, 2021 23:50
@ejsmith ejsmith marked this pull request as draft March 10, 2021 16:35
@ejsmith
Copy link
Contributor Author

ejsmith commented Mar 10, 2021

Changed to draft to wait for #1889

@ejsmith ejsmith force-pushed the AddSelfDiagnosticsLogging branch from 9841577 to df3f1e2 Compare March 18, 2021 16:09
@ejsmith ejsmith marked this pull request as ready for review March 18, 2021 16:10
@ejsmith
Copy link
Contributor Author

ejsmith commented Mar 18, 2021

@CodeBlanch @reyang @xiang17 This is up to date with main now and I think it's ready to be considered for merging. It needs to live in it's own hosted service or it won't be able to log startup issues.

@ejsmith ejsmith force-pushed the AddSelfDiagnosticsLogging branch from 634c911 to 701d8f4 Compare March 18, 2021 16:35
@ejsmith
Copy link
Contributor Author

ejsmith commented Mar 23, 2021

@CodeBlanch @cijothomas @reyang I added tests and I think I've addressed all the feedback. Can you please take a look at this and let me know what needs to be done in order to get this merged?

// The sole purpose of this HostedService is to
// start forwarding the self-diagnostics events
// to the logger factory
this.forwarder = new SelfDiagnosticsEventLogForwarder(System.Diagnostics.Tracing.EventLevel.LogAlways, this.serviceProvider.GetService<ILoggerFactory>());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LogAlways here means all the logs are going to collected by the forwarder.... Won't it mean checks like this (https://github.com/open-telemetry/opentelemetry-dotnet/blob/main/src/OpenTelemetry/Internal/OpenTelemetrySdkEventSource.cs#L95), would always be true and causes perf issues?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The messages get filtered out in the logger and can be controlled by logging configuration. The same logging system used in all of .NET Core production apps. I could make this a parameter going into the addselfdiagnostics method, but that would mean you would need to manually set that value and changing it in your logging configuration might not have any affect.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The specification calls out having a dedicated option for controlling log level, so I think being able to adjust that independently of an application's logging level makes sense. https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/sdk-environment-variables.md#general-sdk-configuration

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if the ILogger logs are configured for ,say, only Error level, the current code subscribes to all level EventsSource logs (and drop the ones below Error.). This'd affect perf.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cijothomas I just pushed changes that set the EventLevels based on the logging configuration. It will also update them if the logging configuration changes at run time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Will re-review this part.

Copy link
Member

@cijothomas cijothomas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this is trying to solve a reasonable ask (forward logs to Asp.Net Core's Ilogger mechanism), there are open questions unresolved, hence requesting-changes until we resolve them.

@ejsmith
Copy link
Contributor Author

ejsmith commented Mar 25, 2021

To me this isn't a reasonable ask. It's mandatory and what I would expect. It is currently very hard to figure out what is going on when I've misconfigured something and my APM data is just disappearing and not making it to the destination. Also, when I'm first starting out it's very valuable to see how things work.

@ejsmith
Copy link
Contributor Author

ejsmith commented Mar 25, 2021

In production if I have the log level set to error then I can see when bad things are happening inside of Otel instead of them just being eaten and I never know about them.

@ejsmith ejsmith force-pushed the AddSelfDiagnosticsLogging branch from e379e0d to 196f63c Compare April 9, 2021 04:01
@ejsmith ejsmith force-pushed the AddSelfDiagnosticsLogging branch from 25d8638 to 30b4390 Compare May 15, 2021 02:20
@ejsmith
Copy link
Contributor Author

ejsmith commented Jul 25, 2021

@cijothomas @CodeBlanch can I please get an update on the status of this PR?

@github-actions
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale Issues and pull requests which have been flagged for closing due to inactivity label Jan 28, 2022
@ejsmith
Copy link
Contributor Author

ejsmith commented Jan 28, 2022

This PR was marked stale due to lack of activity. It will be closed in 7 days.

Sigh.

@github-actions github-actions bot removed the Stale Issues and pull requests which have been flagged for closing due to inactivity label Jan 29, 2022
@github-actions
Copy link
Contributor

github-actions bot commented Feb 5, 2022

This PR was marked stale due to lack of activity and will be closed in 7 days. Commenting or Pushing will instruct the bot to automatically remove the label. This bot runs once per day.

@github-actions github-actions bot added the Stale Issues and pull requests which have been flagged for closing due to inactivity label Feb 5, 2022
@github-actions
Copy link
Contributor

Closed as inactive. Feel free to reopen if this PR is still being worked on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Stale Issues and pull requests which have been flagged for closing due to inactivity
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants