-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Distributed tracing and Orleans #4992
Comments
Great write-up! few comments:
I can help with any details and provide examples, just let me know. |
We are definitely interested in providing rich diagnostics to Orleans users, especially when they cross the boundaries to other systems. I think we should tentatively consider this for Orleans 3.0.0, and better understand the developer experience and performance impact of this. |
From reading various threads/docs on
Does that sound right to you, @NickDarvey & @lmolkova? This is something which we wont be able to address in the immediate future, but as Sergey said above, we should consider this for 3.0. |
I'd defer to @lmolkova, but that sounds sensible to me. You might have already looked in to this but I saw that there is a bridge from DiagnosticSource to EventSource which might mean you can do all the instrumentation with DiagnosticSource. |
@ReubenBond You may still use DiagnosticSource for verbose perf and diagnostics logging by using two different DiagnosticSources for distributed tracing and internal debugging, but EventSoiurce is more optimal for the latter. It also provides out-of-process monitoring with ETW and LTTng. DiagnosticSource to EventSource bridge may be hard to use as it requires some special configuration and even defines a language to serialize DS payloads in EventSource events. |
I'm still trying to wrap my head around
I noticed that by default,
This approach gets me the Perhaps I got it backwards? Thoughts? |
There're 2 ways for HttpClient header injecting I know: b) HttpClient provides a "middleware-like" way to configure Http request-response handling (much like Orleans' OutgoingGrainCallFilter) - you can add a custom handler to the client, and change request's header in that handler. public class TracingHandler: DelegatingHandler
{
public override Task<HttpResponseMessage> SendAsync(HttpRequestMessage message, CancellationToken token)
{
// change headers
base.SendAsync(message, token);
}
} a is the more universal option - it uses the .net core's diagnostic pattern, but it's sort of complex, you need to start a task/thread to subscribe and handle events, and it's not much configurable. |
@csyszf, thanks for the input. I haven't work with With my previous post, I was trying to access the same "trace identifier" that would be used "across different microservices". Given that What is (or should be) the correct way to "correlate" all "operations" under the same (parent) id? The (simple) goal is to have one (parent) operation id available across different services -- and since we're logging everything into Application Insights (integration with ASP.NET Core + Serilog AI sink for other logging requirements), that would be really nice. |
I'm not familiar with Application Insights, but |
Any news? |
Yes, we are actively working on this |
I'll cross-reference also dotnet/aspnetcore#5914 as those following this might be interested to see where the core framework is going and have potentially similar scenarios to the branched issues. <Edit: Links of interest by following from ASP.NET Core to https://w3c.github.io/trace-context/ and from there to https://github.com/open-telemetry/opentelemetry-dotnet/ (and dotnet/aspnetcore#8924). |
Is this issue going to make the 3.0.0 release? |
What's the progress of this issue? |
The ASP.NET Core team posted a very enlightening blog post showing how the new Activity API works: |
Also potentially interesting example from Firefox DevTools: Server-Timing HTTP header (also https://twitter.com/FirefoxDevTools/status/1201914691863244800). |
Maybe interesting resource: https://opentelemetry.io/. Also cross-referencing to #6178 that asks guidance for production runs. |
I am working for the .NET agent of Apache Skywalking APM and want to support orleans. Like #6213 said. |
@ElderJames I responded on that thread |
I have a branch where I was testing integration with Jaeger via OpenTracing and DiagnosticSource. Longer term, OpenTelemetry is probably the way to go but I'm not sure if they had any nuget packages at the time. I plan to complete the integration to have a better end-to-end, pay-for-what-you-use solution here. |
FYI, saw some related work in dotnet/runtime#31373
and its linked issue dotnet/runtime#31372
|
@ReubenBond Given this didn't make 3.0 -- is this on track for 4.0 ? |
I link https://petabridge.com/blog/state-opentelemetry-dotnet/ and also dotnet/runtime#31862 (in spirit of @NickDarvey) to provide context. |
+1 for interest in seeing how opentelemetry integration pans out. |
@ReubenBond, is it still actual to implement this feature? |
Goal
Distributed tracing that works out-of-the-box with other Microsoft libraries like System.Net (HttpClient), EntityFramework, Event Hubs SDK and Service Bus SDK.
Value
I want to instrument my Orleans application in a way I can correlate events so I can trace the effect of any one request across an entire system, across service boundaries. As an example, by adding a trace identifier to a HTTP request, I want to see that it reaches my gateway (in ASP.NET Core), each grain it hits (in Orleans), the messages it creates (on an Event Hub) and the database transaction it executes (via Entity Framework).
Background
I understand Orleans provides an ActivityId however it doesn't cover distributed tracing out-of-the-box (i.e. flowing that ActivityId to other systems).
I looked at how other Microsoft libraries did this (e.g. System.Net, Event Hubs SDK) and they seem to be converging on System.Diagnostics.Activity which is available as an AsyncLocal container (Activity.Current) for carrying diagnostic context. (More information about Activity is in the user guide.)
Unfortunately, it's not something that is used with Microsoft.Extension.Logging but rather introduces a (yet another) new diagnostic API: System.Diagnostics.DiagnosticSource. DiagnosticSource is distinct from other logging systems in that it's to be consumed in-process and sources normally provide rich payloads.
(More information about DiagnosticSource is in the user guide.)
Libraries using Activity which make or receive external calls often implement some variation of this correlation protocol. This means that the Activity is emitted on outbound requests and resumed on inbound requests.
It looks like there is work underway to standardize (the header names) based on the W3C spec for distributed tracing. However, Orleans should be able to be instrumented now without being affected by these changes. (Client-to-Orleans calls are via an Orleans-provided library so I don't think a change in header names would be a breaking change.)
(There (hopefully?) won't be any change to Activity itself, depending on this discussion.)Edit: See @lmolkova's comment for details on the impact of W3C distributed tracing.
Implementation
There is official guidance for Activity and DiagnosticSource but newer documentation is work-in-progress and available here.
Based on the experience of people who have consumed DiagnosticSources before, it looks like a implementation similar to Entity Framework, where the events are typed, would be preferable.
My insight only goes as deep as poking around over the last week. I've come across people who seem very involved in the .NET and distributed tracing space who might be able to chime in and enlighten us about the right way of approaching this, such as @SergeyKanzhelev, @lmolkova and @cwe1ss.
Minimal
A minimal implementation would allow the Activity to flow across service boundaries with a DiagnosticSource for grain requests. Log providers (ILoggerProvider) could optionally listen to this DiagnosticSource (or all DiagnosticSources) and enrich the logs with the Activity context.
@rikbosch has a gist with an implementation of this part here.
@SergeyKanzhelev does App Insight's (ASP.NET Core?) ILoggerProvider implementation already make use of the Activity.Current for correlation? I saw this, but does that mean it'll work for any DiagnosticSource?
Extended
An extended implementation might go so far as to replace ILogger use in Orleans with DiagnosticSources and provide adapters back to ILogger (and to ITelemetryProvider for performance counters). I think the value of going this far would be to offer richer diagnostics to people who want it (e.g. someone who wants to build a profiler) and a single diagnostics API for Orleans. (The minimal implementation would mean Orleans would offer both ILogger and DiagnosticSource which is ugh.)
The text was updated successfully, but these errors were encountered: