-
Notifications
You must be signed in to change notification settings - Fork 188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
adding FunctionActivitySource to Worker; refactoring AppInsights support #1307
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few minor(-ish) comments, looks great overall. Super-excited to see it coming! 🚀
src/DotNetWorker.ApplicationInsights/FunctionsTelemetryModule.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great to see!
A question that I have with this and with DurableTask tracing: I would like to ship the activity source as "preview" to start. This is so we can iterate on schema. But how? How do we ship the activity source within this already GA package? Do we need to split this out into its own package Microsoft.Azure.Functions.Worker.Instrumentation
and have some internal hook point for that separate package to start/stop its activity at the right point?
|
||
public Activity? StartInvoke(FunctionContext context) | ||
{ | ||
var activity = _activitySource.StartActivity(TraceConstants.FunctionsInvokeActivityName, ActivityKind.Internal, context.TraceContext.TraceParent, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be ActivityKind.Server
as we are ultimately serving a function invocation request?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the comment here -- #1307 (comment). It seems that this is currently not really defined by the otel spec, but that App Insights and others treat this as an internal call with the host log being the "Server" piece. If they were both "Server", rendering of the Span would not be correct.
I think this is something we can handle with the OTel schema versioning we're doing -- if things change in the future, we can use that to change the Kind as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to have a discussion about the spans we will all have. I don't think we should be trying to hide the gRPC hop from Host to Worker. Our Host/Worker design is a distributed system and it should be represented as such in spans. Customers should see the following at minimum:
Incoming request (Host Server Span) --> Host Calling Worker (Host Client Span) --> Worker handling invocation (Worker Server Span)
And then we also have to see how gRPC spans factor into all of this. Given that N gRPC spans may relate to a single invocation, I think they should be links on the Functions spans and not parent/child
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So -- the spec does say it should be Server: https://opentelemetry.io/docs/reference/specification/trace/semantic_conventions/faas/#incoming-invocations. But I also see really no notion of a "host". It seems to only care about the instance handling the invocation itself.
I think in most other clouds, the "host" -- the thing polling/sending you events -- is part of infrastructure and not even present in the distributed trace? anyone have any idea?
@lmolkova -- if we do indeed make this a Server (which is what it seems like it should be) -- that would turn this span into a Request in App Insights, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, Server
becomes request
in application insights. This is also beneficial because app insights has built in aggregation for request telemetry - failure rate, performance, etc. I see this as a great benefit for customers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we're saying that Host => Worker is an Activity
, that can be listened to by users (and therefore output to their backend observability platform), then we add more spans in the Worker under the same source (like a Http instrumentation span) then listening to the ActivitySource
will emit to spans, one of which is arguably not that useful to them? and if they don't control the performance of it, I see little reason they should see it, and pay for it (in storage/backend costs).
Having a separate ActivitySource
with .Host
on the end might be a better way to look at it, so it become an opt-in for Host metrics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes. Any other ActivitySource will be differentiated. That's why this one is named with ".Worker" -- it's only meant to emit Activities directly related to the Worker.
Any Activity related to the host will be appropriately named and, won't exist in the worker process -- those will be emitted via the Host process.
Does that address the concern?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think option 1 is the most "truthful" and flexible option. As @brettsam said, the Host spans will be identifiable as a different source. In the App Insights scenario, we should also ensure the emitted telemetry items are appropriately distinguishable from equivalent spans from the Worker. So that in the performance or failure view pages the Host and Worker spans are aggregated separately (I think this means ensuring they are different app names).
My justification is I believe that we should give Server
spans for the Worker (and specifically not Internal
). This is for similar reasons as to what @brettsam mentioned: we want to keep it flexible so that if Host and Worker separate further in the future, this telemetry remains accurate without any changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My justification is I believe that we should give Server spans for the Worker (and specifically not Internal)
Having two nested server spans will break the application map in application insights and could be confusing to other backends. It also goes against otel spec that says server spans should have client parents - this creates expectations for backends that would be broken. So if the decision is to use server span, it'd be best to add client span that tracks call to worker right away
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, option 1 does include the client span from Host -> Worker.
src/DotNetWorker.ApplicationInsights/FunctionsTelemetryModule.cs
Outdated
Show resolved
Hide resolved
@jviau re: versioning I was planning to get this in as GA inside the Worker assembly without any preview. (The ActivitySource; not the App Insights stuff). I was intentionally keeping this initial Activity small -- with really only a single tag added plus the schema. I figured the only real changes I see coming are:
We're getting lots of requests for this to be "official" -- so I didn't want to sit in Preview for a long time. I know changes will come which is why I chatted with @lmolkova about how everyone else is handling schema changes. We can certainly move to an external diagnostics assembly -- it'd be more flexible. But I'm sure even after we GA that we'd get more changes coming in the future. Maybe it'd be easier to bump that one's major we have to do a major overhaul? Curious how others feel -- @fabiocav / @lmolkova / @RohitRanjanMS |
Fixes open-telemetry/opentelemetry-specification#1359
Fixes open-telemetry/opentelemetry-specification#1358
Fixes open-telemetry/opentelemetry-specification#1264
This change:
FunctionActivitySource
into the core worker project and provides OTel versioning capabillities for the future."WorkerApplicationInsightsLoggingEnabled"
capability for when functions host can handle this (soon)Note that after this change you can start wiring up OpenTelemetry exporters in the worker as well and bypass our ApplicationInsights package completely. For example: