adding FunctionActivitySource to Worker; refactoring AppInsights support #1307

brettsam · 2023-02-01T22:45:33Z

Fixes open-telemetry/opentelemetry-specification#1359
Fixes open-telemetry/opentelemetry-specification#1358
Fixes open-telemetry/opentelemetry-specification#1264

This change:

moves the FunctionActivitySource into the core worker project and provides OTel versioning capabilities for the future
updates the handling of this Activity in the App Insights package
makes the App Insights handling richer (for example, it now sends Exceptions and sets Dependency status)
removes a bunch of our other custom App Insights code to simplify some of our behavior
sets the "WorkerApplicationInsightsLoggingEnabled" capability for when the functions host can handle this (soon)

Note that after this change you can start wiring up OpenTelemetry exporters in the worker as well and bypass our ApplicationInsights package completely. For example:

funcBuilder.Services.AddOpenTelemetry()
    .WithTracing(traceBuilder =>
    {
        traceBuilder.AddSource("Microsoft.Azure.Functions.Worker")
            .AddHttpClientInstrumentation()
            .AddAzureMonitorTraceExporter(o => o.ConnectionString = aiConnStr)
            .SetSampler<AlwaysOnSampler>();
    })
    .StartWithHost();

funcBuilder.Services.AddLogging(loggingBuilder =>
{
    loggingBuilder.AddOpenTelemetry(options =>
    {
        options.AddAzureMonitorLogExporter(o => o.ConnectionString = aiConnStr);
    });
});

lmolkova

Left a few minor(-ish) comments, looks great overall. Super-excited to see it coming! 🚀

jviau

This is great to see!

A question that I have with this and with DurableTask tracing: I would like to ship the activity source as "preview" to start. This is so we can iterate on schema. But how? How do we ship the activity source within this already GA package? Do we need to split this out into its own package Microsoft.Azure.Functions.Worker.Instrumentation and have some internal hook point for that separate package to start/stop its activity at the right point?

jviau · 2023-02-03T18:33:13Z

src/DotNetWorker.Core/Diagnostics/FunctionActivitySource.cs

+
+        public Activity? StartInvoke(FunctionContext context)
+        {
+            var activity = _activitySource.StartActivity(TraceConstants.FunctionsInvokeActivityName, ActivityKind.Internal, context.TraceContext.TraceParent,


Should this be ActivityKind.Server as we are ultimately serving a function invocation request?

See the comment here -- #1307 (comment). It seems that this is currently not really defined by the otel spec, but that App Insights and others treat this as an internal call with the host log being the "Server" piece. If they were both "Server", rendering of the Span would not be correct.

I think this is something we can handle with the OTel schema versioning we're doing -- if things change in the future, we can use that to change the Kind as well.

I think we need to have a discussion about the spans we will all have. I don't think we should be trying to hide the gRPC hop from Host to Worker. Our Host/Worker design is a distributed system and it should be represented as such in spans. Customers should see the following at minimum:

Incoming request (Host Server Span) --> Host Calling Worker (Host Client Span) --> Worker handling invocation (Worker Server Span)

And then we also have to see how gRPC spans factor into all of this. Given that N gRPC spans may relate to a single invocation, I think they should be links on the Functions spans and not parent/child.
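To make the proposed span shape concrete, here is a minimal sketch using only System.Diagnostics. It is not the worker's or host's actual implementation: the source names, span names, and the SpanShapeSketch type are hypothetical, and it assumes .NET 6 or later, where the default Activity ID format is W3C. The worker span is started from a traceparent string, loosely mirroring how StartInvoke in this PR passes context.TraceContext.TraceParent.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class SpanShapeSketch
{
    // Hypothetical source names, used only for this illustration.
    private static readonly ActivitySource HostSource = new("Sketch.Functions.Host");
    private static readonly ActivitySource WorkerSource = new("Sketch.Functions.Worker");

    // Returns the stopped spans in the order they ended (innermost first).
    public static List<(string Name, ActivityKind Kind, string SpanId, string ParentSpanId)> Run()
    {
        var seen = new List<(string Name, ActivityKind Kind, string SpanId, string ParentSpanId)>();

        // Without a listener that samples the data, StartActivity returns null.
        using var listener = new ActivityListener
        {
            ShouldListenTo = source => source.Name.StartsWith("Sketch.Functions."),
            Sample = (ref ActivityCreationOptions<ActivityContext> _) => ActivitySamplingResult.AllData,
            SampleUsingParentId = (ref ActivityCreationOptions<string> _) => ActivitySamplingResult.AllData,
            ActivityStopped = a => seen.Add(
                (a.OperationName, a.Kind, a.SpanId.ToString(), a.ParentSpanId.ToString())),
        };
        ActivitySource.AddActivityListener(listener);

        // Host side: a Server span for the incoming request, then a Client span for the call to the worker.
        using (var hostServer = HostSource.StartActivity("IncomingRequest", ActivityKind.Server))
        using (var hostClient = HostSource.StartActivity("InvokeWorker", ActivityKind.Client))
        {
            // In a real system this string would cross the process boundary with the invocation request.
            string? traceParent = hostClient?.Id;

            // Worker side: a Server span whose parent is the host's Client span.
            using var workerServer = WorkerSource.StartActivity("Invoke", ActivityKind.Server, traceParent);
        }

        return seen;
    }
}
```

Calling SpanShapeSketch.Run() yields three spans. The worker's Server span has the host's Client span as its parent, which is the Server -> Client -> Server chain described above. The gRPC spans attached as links are left out of the sketch.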

So -- the spec does say it should be Server: https://opentelemetry.io/docs/reference/specification/trace/semantic_conventions/faas/#incoming-invocations. But I also see really no notion of a "host" there. It seems to only care about the instance handling the invocation itself.

I think in most other clouds, the "host" -- the thing polling/sending you events -- is part of the infrastructure and not even present in the distributed trace? Does anyone have any idea?

@lmolkova -- if we do indeed make this a Server (which is what it seems like it should be) -- that would turn this span into a Request in App Insights, right?

Yes, Server becomes a request in Application Insights. This is also beneficial because App Insights has built-in aggregation for request telemetry (failure rate, performance, etc.). I see this as a great benefit for customers.

If we're saying that Host => Worker is an Activity that can be listened to by users (and therefore output to their backend observability platform), and we then add more spans in the Worker under the same source (like an HTTP instrumentation span), then listening to the ActivitySource will emit two spans, one of which is arguably not that useful to them. If they don't control its performance, I see little reason why they should see it and pay for it (in storage/backend costs).

Having a separate ActivitySource with .Host on the end might be a better way to look at it, so it becomes an opt-in for Host metrics?

Ah, yes. Any other ActivitySource will be differentiated. That's why this one is named with ".Worker" -- it's only meant to emit Activities directly related to the Worker.

Any Activity related to the host will be appropriately named and won't exist in the worker process -- those will be emitted by the Host process.

Does that address the concern?
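As a rough illustration of why separate source names make host spans opt-in, here is a sketch using only System.Diagnostics. The worker source name comes from the PR description. The host source name and the SourceOptInSketch type are hypothetical, and both sources live in one process here purely for demonstration.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class SourceOptInSketch
{
    // The worker name is taken from the PR description; the host name is an assumption.
    public const string WorkerSourceName = "Microsoft.Azure.Functions.Worker";
    public const string HostSourceName = "Microsoft.Azure.Functions.Host";

    public static List<string> CollectWorkerOnly()
    {
        var collected = new List<string>();
        using var workerSource = new ActivitySource(WorkerSourceName);
        using var hostSource = new ActivitySource(HostSourceName);

        // Subscribe to the worker source only; the host source is never opted in.
        using var listener = new ActivityListener
        {
            ShouldListenTo = source => source.Name == WorkerSourceName,
            Sample = (ref ActivityCreationOptions<ActivityContext> _) => ActivitySamplingResult.AllData,
            ActivityStopped = a => collected.Add(a.Source.Name + "/" + a.OperationName),
        };
        ActivitySource.AddActivityListener(listener);

        // No listener subscribed to the host source, so no Activity is created for it.
        using (var hostSpan = hostSource.StartActivity("HostWork", ActivityKind.Internal)) { }

        // The worker source has a listener, so this span is created and recorded on stop.
        using (var workerSpan = workerSource.StartActivity("Invoke", ActivityKind.Server)) { }

        return collected;
    }
}
```

With the OpenTelemetry SDK, the equivalent opt-in is the AddSource call shown in the PR description: only sources named there are exported.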

I think option 1 is the most "truthful" and flexible option. As @brettsam said, the Host spans will be identifiable as a different source. In the App Insights scenario, we should also ensure the emitted telemetry items are appropriately distinguishable from equivalent spans from the Worker, so that in the performance or failure view pages the Host and Worker spans are aggregated separately (I think this means ensuring they have different app names).

My justification is I believe that we should give Server spans for the Worker (and specifically not Internal). This is for similar reasons as to what @brettsam mentioned: we want to keep it flexible so that if Host and Worker separate further in the future, this telemetry remains accurate without any changes.

"My justification is I believe that we should give Server spans for the Worker (and specifically not Internal)"

Having two nested server spans will break the application map in Application Insights and could be confusing to other backends. It also goes against the OTel spec, which says server spans should have client parents -- this creates expectations for backends that would be broken. So if the decision is to use a server span, it'd be best to add a client span that tracks the call to the worker right away.

Yes, option 1 does include the client span from Host -> Worker.

brettsam · 2023-02-03T22:54:06Z

@jviau re: versioning

I was planning to get this in as GA inside the Worker assembly without any preview. (The ActivitySource; not the App Insights stuff). I was intentionally keeping this initial Activity small -- with really only a single tag added plus the schema. I figured the only real changes I see coming are:

add new tags to make things richer -- which I wouldn't see as "breaking"
some tag needs to change due to the otel schema changing. This is factored in (i.e. "faas.execution" -> "faas.invocation" mapping would already happen and customer would choose the schema they want to run with in WorkerOptions).
we'd need to change the Activity name or kind? We can hide this behind the schema selection as well.
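The schema-selection idea in the list above could look roughly like the following. This is a sketch only: the SketchSchemaVersion enum and SchemaTagSketch type are hypothetical, and the PR's actual option type and version names are not shown in this thread. The two tag names are the ones quoted in the comment.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical schema versions; the real option type and version names may differ.
public enum SketchSchemaVersion { V1, V2 }

public static class SchemaTagSketch
{
    // Picks the tag name for the selected schema, using the mapping quoted in the comment above.
    public static string InvocationIdTagName(SketchSchemaVersion version) => version switch
    {
        SketchSchemaVersion.V1 => "faas.execution",
        SketchSchemaVersion.V2 => "faas.invocation",
        _ => throw new ArgumentOutOfRangeException(nameof(version)),
    };

    // Builds the tag that would be attached to the invocation Activity.
    public static KeyValuePair<string, object?> BuildInvocationTag(
        SketchSchemaVersion version, string invocationId)
        => new(InvocationIdTagName(version), invocationId);
}
```

The point of the design is that customers pin a schema version and keep getting the tag names they built queries against, while new versions can rename tags (or, as suggested above, change the Activity name or kind) behind the same switch.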

We're getting lots of requests for this to be "official" -- so I didn't want to sit in Preview for a long time. I know changes will come which is why I chatted with @lmolkova about how everyone else is handling schema changes.

We can certainly move to an external diagnostics assembly -- it'd be more flexible. But I'm sure that even after we GA, we'd get more changes coming in the future. Maybe it'd be easier to bump that one's major version if we have to do a major overhaul?

Curious how others feel -- @fabiocav / @lmolkova / @RohitRanjanMS

adding FunctionActivitySource to Worker; refactoring AppInsights support

8189640

brettsam requested review from jviau, liliankasem and fabiocav and removed request for liliankasem February 1, 2023 22:51

lmolkova approved these changes Feb 2, 2023

RohitRanjanMS reviewed Feb 2, 2023

lmolkova reviewed Feb 2, 2023

first round of PR comments

6be2f1e

RohitRanjanMS reviewed Feb 3, 2023

next round of PR comments

0b5e192

jviau reviewed Feb 3, 2023

RohitRanjanMS reviewed Feb 3, 2023

kshyju reviewed Feb 9, 2023

jviau reviewed Feb 9, 2023

lmolkova mentioned this pull request Jul 9, 2024

FaaS: provide span structure recommendations for multiple layers of instrumentation open-telemetry/semantic-conventions#1224

Open

brettsam added 2 commits February 16, 2023 14:34

another round of PR comments

e33b72e

changing ActivityKind from Internal to Server

64114b8

brettsam requested review from jviau, RohitRanjanMS and kshyju March 14, 2023 15:48

jviau reviewed Mar 14, 2023

brettsam mentioned this pull request Mar 14, 2023

How to get AppInsights working in .NET 8 Functions v4 #1182

Open

more pr comments

bd423d4

jviau approved these changes Mar 14, 2023

brettsam merged commit b963a28 into main Mar 15, 2023

brettsam deleted the brettsam/activity branch March 15, 2023 14:45

jviau Feb 3, 2023

brettsam Feb 3, 2023

jviau Feb 8, 2023

brettsam Feb 9, 2023

jviau Feb 9, 2023

martinjt Mar 2, 2023

brettsam Mar 9, 2023

jviau Mar 14, 2023

lmolkova Mar 14, 2023

jviau Mar 14, 2023
