WIP: OTel metrics #2469

calebschoepp · 2024-04-24T23:23:26Z

This is a WIP implementation for supporting OTel metrics in Spin. This code is far from complete, but I'm at a point where I want feedback on whether I'm going down the right path.

Design goals

Leverage OTel environment variables for configuring the feature.
Have spin_telemetry manage creating and storing the Instruments (the things that record metrics).

I want other packages to be able to do something like spin_telemetry::u64_counter_add("my-counter", 1, my_attributes). The two alternatives to this are unfavourable IMO.

One alternative would be to have every crate in Spin that wants to emit metrics take on the responsibility of creating and managing its own instruments (spin_telemetry would only handle initializing the metrics exporter). This would suck b/c we're adding more state to more crates and then they also all need to import the OTel crates.
Another alternative would be to create and store the instruments in spin_telemetry, albeit in a more traditional way: statically define all the metrics we want and store them in a struct, just like we do in LHC. The problem with this is that it couples spin_telemetry to every crate. Any time you want to add a metric somewhere, you have to both add it where you want to emit it and modify spin_telemetry. This is obviously untenable for plugins that want to emit metrics.
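To make the preferred approach concrete, here is a minimal sketch of what a spin_telemetry::u64_counter_add style function could look like. It is not the code in this PR: U64Counter is a hypothetical stand-in for a real OTel instrument so that the sketch compiles with only the standard library, attributes are accepted but ignored, and u64_counter_total is a helper that exists only to inspect the result.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical stand-in for an OTel `Counter<u64>`. It ignores attributes
/// and only keeps a running total so the sketch needs no OTel crates.
#[derive(Default)]
struct U64Counter {
    total: AtomicU64,
}

impl U64Counter {
    fn add(&self, value: u64, _attributes: &[(&str, &str)]) {
        self.total.fetch_add(value, Ordering::Relaxed);
    }
}

/// Global name -> instrument map, created lazily on first use.
fn u64_counters() -> &'static Mutex<HashMap<String, Arc<U64Counter>>> {
    static COUNTERS: OnceLock<Mutex<HashMap<String, Arc<U64Counter>>>> = OnceLock::new();
    COUNTERS.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Adds `value` to the counter called `name`, creating the counter if needed.
/// Callers never create or store an instrument themselves.
pub fn u64_counter_add(name: &str, value: u64, attributes: &[(&str, &str)]) {
    // Hold the map lock only long enough to fetch (or insert) the handle.
    let counter = {
        let mut map = u64_counters().lock().unwrap();
        map.entry(name.to_string()).or_default().clone()
    };
    counter.add(value, attributes);
}

/// Inspection helper for this sketch only; not part of the proposed API.
pub fn u64_counter_total(name: &str) -> Option<u64> {
    let map = u64_counters().lock().unwrap();
    map.get(name).map(|c| c.total.load(Ordering::Relaxed))
}
```

A calling crate would then only need something like u64_counter_add("my-counter", 1, &[("component", "my-component")]) and would not have to depend on the OTel crates directly.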

Crimes against humanity

I'm doing a lot of potential evil here. Lazy statics with global mutexed maps and declarative macros to generate all the implementations of a function.
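For readers who want to see the shape of the pattern, the following is a rough sketch of a declarative macro stamping out one lazily initialised, mutex-guarded global map per value type. The macro name define_counter and the generated add and total functions are hypothetical, and the maps hold plain running totals rather than real OTel instruments.

```rust
/// Generates a module containing a lazily initialised global map and an
/// `add` function for one numeric type. All names here are illustrative.
macro_rules! define_counter {
    ($module:ident, $ty:ty) => {
        pub mod $module {
            use std::collections::HashMap;
            use std::sync::{Mutex, OnceLock};

            /// Global name -> running total map for this value type.
            fn totals() -> &'static Mutex<HashMap<String, $ty>> {
                static TOTALS: OnceLock<Mutex<HashMap<String, $ty>>> = OnceLock::new();
                TOTALS.get_or_init(|| Mutex::new(HashMap::new()))
            }

            /// Adds `value` to the named counter, starting from zero if new.
            pub fn add(name: &str, value: $ty) {
                let mut map = totals().lock().unwrap();
                *map.entry(name.to_string()).or_default() += value;
            }

            /// Reads the running total; a helper for this sketch only.
            pub fn total(name: &str) -> Option<$ty> {
                totals().lock().unwrap().get(name).copied()
            }
        }
    };
}

// One invocation per supported value type.
define_counter!(u64_counter, u64);
define_counter!(f64_counter, f64);
```

With this, u64_counter::add("requests", 1) and f64_counter::add("seconds", 0.5) share one implementation, and each value type keeps its own map.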

I'd like to be scolded or encouraged.

Design questions

How do we feel about my design approach here to decouple crates? Are the macro/global-map shenanigans worth it?
Do we need to support observable instruments @lann ? I don't really get what they are TBH.
I'm finding that all this OTel o11y code is darn near impossible to test. I'd love to be able to write tests that test all possible combinations of OTEL_* env vars, but I don't think we can do that easily b/c the underlying libraries we depend on are what is reading the env. I'm all ears for ways that I could refactor this whole o11y implementation to be more testable.
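One partial idea on testability, sketched under assumptions: the parts of the env handling that Spin owns (such as deciding whether OTel is enabled) could take the environment lookup as a parameter, so tests can pass a fake environment. This does not help with the env vars that the underlying libraries read themselves. The name any_var_set and the rule that an empty value counts as unset are assumptions for illustration.

```rust
/// Returns true if `lookup` reports a non-empty value for any of `vars`.
/// Treating empty values as unset is an assumption of this sketch.
pub fn any_var_set<F>(vars: &[&str], lookup: F) -> bool
where
    F: Fn(&str) -> Option<String>,
{
    vars.iter()
        .any(|name| lookup(*name).is_some_and(|value| !value.trim().is_empty()))
}

/// Production wrapper that consults the real process environment.
pub fn otel_enabled(vars: &[&str]) -> bool {
    any_var_set(vars, |name| std::env::var(name).ok())
}
```

A test can then call any_var_set with a closure that returns whatever combination of variables it wants to exercise, without touching the process environment.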

Signed-off-by: Caleb Schoepp <caleb.schoepp@fermyon.com>

lann · 2024-04-25T14:03:05Z

crates/telemetry/src/env.rs

-pub(crate) fn otel_enabled() -> bool {
-    const ENABLING_VARS: &[&str] = &[
+pub(crate) fn otel_tracing_enabled() -> bool {
+    otel_enabled(&[
        "OTEL_EXPORTER_OTLP_ENDPOINT",


Somewhat of a tangent: these are (all?) available as consts from opentelemetry-otlp, e.g. https://docs.rs/opentelemetry-otlp/0.15.0/opentelemetry_otlp/constant.OTEL_EXPORTER_OTLP_ENDPOINT.html

It can be nice to use constants even when they are strings that match the name exactly; it both prevents typos and acts as implicit documentation that this is a standard.
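A small sketch of the idea, with caveats: the constants below are local stand-ins so the example compiles on its own. In the real code they would be imported from opentelemetry-otlp as linked above. The second variable in the list is an assumption, since the diff above is truncated after the first entry.

```rust
// Local stand-ins for the constants exported by `opentelemetry_otlp`.
pub const OTEL_EXPORTER_OTLP_ENDPOINT: &str = "OTEL_EXPORTER_OTLP_ENDPOINT";
pub const OTEL_EXPORTER_OTLP_TRACES_ENDPOINT: &str = "OTEL_EXPORTER_OTLP_TRACES_ENDPOINT";

/// Variables that switch tracing on. The exact list is assumed for illustration.
pub const TRACING_ENABLING_VARS: &[&str] = &[
    OTEL_EXPORTER_OTLP_ENDPOINT,
    OTEL_EXPORTER_OTLP_TRACES_ENDPOINT,
];

/// True if any enabling variable is present in the process environment.
pub fn otel_tracing_enabled() -> bool {
    TRACING_ENABLING_VARS
        .iter()
        .any(|name| std::env::var_os(name).is_some())
}
```

A typo in a constant name fails at compile time, whereas a typo inside a string literal would silently never match.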

lann · 2024-04-25T14:22:20Z

This is another option: https://docs.rs/tracing-opentelemetry/latest/tracing_opentelemetry/struct.MetricsLayer.html

Note that this does a map lookup per metric update so there might be a caveat there for very high frequency metrics.

calebschoepp · 2024-04-25T16:52:52Z

This is another option: https://docs.rs/tracing-opentelemetry/latest/tracing_opentelemetry/struct.MetricsLayer.html

Note that this does a map lookup per metric update so there might be a caveat there for very high frequency metrics.

Oh cool, I wasn't aware of this library. This is effectively the same implementation approach I took (storing the metric instruments in a global RwLock<HashMap<>>), so it would have a similar performance profile to my work. I'll play around with this b/c it would be nice to have a library that implements all this rather than maintaining my own custom code.

Notably it doesn't support observable counters or gauges.
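On the per-update map lookup: a rough sketch of how a global RwLock<HashMap<>> registry can keep the common case cheap, assuming a read-lock fast path and letting hot call sites hold on to the returned handle. Counter is a hypothetical stand-in for an OTel instrument, and this is neither the code in this PR nor how MetricsLayer is implemented.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, OnceLock, RwLock};

/// Hypothetical stand-in for an OTel counter instrument.
#[derive(Default)]
pub struct Counter(AtomicU64);

impl Counter {
    pub fn add(&self, value: u64) {
        self.0.fetch_add(value, Ordering::Relaxed);
    }

    pub fn total(&self) -> u64 {
        self.0.load(Ordering::Relaxed)
    }
}

/// Global name -> instrument map, created lazily on first use.
fn registry() -> &'static RwLock<HashMap<String, Arc<Counter>>> {
    static REGISTRY: OnceLock<RwLock<HashMap<String, Arc<Counter>>>> = OnceLock::new();
    REGISTRY.get_or_init(|| RwLock::new(HashMap::new()))
}

/// Looks up the counter named `name`, creating it on first use.
pub fn counter(name: &str) -> Arc<Counter> {
    {
        // Fast path: shared read lock and no allocation for existing counters.
        let map = registry().read().unwrap();
        if let Some(existing) = map.get(name) {
            return Arc::clone(existing);
        }
    } // Read lock is released here, before the write lock is requested.

    // Slow path: `entry` re-checks under the write lock in case another
    // thread inserted the counter in the meantime.
    let mut map = registry().write().unwrap();
    map.entry(name.to_string()).or_default().clone()
}
```

A very high frequency call site could do let hits = counter("hits"); once and then call hits.add(1) in its loop, which skips the map entirely after the first lookup.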

@lann could you please respond to my other questions?

lann · 2024-04-25T17:15:12Z

How do we feel about my design approach here to decouple crates? Are the macro/global-map shenanigans worth it?

I think it makes sense for metrics. At least, I don't have a better answer.

Do we need to support observable instruments @lann ? I don't really get what they are TBH.

I don't either. 🤷

I'm finding that all this OTel o11y code is darn near impossible to test.

I don't have much insight here. Yep, it's hard to test. Honestly I mostly just don't, because the cost of a failure in observability doesn't generally seem very high compared to the cost of testing it. 🤷 again

itowlson · 2024-04-28T20:45:08Z

If #2475 supersedes this, should we close this @calebschoepp?

calebschoepp · 2024-04-29T15:55:21Z

If #2475 supersedes this, should we close this @calebschoepp?

calebschoepp added 2 commits April 22, 2024 16:30

Setup a docker compose file that creates an o11y stack for Spin to use

91417b9

Signed-off-by: Caleb Schoepp <caleb.schoepp@fermyon.com>

Taking a first crack at implementing metrics

823ff6d

Signed-off-by: Caleb Schoepp <caleb.schoepp@fermyon.com>

calebschoepp requested review from lann, itowlson and rylev April 24, 2024 23:23

lann reviewed Apr 25, 2024

calebschoepp mentioned this pull request Apr 25, 2024

Taking a first crack at implementing metrics #2475

Merged

calebschoepp closed this Apr 29, 2024
