Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: OTel metrics #2469

Closed
wants to merge 2 commits into from
Closed

Conversation

calebschoepp
Copy link
Collaborator

This is a WIP implementation for supporting OTel metrics in Spin. This code is far from complete, but I'm at a point where I want some feedback if I'm going down the right path or not.

Design goals

  • Leverage OTel environment variables for configuring the feature.
  • Have spin_telemetry manage creating and storing the Instruments (the things that record metrics).

I want other packages to be able to do something like spin_telemetry::u64_counter_add("my-counter", 1, my_attributes). The two alternatives to this are unfavourable IMO.

  1. One alternative would be to have every crate in Spin that wants to emit metrics take on the responsibility of creating and managing its own instruments (spin_telemetry would only handle initializing the metrics exporter). This would suck b/c we're adding more state to more crates and then they also all need to import the OTel crates.
  2. Another alternative would be to create and store the instruments in spin_telemetry albeit in a more traditional way. Statically define all the metrics we want and store them in a struct. This is just like we do it in LHC. The problem with this is we couple spin_telemetry to every crate. Now any time you want to add a metric somewhere you have to both add it where you want to emit it and you have to modify spin_telemetry. This is obviously untenable for plugins that want to emit metrics.

Crimes against humanity

I'm doing a lot of potential evil here. Lazy statics with global mutexed maps and declarative macros to generate all the implementations of a function.

I'd like to be scolded or encouraged.

Design questions

  1. How do we feel about my design approach here to decouple crates. Is the macro/global-map shenanigans worth it?
  2. Do we need to support observable instruments @lann ? I don't really get what they are TBH.
  3. I'm finding that all this OTel o11y code is darn near impossible to test. I'd love to be able to write tests that test all possible combinations of OTEL_* env vars, but I don't think we can do that easily b/c the underlying libraries we depend on are what is reading the env. I'm all ears for ways that I could refactor this whole o11y implementation to be more testable.

Signed-off-by: Caleb Schoepp <caleb.schoepp@fermyon.com>
Signed-off-by: Caleb Schoepp <caleb.schoepp@fermyon.com>
@calebschoepp calebschoepp requested review from lann, itowlson and rylev April 24, 2024 23:23
pub(crate) fn otel_enabled() -> bool {
const ENABLING_VARS: &[&str] = &[
pub(crate) fn otel_tracing_enabled() -> bool {
otel_enabled(&[
"OTEL_EXPORTER_OTLP_ENDPOINT",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhat of a tangent: these are (all?) available as consts from opentelemetry-otlp, e.g. https://docs.rs/opentelemetry-otlp/0.15.0/opentelemetry_otlp/constant.OTEL_EXPORTER_OTLP_ENDPOINT.html

It can be nice to use constants even when they are strings that match the name exactly; it both prevents typos and acts as implicit documentation that this is a standard.

@lann
Copy link
Collaborator

lann commented Apr 25, 2024

This is another option: https://docs.rs/tracing-opentelemetry/latest/tracing_opentelemetry/struct.MetricsLayer.html

Note that this does a map lookup per metric update so there might be a caveat there for very high frequency metrics.

@calebschoepp
Copy link
Collaborator Author

This is another option: https://docs.rs/tracing-opentelemetry/latest/tracing_opentelemetry/struct.MetricsLayer.html

Note that this does a map lookup per metric update so there might be a caveat there for very high frequency metrics.

Oh cool, I wasn't aware of this library. This is effectively the exact same implementation approach I took — storing the metric instruments in a global RwLock<HashMap<>> — so it would have a similar performance profile as my work. I'll play around with this b/c it would nice to just have a library that implements all this rather than having to maintain my custom code to do all this.

Notably it doesn't support observable counters or gauges.

@lann could you please respond to my other questions?

@lann
Copy link
Collaborator

lann commented Apr 25, 2024

How do we feel about my design approach here to decouple crates. Is the macro/global-map shenanigans worth it?

I think it makes sense for metrics. At least, I don't have a better answer.

Do we need to support observable instruments @lann ? I don't really get what they are TBH.

I don't either. 🤷

I'm finding that all this OTel o11y code is darn near impossible to test.

I don't have much insight here. Yep, its hard to test. Honestly I mostly just don't, because the cost of a failure in observability doesn't generally seem very high compared to the cost of testing it. 🤷 again

@itowlson
Copy link
Collaborator

If #2475 supersedes this, should we close this @calebschoepp?

@calebschoepp
Copy link
Collaborator Author

If #2475 supersedes this, should we close this @calebschoepp?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants