Improving observability of Spin #2293

Closed
calebschoepp opened this issue Feb 23, 2024 · 10 comments

Comments

@calebschoepp
Collaborator

Observability is critical for a great developer experience. We should work to improve the observability of Spin, but that is a very vague statement. What exactly are we improving the observability of? Spin itself? Spin apps?

This issue is meant to act as a meta-issue that clarifies what we mean by "improving observability of Spin". It will provide a lay of the land by describing the different levels of observability within Spin that we want to improve. Other issues, SIPs, and PRs will be used to track the actual work of improving the observability and they can backlink to this meta-issue.

Before we dive in I want to note that OpenTelemetry has become the industry standard for observability data and is the standard we would want to conform to.

Types of observability in Spin

I propose that there are four types of observability in Spin that we want to enable. They exist on a spectrum from host-focused to guest-focused.

1) Runtime observability — observing the Spin runtime itself

Developers operating Spin in a production environment want observability into the state of the Spin process itself. This would include among other things:

  • Emit spans for any critical background work e.g. garbage collection.
  • Emit metrics on things like connection pool sizes.
  • Potentially emit Spin logs in an OTEL format?

Some notable non-requirements include:

  • Emitting metrics around CPU/memory usage. These should be collected via an agent on the node.
  • Emitting metrics around aggregate stats of things like request or error count. These should be created by aggregating the request observability data downstream.
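To illustrate the second non-requirement, here is a minimal sketch of how request and error counts could be derived downstream from per-request spans instead of being emitted by Spin. The `SpanRecord` shape and its field names are hypothetical and only stand in for whatever a tracing backend actually stores.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanRecord:
    """Hypothetical, simplified span record as a backend might store it."""
    app: str        # name of the Spin application that was triggered
    trigger: str    # trigger type, e.g. "http"
    is_error: bool  # whether the span ended with an error status


def aggregate(spans):
    """Derive per-app request and error counts from individual spans."""
    requests = Counter()
    errors = Counter()
    for span in spans:
        requests[span.app] += 1
        if span.is_error:
            errors[span.app] += 1
    # Counter returns 0 for apps that never recorded an error.
    return {app: {"requests": n, "errors": errors[app]} for app, n in requests.items()}


spans = [
    SpanRecord("hello", "http", False),
    SpanRecord("hello", "http", True),
    SpanRecord("cart", "http", False),
]
summary = aggregate(spans)
```

Because the counts are computed from the trace data, Spin only has to emit the spans themselves.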

2) Trigger observability — observing the requests made to Spin applications

Developers want observability into the requests that are made to their Spin application. This would include among other things:

  • Emit span when a Spin application is triggered with metadata about the trigger event.
  • Support trace context propagation: accept trace context from incoming request headers and pass it along on outbound calls. Provide configuration in spin.toml to enable or disable this for security reasons.
  • Emit metrics about Spin application trigger events.
  • Potentially emit Spin application logs in an OTEL format?
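As a rough illustration of the trace context propagation bullet, the sketch below parses a W3C Trace Context `traceparent` header (version `00`) and builds the header for an outbound call. The function names and the `propagate` flag are hypothetical; the flag only stands in for whatever setting spin.toml might expose. Starting a new trace with the sampled flag `01` is also an assumption, and this is not how Spin implements propagation.

```python
import re
import secrets

# W3C Trace Context, version 00: version-traceid-parentid-flags in lowercase hex.
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def parse_traceparent(value):
    """Return (trace_id, parent_id, flags), or None if the header is invalid."""
    match = _TRACEPARENT.fullmatch(value.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero trace and parent ids are invalid per the spec.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return trace_id, parent_id, flags


def outbound_traceparent(incoming, propagate=True):
    """Build the traceparent header for an outbound call.

    With propagation enabled and a valid incoming header, the trace id and
    flags are kept and only the span id changes. Otherwise a new trace starts.
    """
    parsed = parse_traceparent(incoming) if (propagate and incoming) else None
    span_id = secrets.token_hex(8)  # 16 hex characters
    if parsed is None:
        trace_id = secrets.token_hex(16)  # 32 hex characters
        flags = "01"  # assumption: new traces are marked as sampled
    else:
        trace_id, _, flags = parsed
    return f"00-{trace_id}-{span_id}-{flags}"


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
outgoing = outbound_traceparent(incoming)
```

With `propagate=False` the incoming header is ignored and the outbound call starts a fresh trace, which is the behavior an operator might want when incoming headers cannot be trusted.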

3) Component observability — observing the interaction between composed components

Developers will create their Spin applications from a composition of components. Ideally we can automatically emit spans as the component composition graph is traversed and components are executed. This would include among other things:

  • Emit span when each Wasm component is executed.
  • Support trace contexts between each component.

This would require upstream modifications in Wasmtime.
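A toy model of what this could look like: the sketch below walks a composition graph depth-first and records one span per component execution, with each span pointing at the span of the component that called it. The graph, the component names, and the `Span` type are all made up for illustration, and the sketch assumes the graph is acyclic. Nothing here reflects an actual Wasmtime or Spin API.

```python
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: int
    parent_id: Optional[int]  # None for the root span
    name: str


def trace_composition(graph, root):
    """Walk a component graph depth-first, emitting one span per execution."""
    ids = itertools.count(1)
    spans = []

    def execute(component, parent_id):
        span = Span(next(ids), parent_id, f"execute {component}")
        spans.append(span)
        # Each dependency is "called" with the current span as its parent.
        for dependency in graph.get(component, []):
            execute(dependency, span.span_id)

    execute(root, None)
    return spans


# Hypothetical composition: "api" imports "auth" and "store"; "store" imports "cache".
graph = {"api": ["auth", "store"], "store": ["cache"]}
spans = trace_composition(graph, "api")
```

The parent ids are what let a tracing backend render the component calls as a nested tree under the trigger span.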

4) Guest observability — observing the code within the guest module

Developers want to be able to instrument their own guest code. This allows them to emit telemetry with spans, metadata, and metrics unique to their own use case. We are reliant on the upstream WASI Observe proposal to make this happen. The upstream proposal has the clearest definition of requirements, but briefly for Spin to act as a host implementation we would require:

  • A host component that satisfies the WASI Observe WIT interface.
  • Potentially modifications to the Spin SDKs to support emitting telemetry (metrics, spans, etc.)?

Other observability related things

Here are some other observability related things we might want to do to make the experience better in Spin.

Streamline the process of collecting and viewing the observability data

The four types of observability outlined in the above section all just emit telemetry and expect that there is a collector running somewhere to collect the data. It would be good to clearly document the process of running a collector for any users who don't already use a specific collector in their environment.

If we really wanted to streamline the experience, we could take this one step further and build this collector into Spin (or a plugin or an app like KV explorer).

Create an observability standard that other Spin runtimes can match

Spin is not the only Spin runtime. Observability should be implemented in Spin in such a way that other Spin runtimes can follow suit.

Prior art

@calebschoepp
Collaborator Author

Here is an example of what a trace might look like when levels 2 through 4 are combined.

[Image: Untitled (1)]

@calebschoepp
Collaborator Author

Trigger observability seems like the most tractable and immediate problem, so I'm going to get started on a SIP for how we could implement it.

@macolso

macolso commented Feb 27, 2024

Question for my own understanding: is CPU / memory utilization considered a runtime or guest metric? For example, Azure Application Insights emits a metric called Process CPU, which shows how much of the total processor capacity is consumed by the process that is hosting your monitored app. I would consider Application Insights a tool for guest observability so this seems like a grey area.

@calebschoepp
Collaborator Author

> Question for my own understanding: is CPU / memory utilization considered a runtime or guest metric? For example, Azure Application Insights emits a metric called Process CPU, which shows how much of the total processor capacity is consumed by the process that is hosting your monitored app. I would consider Application Insights a tool for guest observability so this seems like a grey area.

I suppose it could be considered both. We might want to emit CPU/Memory utilization from the trigger observability i.e. how much CPU/Memory did an invocation of an app use. This would be considered guest metrics. Someone could also use an agent on the node to collect the CPU/Memory utilization of Spin itself and this would be a runtime metric.

I'm not really sure if this answers your question though because your question seems specific to the semantics of App Insights which I'm not really familiar with.

@calebschoepp
Collaborator Author

calebschoepp commented Feb 27, 2024

@rylev had a good suggestion that we should make sure to clearly document our patterns around spans e.g. how do we name them, what metadata do they have, when should we emit them. That way the traces that get created can be more consistent and useful.

https://github.com/open-telemetry/opentelemetry-specification/blob/v1.26.0/specification/trace/api.md#span

@vdice vdice moved this to 🆕 Triage Needed in Spin Triage Mar 13, 2024
@vdice vdice moved this from 🆕 Triage Needed to 🏗 In progress in Spin Triage Mar 13, 2024
@calebschoepp
Collaborator Author

Seeing as this is a meta-issue tracking a lot of work I'm wondering if it shouldn't be set in progress. @vdice what do you think?

@vdice vdice moved this from 🏗 In progress to 🔖 Backlog in Spin Triage Mar 13, 2024
@vdice
Contributor

vdice commented Mar 13, 2024

@calebschoepp 👍 Sounds good. Thanks!

@agardnerIT

agardnerIT commented Mar 29, 2024

As someone with Observability experience and a CNCF ambassador, please LMK if I can assist here. I am happy to act in a vendor-neutral consultant role.

@lann
Collaborator

lann commented Mar 29, 2024

@agardnerIT Thanks! The most recent work in progress is at #2398 if you are interested in following.

@calebschoepp
Collaborator Author

This work is sufficiently far along that I'm closing this initial ticket.

@github-project-automation github-project-automation bot moved this from 🔖 Backlog to ✅ Done in Spin Triage Jun 14, 2024