-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve support for observability frameworks #665
Comments
I'm broadly in favor of this. It's unfortunate that interceptors can't solve all these use cases, but it's difficult to make them reasonable for low-level, byte-oriented use cases and high-level, struct-oriented use cases. We've spent about a year on otelconnect, and I'm still not thrilled with it. It's missing some important data and it's unreasonably complex - the runtime should do more heavy lifting to make integration with a logging/tracing/metrics backend easier. Introducing a parallel concept specifically for observability (like gRPC's |
Regarding one I think one big benefit of having something like: type Observer struct {
ObserveEventResponseMessage(context.Context, connect.Event)
...
} The observer could opt in/out of specific events they'd like to capture, and depending on implementation, if computing one of the Events is relatively expensive, it can be avoided entirely if there's no callback for a specific event. Whereas with one generic Observer callback, the consumer is responsible for filtering out events it doesn't want or ignoring, but the collection already happened. So fundamental difference to me is the single Observer is a bigger opt-in/opt-out for all observability, whereas a struct with methods gives opt-in/opt-out on a more granular per-event level. Inside connect, you'd have the ability to avoid extraneous time.Now() calls for measuring timing, or allocating an Event struct, etc, if the observer doesn't want that event. |
Meeting notesMet to discuss the above proposal today. SummaryAll in agreement on the need to streamline integration of observability in connect-go. The main discussion point was the addition to the API surface. Two solutions were discussed to mitigate these concerns:
Next Steps
Will update this issue as it progresses. |
Started the proof of concept for the third party package approach (WIP branch here). Changes neededSignature for the observability is updated to handle connect client interceptors. They don't have access to the request header or peer object on initialization. To include the span in the client context we must create the observer before the connection is created. Therefore, a new event type Observability func(context.Context, connect.Spec) (context.Context, Observer)
// EventRequest is emitted when an RPC request is started. The header
// may be modified to propagate information required by other Observers.
type EventRequest struct {
Header http.Header
Peer connect.Peer
} Observer is updated to use @mattrobenolt suggestion to be a set of functions: type Observer struct {
Request func(EventRequest)
Response func(EventResponse)
End func(EventEnd)
} Events on message send and receive are dropped. Difficult to capture message size/encoding/compression without consuming the bytes in a protocol aware way. So instead the count of messages sent and received along with the total size in bytes is recorded on the end message: type EventEnd struct {
Err error // nil if the RPC completed successfully
Trailer http.Header // Trailer metadata
Duration time.Duration
SentTotalBytes int64
ReceivedTotalBytes int64
SentMessageCount int64
ReceivedMessageCount int64
} Start time is difficult to measure correctly for the client (capture the marshalling time). For unary calls the interceptor is called after marshalling and before writing to the round tripper. For streaming we could record the time on the |
Is your feature request related to a problem? Please describe.
Observability plays a critical role in ensuring the reliability, performance, and scalability of services. Presently, while connect-go supports observability through interceptors and HTTP middleware, both solutions have limitations. Interceptors lack specific details like on-the-wire message size and exhibit inconsistent reporting of start times between unary and stream interceptors. On the other hand, while HTTP middleware can offer more precise details, it necessitates redundant protocol parsing to generate comprehensive metrics (such as deducing protocols, parsing envelopes, and handling error messages). Additionally, it fails to capture client-side marshalling time.
Describe the solution you'd like
The objective is to seamlessly integrate observability frameworks and APIs with connect-go. This integration should be framework-agnostic and facilitate the implementation of monitoring, tracing, and logging. It must maintain low overhead and accurately record statistics.
To achieve this, a proposed solution involves introducing an event emitter option called WithObservability to both client and handler options. This option will accept an Observability function, invoked for each new RPC to create a new Observer. These Observers will receive ObserverEvents emitted as the RPC progresses. They may modify the context and headers to propagate essential information to other observers, crucial for tracing frameworks.
Outlined below are the proposed new APIs for Observability:
Events will be emitted to the Observer for important flow conditions:
Retry
(optional): emitted on a client request retries, only to client observer.RequestMessage
: client sends a message and server receives.ResponseMessage
: server sends send a message and client receives.ResponseHeader
: server sends headers, client receives response headers.End
: before return, includes wire error and trailers.Events will be emitted as shown:
A full set of events needs definition. Events contain read only data emitted directly after the event has occurred. A subset:
Describe alternatives you've considered
Additional context
See https://github.com/connectrpc/otelconnect-go for the current interceptor solution.
Example
As an example below is a rough sketch showing the potential implementation of gRPC metrics A66 proposal with a stateful observe method using the go otel library:
The text was updated successfully, but these errors were encountered: