-
Notifications
You must be signed in to change notification settings - Fork 114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Thousand Island instrumentation #136
Comments
Note that thousand island's telemetry model is being overhauled to better accommodate exactly this scenario. See provisional work at mtrudel/thousand_island#27 (which itself is largely blocked by knowing what the golden path is for exactly this integration). |
I'm not sure how instrumentation of the socket level can actually be correlated with the application level. Any span created in ThousandIsland will be ignored when the HTTP request's propagated context is extracted. We'll want to create metrics from these events, of course, but I don't think they can be used for spans. |
My really early take on this at the HTTP layer is up at https://github.com/mtrudel/bandit/tree/telemetry; generally I was planning to
Truthfully, I'm not at all confident I'm not overbuilding this. The time lookups alone are pretty expensive to be doing so many of them. Happy to take guidance / advice from folks more versed in telemetry than I. |
Yea, that is an issue since Bandit, or Plug/Phoenix, needs to create a span from a parent that is received in request headers. It'll have its own trace id, so can't have a parent/child relationship with anything from ThousandIsland. The only option I know of for such a relationship would be that Bandit could add the ThousandIsland span as a linked span if it is what starts the span from the propagated headers. |
Interesting! Alright, so if I understand correctly:
If that's the case, then I think the only open questions that remain for me are those at mtrudel/thousand_island#27 (comment). Pending clarity on them, I should be able to land sensible telemetry support in Bandit / TI in short order, which would unblock this. In light of that, do you have any advice for those questions, specifically how I can make your life here as easy as possible? |
Hey, Mat. If you have a look at how cowboy/phoenix telemetry/otel integration works, we need something that kicks off and initiates at the socket level and then end at that same level so we get accurate total times. For cowboy + phoenix (unreleased) we're just continuing to add info the a span started in otel cowboy. The same would be true with bandit/thousand island. I only briefly looked at what was available telemetry wise several weeks ago and only saw thousand island events which is why I started this issue. If it belongs in bandit, totally cool. The key thing we'd all want here is to start and stop a span as closely to or in the socket request handler as possible. For cowboy, there's a connection handler pool where the span gets started and then we're reaching into that process to get pull that trace context into the phoenix process and just extend it. Basically need the equivalent of otel cowboy for thousand island/bandit |
Important to note it is started somewhere that has the headers, https://github.com/open-telemetry/opentelemetry-erlang-contrib/blob/main/instrumentation/opentelemetry_cowboy/src/opentelemetry_cowboy.erl#L32 which is why i assume it needs to be in Bandit. |
I‘m currently in the process of moving from cowboy to bandit for my phoenix setup. Is there anything that’s blocking this? |
Not on my side. Bandit's telemetry payloads have been solid since 1.0. There are gonna be a few minor changes to metadata soon but nothing substantial |
There are some developments here but I have not been able to get a response. I'd like to get things moved forward and build on the work they did since they've used the namespace now rather than having to start from scratch to have a more complete lib. |
@mtrudel when looking through the bandit code and the instrumenter here that was written, there's a critical change that we need to get into bandit. Currently, the start event on a request happens prior to the request being read. The full request is not available until the end which makes it too late to extract trace propagation headers and make sampling decisions which must happen at the start of the trace, leaving incomplete context for the rest of the request. There are a couple of things that we need to change to get parity with the cowboy instrumentation.
|
That should be doable! Any other metadata you want in the start event? |
I think that should be all that's needed. I feel like that object looked pretty thorough when I perused it. As a reference point, cowboy shows the bare minimum we'd like to see. https://github.com/open-telemetry/opentelemetry-erlang-contrib/blob/main/instrumentation/opentelemetry_cowboy/src/opentelemetry_cowboy.erl#L31-L50 |
This is actually proving to be a bit more of a plumbing change than I was expecting. I'm planning on landing the refactors in mtrudel/bandit#297 et al during my company hack week later on this month; I'll make sure these changes get landed as part of that work |
Thanks for the update. I had seen you started that PR and the refactor looked like it was going to make it more difficult than just sliding down to here |
The main problem is that the By the end of that work there's going to be a single consolidated pipeline for HTTP request/responses, and HTTP1/2 are going to be thin behaviour implementations that only know about the lowest levels of transport. All the shared HTTP semantics (including telemetry!) will be shared between them, so there will only be one place to change this sort of thing. Expect an update early in the week of Mar 25 |
Bandit Let me know if anything else is needed! |
With Bandit now having full support for being a drop-in cowboy replacement for Phoenix, we'll need an instrumentation library for Thousand Island from its telemetry.
The text was updated successfully, but these errors were encountered: