-
Notifications
You must be signed in to change notification settings - Fork 180
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bunny instrumentation is not propagating contexts #526
Comments
Another thing, because it is probably not clear in the original message, I think the context propagation bug is on the consumer side of the bunny instrumentation. |
Just fyi, I edited the original post to add an example repository to reproduce the problems described in the issue: |
And since GitHub duscussions and issues do not share references automatically, at least 2 other people mentioned the similar prolem here: open-telemetry/opentelemetry-ruby#1098 (I mentioned this thread in the original message. It's just so easy to miss that among other things) |
^ I have been thinking about this assertion in the discussion ^. I could not find why this decision was there to follow this specific semantics and not others in the first PR that contributed the gem. So, from the people who understand the OTEL specs and implementations way better than I, could anyone please share scenarios where not adhering to this particular batch-receiving semantincs would cause me a problem for this particular example I shared here: https://github.com/mugli/otel-ctx-prop-amqp-pubsub. This is a very simple demo, most parts are even directly taken from the basic RabbitMQ tutorials. If it's not a problem to propagate everything (including the same trace-id), can I create a PR attempting to fix this for the bunny instrumentation? At the same time, it'll allow I think the It'd be really great if someone can verify that it's alright to do it the way |
Just my 2 cents, as recently I've been surprised and struggling with this as well. AFAIK I would expect the spans created by the reception and processing of that particular message to have the context propagated from the publisher. Prior to migrating to OTEL, in my company we were actually propagating our own "trace id" via headers, as there's a clear 1:1 relationship between the publication and reception of a single message with It turns out, to add to the discussion, that we're implementing buffering on top of |
One more thing to consider: the Sidekiq instrumentation offers the possibility to propagate contexts or link traces via the propagation_style config option. Not being a spec expert myself by any means, I interpreted that as "ok, maybe in some systems propagating contexts result in huge traces, so people can choose to "cut the trace" and just use links"... And I thought in the case of Now, from the discussion, it seems the decision was actually based on semantics. I reiterate that I don't think we're dealing with batch receiving in Bunny. And now I wonder why the choice is offered for Sidekiq, since vanilla Sidekiq doesn't even offer batching, but that's another issue (if it's an issue at all). In any case, leaving semantics aside, could we leave it to the user to choose the propagation style for Bunny, as it's already implemented for Sidekiq? |
I'll get oriented on this propagating bunnies issue and see what I can do. |
An OTEP pertinent to this problem ought to merge into the spec in a few days. (I need to catch up on reading that.) |
Thanks @robbkidd So when the OTEP gets merged, this is going to be the final convention for the scenario we're discussing:
The current Bunny instrumentation will meet the SHOULD part without any modification, as it's already using links. Hopefully we can consider the MAY part, making the "Deliver" span a child of the corresponding "Publish"/"Create" span...? (As shown in the OTEP examples) I can see it's not an obvious decision, though. The trade-off is expressed in the OTEP itself:
And:
So maybe this can be left as a configuration option? Off by default and opt-in if you want the relationship added, or on by default and opt-out if the traces in your particular case end up being too big. Glad to write some code (possibly taking the Sidekiq instrumentation as model), do some tests and open a MR if you guys don't have the bandwidth. |
…emetry#526) open-telemetry/oteps#220 recommends: > For each message it accounts for, the "Deliver" or "Receive" span SHOULD link to the message's creation context. In addition, if it is possible the creation context MAY be set as a parent of the "Deliver" or "Receive" span. But also acknolwedges: > open-telemetry/opentelemetry-specification#454 To instrument pull-based "Receive" operations as described in this document, it is necessary to add links to spans after those spans were created. The reason for this is, that not all messages are present at the start of a "Receive" operations, so links to related contexts cannot be added at the start of the span. Our "Receive" spans do not link to the message's creation context, and indeed it doesn't seem possible to do so. Likewise for setting the parent. This tries to do it anyway, and results in: - "Receive" span links remain unset; - "Receive" span is a root span for a new trace; - "Process" span trace_id is correctly set to the "Send" context; - "Process" span's parent is incorrectly set to "Send" instead of "Receive".
i looked at this a bit today, attempting to get an implementation that supports the SHOULD and MAY mentioned in #526 (comment) i've written up the issues i ran into in a commit message along with my best-effort implementation: open-telemetry:opentelemetry-ruby-contrib:main...no-reply:opentelemetry-ruby-contrib:bunny-prop importantly, it's not true that the current implementation meets the SHOULD. instead, it adds the recommended links to the "Process" span. as described in the OTEP, it's probably not possible to add them to the "Receive" span without specification changes; the needed headers are fetched during that span. i don't see a clear way forward at this point. we could always add a configuration flag, following the if we can decide on a direction, i'm motivated to get an implementation. as it stands, i'm feeling pretty stuck. Edit: it was easy enough to work up a version of this that does the thing i'm convinced isn't a good idea, so here: main...no-reply:bunny-prop-2 lastly, the |
…emetry#526) open-telemetry/oteps#220 recommends: > For each message it accounts for, the "Deliver" or "Receive" span SHOULD link to the message's creation context. In addition, if it is possible the creation context MAY be set as a parent of the "Deliver" or "Receive" span. But also acknolwedges: > open-telemetry/opentelemetry-specification#454 To instrument pull-based "Receive" operations as described in this document, it is necessary to add links to spans after those spans were created. The reason for this is, that not all messages are present at the start of a "Receive" operations, so links to related contexts cannot be added at the start of the span. Our "Receive" spans do not link to the message's creation context, and indeed it doesn't seem possible to do so. Likewise for setting the parent. In lieu of a specification layer solution, this proposes adding a configuration flag so that users can opt-in to setting the "Send" context as parent for the "Process" span. This comes at the cost of a continuous trace including the "Send"-"Receive"-"Process" spans, with "Receive" orphaned from both "Send" and "Process". This doesn't attempt to reproduce the "none" `propagation_style` included in `sidekiq` instrumentation and other similar gems. If this is an interesting direction, we could consider implementing it.
Actually, AMQP/ Coincidentally, I was going to open a new issue to discuss the What do you guys think it's the best place to discuss the whole situation? This issue, a new one, some other place/mechanism better suited for discussing "big changes"...? Edit: @no-reply If we manage to make a |
i think it depends on details about the new implementation, but my guess is it won't be any easier. the structure of the problem is that we need to open the "Deliver"/"Receive" span before delivery/reception actually takes place, but won't have the data necessary to link and/or propagate context until that process is substantially completed. the larger reorganization you're suggesting makes sense to me (having only recently started looking at this library seriously), but i think it probably doesn't change the ground for this discussion either way. |
I will leave this decision to you all. Please let the @open-telemetry/ruby-contrib-approvers know how you intend on collaborating and we will support you all with feedback on design and in code reviews. Also, if you all are able to join the community SIG meeting to discuss any details via zoom you are most welcome! Please check the OTel Community Calendar for meeting times. Thank you very much to @no-reply @bolandoboy @mugli for your willingness to collaborate and help us improve this instrumentation! |
Let me play this weekend a bit (yes, I have strange hobbies :-D), because I'm almost sure that at the point where messages are delivered to application code, we can access the message headers where the context propagated from the producer comes (IIRC). Hopefully I can come back with an initial implementation, and then we can follow the steps to move it through the channels @arielvalentin mentioned above. (And if not, I'll leave it to more knowledgeable contributors.) |
@bolandoboy "But if it be a sin to covet |
sounds good @bolandoboy.
thanks @arielvalentin! i've added this to my calendar and will try to make it next week, if only to say hi. |
Finally managed to move some code around and get preliminary results. (Slight tangent: I couldn't get the example in the repo to export traces to the collector + Jaeger. Started getting I changed the example to use Removing the
The other option, which currently I toggle very dirtily commenting/uncommenting a particular line, but could be easily configurable, is to make the The single
Code is dirty and sitting in my machine. I haven't even ran the tests and I must have broken most of them. If somebody is interested I can push the changes, but I think the point was seeing that this can be done, and then decide if we want to go down this route. I'll try to attend the next SIG meeting. Thanks @arielvalentin!!! |
@no-reply In this branch there's some code that tentatively meets the future semantic conventions from OTEP 220: https://github.com/bolandoboy/opentelemetry-ruby-contrib/tree/otep-0220-messaging-semantic-conventions-span-structure Changes with respect to the current version:
This is a draft. Things left for even considering releasing this (apart from the OTEP being merge in the spec):
|
👋 This issue has been marked as stale because it has been open with no activity. You can: comment on the issue or remove the stale label to hold stale off for a while, add the |
added the keep label here. Seems like open question still, unless this should've been closed per some context from sig mtgs or is better tracked via another issue. |
Did this manage to get through allowing for a consumer trace to be part of the parent deliver context? Reached this point after reading the equivalent option of sidekiq, which seems pretty handy if you want to have all spans in a single view/graph |
Description of the bug
The doc mentions (link):
And,
This means we expect the
baggage
header (among others) to propagate from RabbitMQ publishers to consumers when we use the bunny instrumentation.However, that is not happening right now. Instead of propagating any contexts (including baggage), the bunny instrumentation drops all contexts in the consumer and starts a new trace.
I got to know from this discussion that bunny instrumentation is trying to follow batch messages receive semantics, that's why it creates new traces and then links them. The initial implementation was also heavily inspired by the ruby_kafa instrumentation.
I don't have an opinion if that's the right way to do this or not. I just checked the AMQP instrumentation in Node.js and found that it doesn't do it like this, it starts a new span from the parent context and propagates context properly.
However, regardless of which message semantic it chooses to follow, I think that not propagating contexts is a bug in the bunny instrumentation.
Additional details:
I believe this is the place where the context is getting lost: https://github.com/open-telemetry/opentelemetry-ruby-contrib/blob/1990634acece4b32f56663a720e18b27a14a2b25/instrumentation/bunny/lib/opentelemetry/instrumentation/bunny/patch_helpers.rb#L35C18-L45
Because here we are creating
tracer_receive_headers
from scratch instead of taking the values from relevant message headers:opentelemetry-ruby-contrib/instrumentation/bunny/lib/opentelemetry/instrumentation/bunny/patch_helpers.rb
Lines 47 to 64 in 1990634
Share a simplified reproduction if possible
I've created an example repository to reproduce the problems described in the issue:
https://github.com/mugli/otel-ctx-prop-amqp-pubsub
The text was updated successfully, but these errors were encountered: