-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rationale for logs/annotations instead of spans? #8126
Comments
I've only used the Zipkin output through the OpenTelemetry Collector, sending it to Honeycomb, we get several spans representing Kong's activities, united under a single trace. I'm not sure if those log entries fall out due to not having any representation in OTel. @ecourreges-orange, can you share another image with those spans collapsed, so that we can see the overall shape of the trace? |
I'm interested in understanding this rationale as well. |
|
Spans used to be the way the different phases were stored, but they were changed to annotations as part of a wider sweep of changes in the plugin, requested with the Zipkin devs themselves in Kong/kong-plugin-zipkin#49, with more details in openzipkin/zipkin#2834 The exact place where they recommend the change is: openzipkin/zipkin#2834 (comment) I assume this was a performance concern, but for me it didn't make any difference so I just followed their advice.
I must point out that that solution might be appropriate for you but not for others.
This is how it looks in Zipkin: |
I guess it's been a while since I tried a more recent version of the plugin, because my view in Honeycomb looks like your first example, @vito-laurenza-zocdoc. That the span events in the second example don't fall within the parent span's duration bound is surprising, suggesting that these events don't make sense as children. |
Reading openzipkin/zipkin/pull/2834 today, I think we got this one wrong. They were saying that since of the span durations were so short, we should change them to annotations, because there's a minimum span duration. That's a rather dogmatic prescription, misinterpreting the data that showed these activities completing (in that case) too quickly to measure accurately. If that requirement for the minimum duration is true, I'd prefer to clamp the minimum durations (report anything shorter than the minimum as the minimum, perhaps with an annotation or something indicating that the duration was clamped) and retain these as spans. They make more sense that way, being activities with a necessary sequence and duration. They are not mere events that occur instantaneously. |
Even in the first screenshot, you'll see that kong is reporting an access phase latency of ~1ms, but the span for availability-service doesn't start until ~25m relative to the start of the trace. This is very puzzling to me and I'm still not sure if we're actually adding 20-something ms of latency in the access phase, or if there's something else going on. |
For what it's worth, I agree that sending the phase timings as spans makes more sense. |
@kikito, are there any plans underway to fix this, returning us to the multiple spans that we had before? |
@seh @kikito I re-read openzipkin/zipkin/pull/2834 several times and it seems to me that the advice around moving these spans with missing durations to annotations was misguided. The missing durations seem likely to have been caused by the prior plugin versions getting the timestamp precision wrong (milliseconds vs microseconds). With microsecond precision, I doubt we'd even see a span with a duration less than 1 microsecond. |
The suggestions made here make sense. This change would make the plugin non-backwards compatible (nothing would "break", but the stats provided would be very different). So I would need to release a major version. I would like to get feedback from @adriancole @jcchavezs before acting on this. |
has anyone asked jaeger to do relative timings to the span? I would be surprised if they would think it is better to display as it is, though possibly there's a perspective I'm missing. cc'ing someone familiar w/jaeger @pavolloffay |
one thing I've noticed in tech is that people will move mountains to avoid fixing javascript in a UI :P |
on the specific parts here, there seem to have been changes after this one which makes the data more esoteric Kong/kong-plugin-zipkin#52 ex. the three character annotations I'm not sure anyone would understand what they are, and it would be unexpected for a UI to have a schema to know what they are. I would suggest to sortof split up issues. The first requests to change the data were around semantics and attempts to correct those primarily. There are different ways to model but I would agree that regardless of that UIs probably do need to fix how they present "relative time", 3 character annotations are unusable as would be spans with 3 character names. First step might be seeing about undoing wherever that 3 char part happened. Regardless of whether 3-10 spans are re-added per request, I wouldn't roll back completely the other parts as there were definitely semantic problems, ex endpoints being incorrect and such. Focusing on the model we want is probably better than reverting as hours were spent on problems that I'm not sure anyone would want to spend the time to do-over again. |
one other thing I would suggest is sometimes translators are used.. I think this problem will be easiest to discuss with actual zipkin json apples/apples as maybe applied to the other 2 UIs. Ex if some proxy is used, the parts besides the 3 letter thing might become obvious. It is weird that some annotations are predating the corresponding spans. JSON is how we usually get to bottom of things very quickly. |
I am curious how other UIs impact a zipkin plugin. I do agree the three letters annotation is very hard to digest. The change was introduced in https://github.com/Kong/kong-plugin-zipkin/pull/60/files, not sure what was the motivation (probably following zipkin convention) so @kikito can explain more. I would bring back the previous names as they are more easy to understand, Regarding using different spans or annotations, the operations performed by kong are sequencial and one depend on the other. Are spans a better modeling for it? I don't think so, IMHO it adds more noise and honestly it is not too interesting see them independently but more like a collections of milestones (but that is personal opinion). Do we need to change the model because rendering in vendor XYZ is bad? I don't think so |
I think spans are a better representation for these efforts. Thinking of how these wind up in Honeycomb after passing through the OpenTelemetry Collector, they are more easily queried, visualized, and understood when modeled as spans. |
As I pointed out above, do we really need to change this plugin because of
how honeycomb renders it?
ons. 5. aug. 2020, 14:43 skrev Steven E. Harris <notifications@github.com>:
… Are spans a better modeling for it? I don't think so, IMHO it adds more
noise and honestly it is not too interesting see them independently but
more like a collections of milestones (but that is personal opinion).
I think spans are a better representation for these efforts. Thinking of
how these wind up in Honeycomb after passing through the OpenTelemetry
Collector, they are more easily queried, visualized, and understood when
modeled as spans.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<https://github.com/Kong/kong-plugin-zipkin/issues/89#issuecomment-669170859>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAXOYARQZL27AA6BMLMR3MLR7FHWPANCNFSM4OFWWBLA>
.
|
We don't need to do anything, but I think the plugin worked better before, so why not recover that better behavior that users had been enjoying? I confess that I don't understand the position that represents these Kong efforts as annotations. These efforts are steps that have durations, and we care how long each step takes, and how many of the steps we attempt and accomplish. If we claim that such efforts should be annotations, then what would be left deserving a span? |
Apart from the "We don't need to care about other GUIs", explain us how you can investigate/search timing issues in Kong plugins with this annotation model? |
I don't want to derail this thread too much so let me address the "three-letter-label" question first.
I confirm that the intention renaming annotations to
The impression I got from all that was that the annotation bytesize should be kept constrained. Since I was going to need several annotations per trace, I went with 3-letters. I see now that this was a mistake. So in a future version of the plugin we will have longer names. Thanks for clarifying this. |
Going back to the main theme of this issue, i.e. "using spans vs using annotations for kong's individual phases", I am getting the impression that using spans would have several advantages:
While supporting other tracing software is not the highest priority for this plugin, if we can, I think we should (so long as it doesn't impede our capacity for properly supporting Zipkin).
So I am inclined to say that yes, we should go back to using spans for kong phases. |
@seh:
In older versions of the plugin, timestamps were not correctly transformed into microseconds (#71) so that might explain some discordances. Kong 2.1 (recently released) should not show those. |
Just another heads up: My holidays start now. I told my team to keep an eye on this particular issue, feel free to keep the discussion going. |
I read the comments accumulated here so far as consensus among the interested parties that we should be using spans to represent Kong's efforts. If that's so, what do we need to do to restore that previous behavior? |
@seh Well if someone feels adventurous enough to try their hand at this, PRs are welcome. Otherwise this will happen when it gets prioritized. We all have been working on the 2.2.0 features recently and plugin features took a backward seat for the moment unfortunately. |
Hey folks, I'm a little late to the party but was wondering if this has been looked at recently. Like others have already mentioned, it's surprising to me how difficult it is to do things like query by the downstream services since everything just rolls up to the service being hard-coded to "Kong." I'm also hoping to send these traces to a separate tool (Instana, but also testing with Jaeger), but in their current format it doesn't seem feasible. |
Can you explain the rationale for using logs/annotations (krs... 10 total) instead of 5 additionnal spans?
The performance cannot be considered top priority, as you just need to lower your sampling rate.
Is it visualized similarly in zipkin, i.e. different spans vs multiple annotations?
What about searching for annotations?
We are using Jaeger as backend, and feel it will be hard to investigate any timing issues in kong plugins, I don't think I can search for the "time difference between annotations >= XX milliseconds" anywhere even directly in ES storage of Jaeger.
Maybe you will answer that this is more of a metric type of search.
This is what a kong-zipkin-plugin trace looks like in Jaeger-ui, you have to expand to see them, they do not appear in the timeline, so we were a bit underwhelmed I must say:
![L_B8D5 tmp](https://user-images.githubusercontent.com/6293141/85414179-1284a600-b56c-11ea-9bc7-d1133cb3caf9.PNG)
Is it worth a PR with a configuration choice between multiple spans vs annotations?
Thanks for your answers.
Regards.
The text was updated successfully, but these errors were encountered: