Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rationale for logs/annotations instead of spans? #8126

Closed
ecourreges-orange opened this issue Jun 23, 2020 · 30 comments · Fixed by #9891
Closed

Rationale for logs/annotations instead of spans? #8126

ecourreges-orange opened this issue Jun 23, 2020 · 30 comments · Fixed by #9891

Comments

@ecourreges-orange
Copy link

Can you explain the rationale for using logs/annotations (krs... 10 total) instead of 5 additionnal spans?
The performance cannot be considered top priority, as you just need to lower your sampling rate.
Is it visualized similarly in zipkin, i.e. different spans vs multiple annotations?
What about searching for annotations?
We are using Jaeger as backend, and feel it will be hard to investigate any timing issues in kong plugins, I don't think I can search for the "time difference between annotations >= XX milliseconds" anywhere even directly in ES storage of Jaeger.
Maybe you will answer that this is more of a metric type of search.

This is what a kong-zipkin-plugin trace looks like in Jaeger-ui, you have to expand to see them, they do not appear in the timeline, so we were a bit underwhelmed I must say:
L_B8D5 tmp

Is it worth a PR with a configuration choice between multiple spans vs annotations?

Thanks for your answers.
Regards.

@hbagdi
Copy link
Member

hbagdi commented Jun 23, 2020

ping @kikito @seh

@seh
Copy link

seh commented Jun 23, 2020

I've only used the Zipkin output through the OpenTelemetry Collector, sending it to Honeycomb, we get several spans representing Kong's activities, united under a single trace. I'm not sure if those log entries fall out due to not having any representation in OTel.

@ecourreges-orange, can you share another image with those spans collapsed, so that we can see the overall shape of the trace?

@vito-laurenza-zocdoc
Copy link

I'm interested in understanding this rationale as well.

@ecourreges-orange
Copy link
Author

@ecourreges-orange, can you share another image with those spans collapsed, so that we can see the overall shape of the trace?

collapsed

@kikito
Copy link
Member

kikito commented Jun 25, 2020

Spans used to be the way the different phases were stored, but they were changed to annotations as part of a wider sweep of changes in the plugin, requested with the Zipkin devs themselves in Kong/kong-plugin-zipkin#49, with more details in openzipkin/zipkin#2834

The exact place where they recommend the change is:

openzipkin/zipkin#2834 (comment)

I assume this was a performance concern, but for me it didn't make any difference so I just followed their advice.

The performance cannot be considered top priority, as you just need to lower your sampling rate.

I must point out that that solution might be appropriate for you but not for others.

Is it visualized similarly in zipkin, i.e. different spans vs multiple annotations?

This is how it looks in Zipkin:

Screenshot 2020-06-25 at 18 28 38

@vito-laurenza-zocdoc
Copy link

This is what they used to look like in Honeycomb via the OpenTelemetry collector:
image

This is what they look now:
image

@seh
Copy link

seh commented Jun 25, 2020

I guess it's been a while since I tried a more recent version of the plugin, because my view in Honeycomb looks like your first example, @vito-laurenza-zocdoc.

That the span events in the second example don't fall within the parent span's duration bound is surprising, suggesting that these events don't make sense as children.

@seh
Copy link

seh commented Jun 25, 2020

Reading openzipkin/zipkin/pull/2834 today, I think we got this one wrong. They were saying that since of the span durations were so short, we should change them to annotations, because there's a minimum span duration. That's a rather dogmatic prescription, misinterpreting the data that showed these activities completing (in that case) too quickly to measure accurately.

If that requirement for the minimum duration is true, I'd prefer to clamp the minimum durations (report anything shorter than the minimum as the minimum, perhaps with an annotation or something indicating that the duration was clamped) and retain these as spans. They make more sense that way, being activities with a necessary sequence and duration. They are not mere events that occur instantaneously.

@vito-laurenza-zocdoc
Copy link

I guess it's been a while since I tried a more recent version of the plugin, because my view in Honeycomb looks like your first example, @vito-laurenza-zocdoc.

That the span events in the second example don't fall within the parent span's duration bound is surprising, suggesting that these events don't make sense as children.

Even in the first screenshot, you'll see that kong is reporting an access phase latency of ~1ms, but the span for availability-service doesn't start until ~25m relative to the start of the trace. This is very puzzling to me and I'm still not sure if we're actually adding 20-something ms of latency in the access phase, or if there's something else going on.

@vito-laurenza-zocdoc
Copy link

Reading openzipkin/zipkin/pull/2834 today, I think we got this one wrong. They were saying that since of the span durations were so short, we should change them to annotations, because there's a minimum span duration. That's a rather dogmatic prescription, misinterpreting the data that showed these activities completing (in that case) too quickly to measure accurately.

If that requirement for the minimum duration is true, I'd prefer to clamp the minimum durations (report anything shorter than the minimum as the minimum, perhaps with an annotation or something indicating that the duration was clamped) and retain these as spans. They make more sense that way, being activities with a necessary sequence and duration. They are not mere events that occur instantaneously.

For what it's worth, I agree that sending the phase timings as spans makes more sense.

@seh
Copy link

seh commented Jul 7, 2020

@kikito, are there any plans underway to fix this, returning us to the multiple spans that we had before?

@vito-laurenza-zocdoc
Copy link

vito-laurenza-zocdoc commented Jul 28, 2020

@seh @kikito I re-read openzipkin/zipkin/pull/2834 several times and it seems to me that the advice around moving these spans with missing durations to annotations was misguided. The missing durations seem likely to have been caused by the prior plugin versions getting the timestamp precision wrong (milliseconds vs microseconds). With microsecond precision, I doubt we'd even see a span with a duration less than 1 microsecond.

@kikito
Copy link
Member

kikito commented Aug 5, 2020

The suggestions made here make sense. This change would make the plugin non-backwards compatible (nothing would "break", but the stats provided would be very different). So I would need to release a major version. I would like to get feedback from @adriancole @jcchavezs before acting on this.

@codefromthecrypt
Copy link

has anyone asked jaeger to do relative timings to the span? I would be surprised if they would think it is better to display as it is, though possibly there's a perspective I'm missing. cc'ing someone familiar w/jaeger @pavolloffay

@codefromthecrypt
Copy link

one thing I've noticed in tech is that people will move mountains to avoid fixing javascript in a UI :P

@codefromthecrypt
Copy link

on the specific parts here, there seem to have been changes after this one which makes the data more esoteric Kong/kong-plugin-zipkin#52

ex. the three character annotations I'm not sure anyone would understand what they are, and it would be unexpected for a UI to have a schema to know what they are. I would suggest to sortof split up issues.

The first requests to change the data were around semantics and attempts to correct those primarily. There are different ways to model but I would agree that regardless of that UIs probably do need to fix how they present "relative time", 3 character annotations are unusable as would be spans with 3 character names.

First step might be seeing about undoing wherever that 3 char part happened. Regardless of whether 3-10 spans are re-added per request, I wouldn't roll back completely the other parts as there were definitely semantic problems, ex endpoints being incorrect and such.

Focusing on the model we want is probably better than reverting as hours were spent on problems that I'm not sure anyone would want to spend the time to do-over again.

@codefromthecrypt
Copy link

one other thing I would suggest is sometimes translators are used.. I think this problem will be easiest to discuss with actual zipkin json apples/apples as maybe applied to the other 2 UIs. Ex if some proxy is used, the parts besides the 3 letter thing might become obvious. It is weird that some annotations are predating the corresponding spans. JSON is how we usually get to bottom of things very quickly.

@jcchavezs
Copy link
Contributor

I am curious how other UIs impact a zipkin plugin. I do agree the three letters annotation is very hard to digest. The change was introduced in https://github.com/Kong/kong-plugin-zipkin/pull/60/files, not sure what was the motivation (probably following zipkin convention) so @kikito can explain more. I would bring back the previous names as they are more easy to understand,

Regarding using different spans or annotations, the operations performed by kong are sequencial and one depend on the other. Are spans a better modeling for it? I don't think so, IMHO it adds more noise and honestly it is not too interesting see them independently but more like a collections of milestones (but that is personal opinion). Do we need to change the model because rendering in vendor XYZ is bad? I don't think so

@seh
Copy link

seh commented Aug 5, 2020

Are spans a better modeling for it? I don't think so, IMHO it adds more noise and honestly it is not too interesting see them independently but more like a collections of milestones (but that is personal opinion).

I think spans are a better representation for these efforts. Thinking of how these wind up in Honeycomb after passing through the OpenTelemetry Collector, they are more easily queried, visualized, and understood when modeled as spans.

@jcchavezs
Copy link
Contributor

jcchavezs commented Aug 5, 2020 via email

@seh
Copy link

seh commented Aug 5, 2020

As I pointed out above, do we really need to change this plugin because of how honeycomb renders it?

We don't need to do anything, but I think the plugin worked better before, so why not recover that better behavior that users had been enjoying?

I confess that I don't understand the position that represents these Kong efforts as annotations. These efforts are steps that have durations, and we care how long each step takes, and how many of the steps we attempt and accomplish. If we claim that such efforts should be annotations, then what would be left deserving a span?

@ecourreges-orange
Copy link
Author

Apart from the "We don't need to care about other GUIs", explain us how you can investigate/search timing issues in Kong plugins with this annotation model?
I understand this was started as a Zipkin plugin for Kong, but you all know that OpenTelemetry is coming, and it would be sad to have to write a new kong-plugin-opentelemetry where just a few tweaks to this plugin are enough to reach a consensus for zipkin AND non zipkin users.

@kikito
Copy link
Member

kikito commented Aug 7, 2020

I don't want to derail this thread too much so let me address the "three-letter-label" question first.

I am curious how other UIs impact a zipkin plugin. I do agree the three letters annotation is very hard to digest. The change was introduced in Kong/kong-plugin-zipkin#60 (files), not sure what was the motivation (probably following zipkin convention) so @kikito can explain more. I would bring back the previous names as they are more easy to understand,

I confirm that the intention renaming annotations to kas etc. from kong.access.start was aligning the plugin with Zipkin's seemingly preferred style (where cs is used instead of something longer like client.send). In addition to this, the zipkin-api docs, which state:

Usually a short tag indicating an event, like “error”

The impression I got from all that was that the annotation bytesize should be kept constrained. Since I was going to need several annotations per trace, I went with 3-letters.

I see now that this was a mistake. So in a future version of the plugin we will have longer names. Thanks for clarifying this.

@kikito
Copy link
Member

kikito commented Aug 7, 2020

Going back to the main theme of this issue, i.e. "using spans vs using annotations for kong's individual phases", I am getting the impression that using spans would have several advantages:

  • Other tracing software handles spans better than annotations.

they are more easily queried, visualized, and understood when modeled as spans.

While supporting other tracing software is not the highest priority for this plugin, if we can, I think we should (so long as it doesn't impede our capacity for properly supporting Zipkin).

  • It seems to me that even Zipkin itself can handle them better (it can use spanName as a filtering criteria, but not annotationName).

Screenshot 2020-08-07 at 18 50 09

So I am inclined to say that yes, we should go back to using spans for kong phases.

@kikito
Copy link
Member

kikito commented Aug 7, 2020

@seh:

That the span events in the second example don't fall within the parent span's duration bound is surprising, suggesting that these events don't make sense as children.

In older versions of the plugin, timestamps were not correctly transformed into microseconds (#71) so that might explain some discordances. Kong 2.1 (recently released) should not show those.

@kikito
Copy link
Member

kikito commented Aug 7, 2020

Just another heads up: My holidays start now. I told my team to keep an eye on this particular issue, feel free to keep the discussion going.

@seh
Copy link

seh commented Oct 1, 2020

I read the comments accumulated here so far as consensus among the interested parties that we should be using spans to represent Kong's efforts. If that's so, what do we need to do to restore that previous behavior?

@kikito
Copy link
Member

kikito commented Oct 21, 2020

@seh Well if someone feels adventurous enough to try their hand at this, PRs are welcome. Otherwise this will happen when it gets prioritized. We all have been working on the 2.2.0 features recently and plugin features took a backward seat for the moment unfortunately.

@evandam
Copy link

evandam commented Apr 23, 2021

Hey folks, I'm a little late to the party but was wondering if this has been looked at recently. Like others have already mentioned, it's surprising to me how difficult it is to do things like query by the downstream services since everything just rolls up to the service being hard-coded to "Kong." I'm also hoping to send these traces to a separate tool (Instana, but also testing with Jaeger), but in their current format it doesn't seem feasible.

cc @seh @kikito - appreciate the work on this!

@kikito
Copy link
Member

kikito commented May 19, 2021

@evandam I think what you are asking is related but is not exactly this issue, but #8125.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
9 participants