-
Notifications
You must be signed in to change notification settings - Fork 895
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Document special Zipkin conversion cases #566
Conversation
Signed-off-by: Pavol Loffay <ploffay@redhat.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally this should be reviewed by Zipkin maintainers, I would open a ticket there asking them to confirm the clean-up rules (because Zipkin server does something similar).
@adriancole if you don't mind, can you please help us here? |
Conversion to v1 format is full of nuance. do you actually support thrift? If not, possibly less distracting to discuss nuance of v2. If you are discussing v1, I would put it in a very clear section that this is discussion of a deprecated format. let me know which and we can continue. |
I think this is about accepting v1 format. There's little reason for otel collector to be generating v1. |
ok then maybe put this under a section heading. I have some things I have to do but will review more tomorrow Here's a spike for that heading: Considerations for Legacy (v1) FormatZipkin's v2 json format was defined in 2017, followed up by a protobuf format in 2018. Frameworks made before then use a more complex v1 Thrift or json format that notably differs in so far as it uses terminology such as Binary Annotation, and repeats endpoint information on each attribute. Below is the conversion process of v1 format into OpenTelemetry format. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that things like missing duration may not happen on some SDKs but not a bad idea to document it
Signed-off-by: Pavol Loffay <ploffay@redhat.com>
|
||
Frameworks made before then use a more complex v1 [Thrift](https://github.com/openzipkin/zipkin-api/blob/master/thrift/zipkinCore.thrift) or [json](https://github.com/openzipkin/zipkin-api/blob/master/zipkin-api.yaml) format that notably differs in so far as it uses terminology such as Binary Annotation, and repeats endpoint information on each attribute. | ||
|
||
Below is the conversion process of v1 format into OpenTelemetry format. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This document will become very long, even timestamps is complicated. Also a v1 span can have multiple spans inside of it. is there any code in this org that does this? Meanwhile should you want to try, you can take notes and heuristics from here:
personally, I'm quite behind (not just usual stuff but also covering for someone who's on paternity). I can't pick up more work load, and reviewing this until it is completely correct will be a big job. This time would be better spent on getting v2 conversion correct, or figuring out why people continue using v1 (probably envoy), and solving that!
It seems subtle, but even timestamp logic below isn't quite right, and I'd prefer not writing anything vs writing something not correct. In worst case, you can use words like "loosely" as while the heuristics listed below aren't wrong, they aren't completely correct.
You can possibly say this section is a work in progress, and you can look at the authoritative conversion logic I pasted for final answers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may not be a bad idea to simply refer to V1SpanConverter.java
as a reference implementation. What's the point of translating code into prose so that someone can later translate it back into code? Zipkin server's code is always going to be the ultimate authority on how to translate from V1 to V2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added the link to the V1SpanConverter.java
.
Instead of writing the conversion procedure we can list the cases that require special attention. This should help people to understand what requires additional handling and for more information they can dig into Zipkin converter.
Signed-off-by: Pavol Loffay <ploffay@redhat.com>
|
||
Consider using [V1SpanConverter.java](https://github.com/openzipkin/zipkin/blob/master/zipkin/src/main/java/zipkin2/v1/V1SpanConverter.java) as a reference implementation for converting v1 model to OpenTelemetry. | ||
|
||
Follows a list of cases that require special handling: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no harm in listing these, of course, but not much benefit either imo - if people do look at V1SpanConverter code, those cases would be handled there already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PS even in v2 these cases can happen. For example, people reporting data in chunks.
The what's notably different is that in v1, these might be resolvable at ingest time using annotations. If we are going to say anything we should probably say more specific, or not say anything as it is misleading to say missing timestamp happens here (it implies it can't happen otherwise)
Ex.
Follows a list of cases that require special handling: | |
The span timestamp and duration were late additions to the V1 format. As in the code link above, it is possible to heuristically derive them from annotations. |
It is things like this, that even doing this part will take some back and forth to get the words leading in the right direction, why I mention it is hard doc labor. What I pasted is ok if you remove the other text here. Just I would prefer you use this as one example as opposed to hinting that this doc will enumerate all things, such as the bullets below do.
I know intentions are good, just let's be careful not to dig a hole, basically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no harm in listing these, of course, but not much benefit either imo - if people do look at V1SpanConverter code, those cases would be handled there already.
From glancing at the java code it might not be obvious that timestamps/duration might be missing.
You are right there are probably several heuristics that can be applied during the conversion. For the OTEL scope we are probably interested in those that can happen at ingestion time.
I know only a small subset from what we do in Zipkin->Jaeger converter. I have applied your change. The goal here is to document that special cases exist and if it is convenient document which ones.
Signed-off-by: Pavol Loffay <ploffay@redhat.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for this, @pavolloffay!
* Document special Zipkin conversion cases Signed-off-by: Pavol Loffay <ploffay@redhat.com> * Fix typos Signed-off-by: Pavol Loffay <ploffay@redhat.com> * Fix lint Signed-off-by: Pavol Loffay <ploffay@redhat.com> * Change heading and add explanation for special cases Signed-off-by: Pavol Loffay <ploffay@redhat.com> * Reference zipkin v1 converter Signed-off-by: Pavol Loffay <ploffay@redhat.com> * Fix review comments Signed-off-by: Pavol Loffay <ploffay@redhat.com>
This proposal is mostly related to OTEL collector as I assume it is the only one collecting Zipkin data in thrift format. I am not sure if this could also happen in JSON or proto (probably not).
In Jaeger we have a set of span sanitizers https://github.com/jaegertracing/jaeger/blob/master/cmd/collector/app/sanitizer/zipkin/span_sanitizer.go that properly set some span fields if the values are missing. I think it might be worth documenting these as we saw that old Zipkin instrumentations/clients might send spans with missing fields (start time, duration).
The only missing sanitizer is for span errors. Errors in Zipkin spans might have the following format
error=some error
. This will have to also be transformed to OTEL format. It is blocked by #427.Signed-off-by: Pavol Loffay ploffay@redhat.com