Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document special Zipkin conversion cases #566

Merged
merged 6 commits into from
Apr 28, 2020

Conversation

pavolloffay
Copy link
Member

This proposal is mostly related to OTEL collector as I assume it is the only one collecting Zipkin data in thrift format. I am not sure if this could also happen in JSON or proto (probably not).

In Jaeger we have a set of span sanitizers https://github.com/jaegertracing/jaeger/blob/master/cmd/collector/app/sanitizer/zipkin/span_sanitizer.go that properly set some span fields if the values are missing. I think it might be worth documenting these as we saw that old Zipkin instrumentations/clients might send spans with missing fields (start time, duration).

The only missing sanitizer is for span errors. Errors in Zipkin spans might have the following format error=some error. This will have to also be transformed to OTEL format. It is blocked by #427.

Signed-off-by: Pavol Loffay ploffay@redhat.com

Signed-off-by: Pavol Loffay <ploffay@redhat.com>
Signed-off-by: Pavol Loffay <ploffay@redhat.com>
Signed-off-by: Pavol Loffay <ploffay@redhat.com>
Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally this should be reviewed by Zipkin maintainers, I would open a ticket there asking them to confirm the clean-up rules (because Zipkin server does something similar).

@bogdandrutu
Copy link
Member

@adriancole if you don't mind, can you please help us here?

@codefromthecrypt
Copy link

codefromthecrypt commented Apr 23, 2020

Conversion to v1 format is full of nuance. do you actually support thrift? If not, possibly less distracting to discuss nuance of v2. If you are discussing v1, I would put it in a very clear section that this is discussion of a deprecated format.

let me know which and we can continue.

@yurishkuro
Copy link
Member

I think this is about accepting v1 format. There's little reason for otel collector to be generating v1.

@codefromthecrypt
Copy link

ok then maybe put this under a section heading. I have some things I have to do but will review more tomorrow

Here's a spike for that heading:

Considerations for Legacy (v1) Format

Zipkin's v2 json format was defined in 2017, followed up by a protobuf format in 2018.

Frameworks made before then use a more complex v1 Thrift or json format that notably differs in so far as it uses terminology such as Binary Annotation, and repeats endpoint information on each attribute.

Below is the conversion process of v1 format into OpenTelemetry format.

Copy link
Member

@SergeyKanzhelev SergeyKanzhelev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that things like missing duration may not happen on some SDKs but not a bad idea to document it

Signed-off-by: Pavol Loffay <ploffay@redhat.com>

Frameworks made before then use a more complex v1 [Thrift](https://github.com/openzipkin/zipkin-api/blob/master/thrift/zipkinCore.thrift) or [json](https://github.com/openzipkin/zipkin-api/blob/master/zipkin-api.yaml) format that notably differs in so far as it uses terminology such as Binary Annotation, and repeats endpoint information on each attribute.

Below is the conversion process of v1 format into OpenTelemetry format.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This document will become very long, even timestamps is complicated. Also a v1 span can have multiple spans inside of it. is there any code in this org that does this? Meanwhile should you want to try, you can take notes and heuristics from here:

https://github.com/openzipkin/zipkin/blob/master/zipkin/src/main/java/zipkin2/v1/V1SpanConverter.java

personally, I'm quite behind (not just usual stuff but also covering for someone who's on paternity). I can't pick up more work load, and reviewing this until it is completely correct will be a big job. This time would be better spent on getting v2 conversion correct, or figuring out why people continue using v1 (probably envoy), and solving that!

It seems subtle, but even timestamp logic below isn't quite right, and I'd prefer not writing anything vs writing something not correct. In worst case, you can use words like "loosely" as while the heuristics listed below aren't wrong, they aren't completely correct.

You can possibly say this section is a work in progress, and you can look at the authoritative conversion logic I pasted for final answers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may not be a bad idea to simply refer to V1SpanConverter.java as a reference implementation. What's the point of translating code into prose so that someone can later translate it back into code? Zipkin server's code is always going to be the ultimate authority on how to translate from V1 to V2.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added the link to the V1SpanConverter.java.

Instead of writing the conversion procedure we can list the cases that require special attention. This should help people to understand what requires additional handling and for more information they can dig into Zipkin converter.

Signed-off-by: Pavol Loffay <ploffay@redhat.com>

Consider using [V1SpanConverter.java](https://github.com/openzipkin/zipkin/blob/master/zipkin/src/main/java/zipkin2/v1/V1SpanConverter.java) as a reference implementation for converting v1 model to OpenTelemetry.

Follows a list of cases that require special handling:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no harm in listing these, of course, but not much benefit either imo - if people do look at V1SpanConverter code, those cases would be handled there already.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PS even in v2 these cases can happen. For example, people reporting data in chunks.

The what's notably different is that in v1, these might be resolvable at ingest time using annotations. If we are going to say anything we should probably say more specific, or not say anything as it is misleading to say missing timestamp happens here (it implies it can't happen otherwise)

Ex.

Suggested change
Follows a list of cases that require special handling:
The span timestamp and duration were late additions to the V1 format. As in the code link above, it is possible to heuristically derive them from annotations.

It is things like this, that even doing this part will take some back and forth to get the words leading in the right direction, why I mention it is hard doc labor. What I pasted is ok if you remove the other text here. Just I would prefer you use this as one example as opposed to hinting that this doc will enumerate all things, such as the bullets below do.

I know intentions are good, just let's be careful not to dig a hole, basically.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no harm in listing these, of course, but not much benefit either imo - if people do look at V1SpanConverter code, those cases would be handled there already.

From glancing at the java code it might not be obvious that timestamps/duration might be missing.


You are right there are probably several heuristics that can be applied during the conversion. For the OTEL scope we are probably interested in those that can happen at ingestion time.
I know only a small subset from what we do in Zipkin->Jaeger converter. I have applied your change. The goal here is to document that special cases exist and if it is convenient document which ones.

Signed-off-by: Pavol Loffay <ploffay@redhat.com>
Copy link

@codefromthecrypt codefromthecrypt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for this, @pavolloffay!

@yurishkuro yurishkuro merged commit 89efe15 into open-telemetry:master Apr 28, 2020
carlosalberto pushed a commit to carlosalberto/opentelemetry-specification that referenced this pull request Oct 31, 2024
* Document special Zipkin conversion cases

Signed-off-by: Pavol Loffay <ploffay@redhat.com>

* Fix typos

Signed-off-by: Pavol Loffay <ploffay@redhat.com>

* Fix lint

Signed-off-by: Pavol Loffay <ploffay@redhat.com>

* Change heading and add explanation for special cases

Signed-off-by: Pavol Loffay <ploffay@redhat.com>

* Reference zipkin v1 converter

Signed-off-by: Pavol Loffay <ploffay@redhat.com>

* Fix review comments

Signed-off-by: Pavol Loffay <ploffay@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants