Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zipkin Exporter #192

Merged
merged 4 commits into from
Sep 19, 2019
Merged

Conversation

hekike
Copy link
Member

@hekike hekike commented Aug 11, 2019

First take on Zipkin Exporter.
Please provide feedback.

Zipkin Spec:
https://github.com/openzipkin/zipkin-api/blob/master/zipkin2-api.yaml

TODO:

DISCUSS:

Copy link
Member

@mayurkale22 mayurkale22 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great start :)

packages/opentelemetry-exporter-zipkin/src/types.ts Outdated Show resolved Hide resolved
* Shutdown exporter. Noop operation in this exporter.
*/
shutdown() {
// TODO: should we initiate an oppurtinistic send(..)?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point, I think we should call send() one last time to send last batched data.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss behavior on the next SIG meeting. As we don't have a callback here graceful shutdown is not really possible. I'd love the spec to be more specific here about this behavior: https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/sdk-exporter.md#shutdown

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it depends on the use case. What is the use case of shutdown?

packages/opentelemetry-exporter-zipkin/src/zipkin.ts Outdated Show resolved Hide resolved
[types.SpanKind.CONSUMER]: zipkinTypes.SpanKind.CONSUMER,
[types.SpanKind.PRODUCER]: zipkinTypes.SpanKind.PRODUCER,
// Zipkin doesnt have type INTERNAL
[types.SpanKind.INTERNAL]: zipkinTypes.SpanKind.CLIENT,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, not sure about this. Just add @todo here and will revisit again.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should skip the span bcoz INTERNAL indicates that the span is used internally.

Copy link
Member Author

@hekike hekike Aug 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure?
SpanKind.Internal MUST be assumed as a default. - spec

This would mean that we would skip on every span by default, except someone defines SpanKind.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I checked out Zipkin docs again and sounds like we can just leave it undefined and let JSON.stringify to remove it.

kind: When present, kind clarifies timestamp, duration and remoteEndpoint. When
                    absent, the span is local or incomplete. 

id: span.spanContext.spanId,
kind: ZIPKIN_SPAN_KIND_MAPPING[span.kind],
timestamp: span.startTime * MICROS_PER_MILLI,
duration: Math.round(span.startTime - span.endTime * MICROS_PER_MILLI),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(span.endTime - span.startTime) * MICROS_PER_MILLI?

@codecov-io
Copy link

codecov-io commented Aug 12, 2019

Codecov Report

Merging #192 into master will increase coverage by 0.09%.
The diff coverage is 99.67%.

@@            Coverage Diff             @@
##           master     #192      +/-   ##
==========================================
+ Coverage   98.86%   98.95%   +0.09%     
==========================================
  Files          69       76       +7     
  Lines        2732     3161     +429     
  Branches      193      209      +16     
==========================================
+ Hits         2701     3128     +427     
- Misses         31       33       +2
Impacted Files Coverage Δ
...ges/opentelemetry-exporter-zipkin/test/e2e.test.ts 100% <100%> (ø)
...entelemetry-exporter-zipkin/test/transform.test.ts 100% <100%> (ø)
...ackages/opentelemetry-exporter-zipkin/src/types.ts 100% <100%> (ø)
.../opentelemetry-exporter-zipkin/test/zipkin.test.ts 100% <100%> (ø)
...ges/opentelemetry-exporter-zipkin/src/transform.ts 100% <100%> (ø)
...ckages/opentelemetry-exporter-zipkin/src/zipkin.ts 98.21% <98.21%> (ø)
...pentelemetry-node-sdk/src/instrumentation/utils.ts 96% <0%> (-4%) ⬇️
...node-sdk/test/instrumentation/PluginLoader.test.ts 100% <0%> (ø) ⬆️
...pe-async-hooks/test/AsyncHooksScopeManager.test.ts 100% <0%> (ø) ⬆️
... and 6 more

id: span.spanContext.spanId,
kind: ZIPKIN_SPAN_KIND_MAPPING[span.kind],
timestamp: span.startTime * MICROS_PER_MILLI,
duration: (span.startTime - span.endTime) * MICROS_PER_MILLI,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think duration should be (span.endTime - span.startTime) and not (span.startTime - span.endTime).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think we should store the duration and not the end time. When I think about it, it doesn't even make sense to rely on the concept of an end time since it could end up before the start time because of daylight saving time. This makes the current implementation semantically incorrect.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding daylight savings time, my understanding is that performance.now is monotonic, so that wouldn't be an issue if we use it? Or am I missing something?

I'm open to storing the duration, but thought it could be simpler to just store start and end since that's what's provided either at measurement time or as a parameter

@mayurkale22
Copy link
Member

Test coverage

Please include the tests when you get time.

Batching

The Batching will most likely be handled by the SpanProcessor open-telemetry/opentelemetry-specification#205, I have PR(#149) open on same. Please add comments on the specs issue.

Event mapping

Handled in #195, should un-block you.

determinate span.shared

I think keep it false for now and add @todo and create an issue to address it later.

span.status mapping

Per SIG discussion, let's go with current mapping (ot.status_code), will revisit later. Moving forward, will share clear guidance on changes from OpenCensus to OpenTelemetry.

shutdown behavior

Per SIG discussion, make it configurable shutdown(forceFlush: boolean = true). If true, it will do (non-blocking) flush data and graceful shutdown otherwise direct exit. I got to know that callback is not required in this case, the SDK should do the best job to flush out the data - Log the warn/error if it is really required.

retry logic / HTTP client #190

Per SIG discussion, this is up-to exporter to implement the retry mechanism. Use raw right now and it’s sufficient.

@hekike
Copy link
Member Author

hekike commented Aug 17, 2019

@mayurkale22 I added tests and tested with Zipkin backend. Could also make some fields optional when not provided to minimize the payload. After integrating the batch logic, it's ready to go.

packages/opentelemetry-exporter-zipkin/src/transform.ts Outdated Show resolved Hide resolved
packages/opentelemetry-exporter-zipkin/src/transform.ts Outdated Show resolved Hide resolved
url?: string;
// Initiates a request with spans in memory to the backend.
forceFlush?: boolean;
// Optional mapping overrides for OpenTelemetry status code and description.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea :)

packages/opentelemetry-exporter-zipkin/src/zipkin.ts Outdated Show resolved Hide resolved
packages/opentelemetry-exporter-zipkin/package.json Outdated Show resolved Hide resolved
Copy link
Member

@mayurkale22 mayurkale22 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, added a minor comment. Thanks for the work!

@hekike
Copy link
Member Author

hekike commented Aug 18, 2019

@mayurkale22 thanks for the review, I changed it.
Also opened an issue open-telemetry/opentelemetry-specification#220 and PR open-telemetry/opentelemetry-specification#221 in the spec repo about Zipkin mapping.

/**
* The network context of a node in the service graph.
*/
export interface Endpoint {
Copy link

@jcchavezs jcchavezs Sep 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Zipkin format also supports ipv6 for endpoints. Any particular reason for not including it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right. This is what I found on zipkin API.

description: The network context of a node in the service graph
serviceName stringLower-case label of this node in the service graph, such as "favstar". Leaveabsent if unknown.This is a primary label for trace lookup and aggregation, so it should beintuitive and consistent. Many use a name from service discovery.
ipv4 string($ipv4)The text representation of the primary IPv4 address associated with thisconnection. Ex. 192.168.99.100 Absent if unknown.
ipv6 string($ipv6)The text representation of the primary IPv6 address associated with aconnection. Ex. 2001:db8::c001 Absent if unknown.Prefer using the ipv4 field for mapped addresses.
port integerDepending on context, this could be a listen port or the client-side of asocket. Absent if unknown. Please don’t set to zero.

@hekike Could you please add support ipv6? Thanks

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd expect ipv6 would be handled by node, do I miss something?

$ node -p -e 'require("url").parse("http://[FEDC:BA98:7654:3210:FEDC:BA98:7654:3210]:9411/api/v2/spans")'
Url {
  protocol: 'http:',
  slashes: true,
  auth: null,
  host: '[fedc:ba98:7654:3210:fedc:ba98:7654:3210]:9411',
  port: '9411',
  hostname: 'fedc:ba98:7654:3210:fedc:ba98:7654:3210',
  hash: null,
  search: null,
  query: null,
  pathname: '/api/v2/spans',
  path: '/api/v2/spans',
  href:
   'http://[fedc:ba98:7654:3210:fedc:ba98:7654:3210]:9411/api/v2/spans' }

@hekike
Copy link
Member Author

hekike commented Sep 18, 2019

@mayurkale22 PR rebased and I think ready to go.

@mayurkale22
Copy link
Member

@mayurkale22 PR rebased and I think ready to go.

Great, I will take a look today and merge once everything looks good. Thanks for the work @hekike

@mayurkale22
Copy link
Member

@open-telemetry/node-approvers Please review when you get time, thanks :)

@mayurkale22
Copy link
Member

@hekike thanks for the great work! I will merge this PR now and run end-to-end test.

@mayurkale22 mayurkale22 merged commit b58965c into open-telemetry:master Sep 19, 2019
pichlermarc pushed a commit to dynatrace-oss-contrib/opentelemetry-js that referenced this pull request Dec 15, 2023
* chore: use addSpanNetworkEvents from otel-web

Rather than use a modified copy, adjust event creation code
to use shared routine from otel-web.  This has the pleasant
side effect of also adding http.response_content_length.

* fix: bring resource fetch spans into spec compliance

Names should not be raw URLs, so I arbitrarily chose 'resourceFetch'
to mirror 'documentFetch' - I'm open to any better suggestions.  Because
the resource url is of course very useful I've put it in the 'http.url'
attribute instead.

* chore: lint:fix and use semantic constants

* fix: firefox sometimes has a fetchStart of 0, doesn't emit doc load spans

I noticed that in unit and manual tests Firefox would frequently not
emit doc load events.  Turns out that sometimes the fetchStart was 0
and the logic prevented spans from being emitted.  Simply checking
>= 0 rather than > 0 fixes that.

* chore: use constant for resourceFetch span name

Co-authored-by: Bartlomiej Obecny <bobecny@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants