-
Notifications
You must be signed in to change notification settings - Fork 320
Experimental instrumentation for the OpenTelemetry Tracing API #1605
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This can be used as an alternative to OpenTracing for custom instrumentation. The integration works by hijacking the `OpenTelemetry.getTracerProvider` and `OpenTelemetry.getPropagators` method return values and replacing it with our own. To enable: * System Property: `-Ddd.integration.opentelemetry-beta.enabled=true` * Environment Variable: `DD_INTEGRATION_OPENTELEMETRY_BETA_ENABLED=true` The actual mapping semantics are subject to change as the API matures. Events are completely ignored as there's no analogous feature in Datadog APM. This also won't impact the existing MeterProvider.
...emetry/src/main/java/datadog/trace/instrumentation/opentelemetry/OtelContextPropagators.java
Show resolved
Hide resolved
| @Override | ||
| public void setStatus(final Status status) { | ||
| if (!status.isOk()) { | ||
| delegate.setError(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this depend on the type of span? IE: a client sees 400 as an error where a server doesn't
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't the regular http status codes... this is an OTel specific Status (more similar to grpc) https://github.com/open-telemetry/opentelemetry-java/blob/master/api/src/main/java/io/opentelemetry/trace/Status.java
|
|
||
| @Override | ||
| public Tracer get(final String instrumentationName, final String instrumentationVersion) { | ||
| return new OtelTracer(instrumentationName, AgentTracer.get(), converter); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably cache this instead of returning a new one everytime to protect from bad instrumentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a nice very low overhead cache on #1603, maybe we can use that once that's merged, but merge this without implementing caching yet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a todo.
.../src/main/java/datadog/trace/instrumentation/opentelemetry/OpenTelemetryInstrumentation.java
Show resolved
Hide resolved
...emetry/src/main/java/datadog/trace/instrumentation/opentelemetry/OtelContextPropagators.java
Show resolved
Hide resolved
|
|
||
| @Override | ||
| public String get(final C carrier, final String key) { | ||
| return getter.get(carrier, key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should be aiming to use this more since every HTTP API can do this quickly.
...ation/opentelemetry/src/main/java/datadog/trace/instrumentation/opentelemetry/OtelScope.java
Show resolved
Hide resolved
| @Override | ||
| public void end(final EndSpanOptions endOptions) { | ||
| if (endOptions.getEndTimestamp() > 0) { | ||
| delegate.finish(TimeUnit.NANOSECONDS.toMicros(endOptions.getEndTimestamp())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which timezone? Is there a guarantee it's UTC?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm actually not sure... I think it fully depends on what the user provides which we can't control.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't make sense to compare timestamps taken relative to different clocks, so whoever computes durations needs to be in control of the clocks.
...n/opentelemetry/src/main/java/datadog/trace/instrumentation/opentelemetry/TypeConverter.java
Outdated
Show resolved
Hide resolved
...n/opentelemetry/src/main/java/datadog/trace/instrumentation/opentelemetry/TypeConverter.java
Outdated
Show resolved
Hide resolved
| import spock.lang.Subject | ||
|
|
||
| class OpenTelemetryTest extends AgentTestRunner { | ||
| static { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this can leak to other tests, is it a time bomb? Do we need a much easier way of controlling environmental conditions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried clearing this and other similar tests out in cleanup, but it caused other test problems, likely because of a race condition with other tests. Not sure though. Either way, I don't want to deal with this problem here.
| def "test span"() { | ||
| setup: | ||
| def builder = tracer.spanBuilder("some name") | ||
| if (tagBuilder) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how I feel about this much logic in tests. I haven't encountered it in other code bases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this look better now?
This reverts commit a0517bc. # Conflicts: # dd-java-agent/instrumentation/grizzly-2/src/test/groovy/GrizzlyTest.groovy
|
I think I've addressed most of the feedback above. If you think I've missed something that you'd like to see in this PR, please chime in again. |
randomanderson
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This closely mirrors the OpenTracing instrumentation so LGTM.
I'll leave the "should we" to the policy makers.
Do you think there should be a smoketest like the OpenTracing one?
|
I'm not opposed to the idea, but maybe when we are ready to remove the experimental/beta part and want to target specific versions... |
| import datadog.trace.context.TraceScope; | ||
| import io.opentelemetry.context.Scope; | ||
|
|
||
| public class OtelScope implements Scope, TraceScope { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't expect our Otel objects to implement our objects -- just the Otel objects.
I'm curious why we're implementing both interfaces.
| delegate.close(); | ||
| } | ||
|
|
||
| @Override |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised to see the async propagation methods on this class. I realize it follows from implementing datadog's TraceScope, but wonder why we need to do that.
| delegate.setTag(key, value.getBooleanValue()); | ||
| break; | ||
| default: | ||
| // Unsupported.... Ignoring. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if the unsupported methods / cases should an option to raise UnsupportedOperationException.
Not raising an exception bit us in another situation recently.
| import java.util.Map; | ||
| import java.util.concurrent.TimeUnit; | ||
|
|
||
| public class OtelSpan implements Span, MutableSpan { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As with Scope, I wouldn't expect this to implement MutableSpan.
|
|
||
| @Override | ||
| public MutableSpan setOperationName(final String serviceName) { | ||
| return delegate.setOperationName(serviceName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks wrong to me.
I would expect the setters to return "this" -- not unwrap the delegate.
| import io.opentelemetry.trace.TraceId; | ||
| import io.opentelemetry.trace.TraceState; | ||
|
|
||
| public class OtelSpanContext extends SpanContext { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't implement AgentSpan.Context? I generally don't think we should implement our interfaces here, but the inconsistency confuses me.
|
|
||
| OtelTracer( | ||
| final String tracerName, final AgentTracer.TracerAPI tracer, final TypeConverter converter) { | ||
| this.tracerName = tracerName; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like if we could push the information down into our tracers, too, but that can be another PR.
| import io.opentelemetry.trace.SpanContext; | ||
|
|
||
| // Centralized place to do conversions | ||
| public class TypeConverter { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this might be making things more complicated than they need to be.
I don't see much value in centralizing this logic given that the conversion to / from should be a 1-liner.
I'm also worried by the addition of more instanceof checks which we should avoid, but also just seem unnecessary in this case.
This can be used as an alternative to OpenTracing for custom instrumentation.
The integration works by hijacking the
OpenTelemetry.getTracerProviderandOpenTelemetry.getPropagatorsmethod return values and replacing it with our own.To enable:
-Ddd.integration.opentelemetry-beta.enabled=trueDD_INTEGRATION_OPENTELEMETRY_BETA_ENABLED=trueThe actual mapping semantics are subject to change as the API matures.
Events are completely ignored as there's no analogous feature in Datadog APM. This also won't impact the existing MeterProvider.