Skip to content

Conversation

@tylerbenson
Copy link
Contributor

This can be used as an alternative to OpenTracing for custom instrumentation.

The integration works by hijacking the OpenTelemetry.getTracerProvider and OpenTelemetry.getPropagators method return values and replacing it with our own.

To enable:

  • System Property: -Ddd.integration.opentelemetry-beta.enabled=true
  • Environment Variable: DD_INTEGRATION_OPENTELEMETRY_BETA_ENABLED=true

The actual mapping semantics are subject to change as the API matures.

Events are completely ignored as there's no analogous feature in Datadog APM. This also won't impact the existing MeterProvider.

This can be used as an alternative to OpenTracing for custom instrumentation.

The integration works by hijacking the `OpenTelemetry.getTracerProvider` and `OpenTelemetry.getPropagators` method return values and replacing it with our own.

To enable:
* System Property: `-Ddd.integration.opentelemetry-beta.enabled=true`
* Environment Variable: `DD_INTEGRATION_OPENTELEMETRY_BETA_ENABLED=true`

The actual mapping semantics are subject to change as the API matures.

Events are completely ignored as there's no analogous feature in Datadog APM. This also won't impact the existing MeterProvider.
@tylerbenson tylerbenson requested a review from a team as a code owner June 18, 2020 20:13
@Override
public void setStatus(final Status status) {
if (!status.isOk()) {
delegate.setError(true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this depend on the type of span? IE: a client sees 400 as an error where a server doesn't

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't the regular http status codes... this is an OTel specific Status (more similar to grpc) https://github.com/open-telemetry/opentelemetry-java/blob/master/api/src/main/java/io/opentelemetry/trace/Status.java


@Override
public Tracer get(final String instrumentationName, final String instrumentationVersion) {
return new OtelTracer(instrumentationName, AgentTracer.get(), converter);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably cache this instead of returning a new one everytime to protect from bad instrumentation

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a nice very low overhead cache on #1603, maybe we can use that once that's merged, but merge this without implementing caching yet?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a todo.


@Override
public String get(final C carrier, final String key) {
return getter.get(carrier, key);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should be aiming to use this more since every HTTP API can do this quickly.

@Override
public void end(final EndSpanOptions endOptions) {
if (endOptions.getEndTimestamp() > 0) {
delegate.finish(TimeUnit.NANOSECONDS.toMicros(endOptions.getEndTimestamp()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which timezone? Is there a guarantee it's UTC?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually not sure... I think it fully depends on what the user provides which we can't control.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't make sense to compare timestamps taken relative to different clocks, so whoever computes durations needs to be in control of the clocks.

import spock.lang.Subject

class OpenTelemetryTest extends AgentTestRunner {
static {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this can leak to other tests, is it a time bomb? Do we need a much easier way of controlling environmental conditions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried clearing this and other similar tests out in cleanup, but it caused other test problems, likely because of a race condition with other tests. Not sure though. Either way, I don't want to deal with this problem here.

def "test span"() {
setup:
def builder = tracer.spanBuilder("some name")
if (tagBuilder) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how I feel about this much logic in tests. I haven't encountered it in other code bases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this look better now?

This reverts commit a0517bc.

# Conflicts:
#	dd-java-agent/instrumentation/grizzly-2/src/test/groovy/GrizzlyTest.groovy
@tylerbenson
Copy link
Contributor Author

I think I've addressed most of the feedback above. If you think I've missed something that you'd like to see in this PR, please chime in again.

Copy link
Contributor

@randomanderson randomanderson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This closely mirrors the OpenTracing instrumentation so LGTM.

I'll leave the "should we" to the policy makers.

Do you think there should be a smoketest like the OpenTracing one?

@tylerbenson
Copy link
Contributor Author

I'm not opposed to the idea, but maybe when we are ready to remove the experimental/beta part and want to target specific versions...

@tylerbenson tylerbenson merged commit f713fe3 into master Jun 23, 2020
@tylerbenson tylerbenson deleted the tyler/otel branch June 23, 2020 21:41
@github-actions github-actions bot added this to the 0.56.0 milestone Jun 23, 2020
import datadog.trace.context.TraceScope;
import io.opentelemetry.context.Scope;

public class OtelScope implements Scope, TraceScope {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't expect our Otel objects to implement our objects -- just the Otel objects.
I'm curious why we're implementing both interfaces.

delegate.close();
}

@Override
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised to see the async propagation methods on this class. I realize it follows from implementing datadog's TraceScope, but wonder why we need to do that.

delegate.setTag(key, value.getBooleanValue());
break;
default:
// Unsupported.... Ignoring.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the unsupported methods / cases should an option to raise UnsupportedOperationException.
Not raising an exception bit us in another situation recently.

import java.util.Map;
import java.util.concurrent.TimeUnit;

public class OtelSpan implements Span, MutableSpan {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As with Scope, I wouldn't expect this to implement MutableSpan.


@Override
public MutableSpan setOperationName(final String serviceName) {
return delegate.setOperationName(serviceName);
Copy link
Contributor

@dougqh dougqh Jun 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks wrong to me.
I would expect the setters to return "this" -- not unwrap the delegate.

import io.opentelemetry.trace.TraceId;
import io.opentelemetry.trace.TraceState;

public class OtelSpanContext extends SpanContext {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't implement AgentSpan.Context? I generally don't think we should implement our interfaces here, but the inconsistency confuses me.


OtelTracer(
final String tracerName, final AgentTracer.TracerAPI tracer, final TypeConverter converter) {
this.tracerName = tracerName;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like if we could push the information down into our tracers, too, but that can be another PR.

import io.opentelemetry.trace.SpanContext;

// Centralized place to do conversions
public class TypeConverter {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this might be making things more complicated than they need to be.

I don't see much value in centralizing this logic given that the conversion to / from should be a 1-liner.
I'm also worried by the addition of more instanceof checks which we should avoid, but also just seem unnecessary in this case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants