Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trace context propagation using non-gRPC headers #6144

Open
nicktrav opened this issue Sep 11, 2019 · 9 comments
Open

Trace context propagation using non-gRPC headers #6144

nicktrav opened this issue Sep 11, 2019 · 9 comments
Milestone

Comments

@nicktrav
Copy link
Contributor

Currently, gRPC uses the grpc-trace-bin header for context propagation across process boundaries. This works perfectly for an environment in which all services are using gRPC for service-to-service communication. In the case of a more "polyglot transport protocol" environment, preserving the trace context becomes a little more involved.

To give some more specific context / motivation for the problem at hand, we have inbound HTTP requests that are subsequently proxied by an "API gateway"-like service over gRPC to respective backends. All other downstream traffic is gRPC. We're running this in a "mesh" setup, with Envoy running as a proxy alongside each container.

Envoy is configured (via Isito) to look for B3 headers, and will correctly do the context propagation, for HTTP. It doesn't know about grpc-trace-bin and thus can't participate in these traces. Instead it will emit its own B3-flavored trace for the particular hop, and we end up with incomplete traces - one set for the client / server gRPC spans, and another for just the proxy hops.

I'd like to propose / seek feedback on the idea of making the wire transport for tracing configurable / pluggable. I'm coming at this from the perspective that gRPC's trace propagation is tightly coupled to a gRPC-specfic format and it doesn't really make sense (at least to me) to teach an HTTP server / proxy how to handle gRPC's internal format.

Maybe I'm missing some background. I did a quick look in the issues for this repo but didn't see anything obvious. Looks like something similar has been brought up in the Go community, via census-instrumentation/opencensus-go#666, and census-instrumentation/opencensus-specs#136 (closed out).

If this is better for a list, happy to post there too, jlmk where to ask.

cc: @adriancole

More detail / proof of concept for a Java service:

Basic idea of the change - Open Census has the concept of Binary and TextFormat "setters" and "getters" to do the propagation. We replaced the grpc-trace-bin propagation header with the full set of B3 headers (no reason this couldn't be the single B3 header).

I've proven this out internally with some light forking of the code to make a new tracing module that does the B3 propagation. The following code is an example, isn't production ready, etc. etc.

diff --git a/core/src/main/java/io/grpc/internal/CensusTracingModule.java b/core/src/main/java/io/grpc/internal/CensusTracingModule.java
index b30c02d6a..f0a4dacdd 100644
--- a/core/src/main/java/io/grpc/internal/CensusTracingModule.java
+++ b/core/src/main/java/io/grpc/internal/CensusTracingModule.java
@@ -28,6 +28,7 @@ import io.grpc.Context;
 import io.grpc.ForwardingClientCall.SimpleForwardingClientCall;
 import io.grpc.ForwardingClientCallListener.SimpleForwardingClientCallListener;
 import io.grpc.Metadata;
+import io.grpc.Metadata.Key;
 import io.grpc.MethodDescriptor;
 import io.grpc.ServerStreamTracer;
 import io.grpc.StreamTracer;
@@ -39,7 +40,12 @@ import io.opencensus.trace.Span;
 import io.opencensus.trace.SpanContext;
 import io.opencensus.trace.Status;
 import io.opencensus.trace.Tracer;
+import io.opencensus.trace.Tracing;
 import io.opencensus.trace.propagation.BinaryFormat;
+import io.opencensus.trace.propagation.SpanContextParseException;
+import io.opencensus.trace.propagation.TextFormat;
+import io.opencensus.trace.propagation.TextFormat.Getter;
+import io.opencensus.trace.propagation.TextFormat.Setter;
 import io.opencensus.trace.unsafe.ContextUtils;
 import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;
 import java.util.logging.Level;
@@ -59,6 +65,9 @@ import javax.annotation.Nullable;
  */
 final class CensusTracingModule {
   private static final Logger logger = Logger.getLogger(CensusTracingModule.class.getName());
+  private static final TextFormat B3_FORMAT = Tracing.getPropagationComponent().getB3Format();
+  private static final Getter<Metadata> METADATA_GETTER = new B3Getter();
+  private static final Setter<Metadata> METADATA_SETTER = new B3Setter();
 
   @Nullable private static final AtomicIntegerFieldUpdater<ClientCallTracer> callEndedUpdater;
 
@@ -87,8 +96,6 @@ final class CensusTracingModule {
   }
 
   private final Tracer censusTracer;
-  @VisibleForTesting
-  final Metadata.Key<SpanContext> tracingHeader;
   private final TracingClientInterceptor clientInterceptor = new TracingClientInterceptor();
   private final ServerTracerFactory serverTracerFactory = new ServerTracerFactory();
 
@@ -96,23 +103,6 @@ final class CensusTracingModule {
       Tracer censusTracer, final BinaryFormat censusPropagationBinaryFormat) {
     this.censusTracer = checkNotNull(censusTracer, "censusTracer");
     checkNotNull(censusPropagationBinaryFormat, "censusPropagationBinaryFormat");
-    this.tracingHeader =
-        Metadata.Key.of("grpc-trace-bin", new Metadata.BinaryMarshaller<SpanContext>() {
-            @Override
-            public byte[] toBytes(SpanContext context) {
-              return censusPropagationBinaryFormat.toByteArray(context);
-            }
-
-            @Override
-            public SpanContext parseBytes(byte[] serialized) {
-              try {
-                return censusPropagationBinaryFormat.fromByteArray(serialized);
-              } catch (Exception e) {
-                logger.log(Level.FINE, "Failed to parse tracing header", e);
-                return SpanContext.INVALID;
-              }
-            }
-          });
   }
 
   /**
@@ -245,8 +235,7 @@ final class CensusTracingModule {
     public ClientStreamTracer newClientStreamTracer(
         ClientStreamTracer.StreamInfo info, Metadata headers) {
       if (span != BlankSpan.INSTANCE) {
-        headers.discardAll(tracingHeader);
-        headers.put(tracingHeader, span.getContext());
+        B3_FORMAT.inject(span.getContext(), headers, METADATA_SETTER);
       }
       return new ClientTracer(span);
     }
@@ -365,7 +354,13 @@ final class CensusTracingModule {
     @SuppressWarnings("ReferenceEquality")
     @Override
     public ServerStreamTracer newServerStreamTracer(String fullMethodName, Metadata headers) {
-      SpanContext remoteSpan = headers.get(tracingHeader);
+      SpanContext remoteSpan = null;
+      try {
+        remoteSpan = B3_FORMAT.extract(headers, METADATA_GETTER);
+      } catch (SpanContextParseException e) {
+        // FIXME: handle this
+      }
+
       if (remoteSpan == SpanContext.INVALID) {
         remoteSpan = null;
       }
@@ -419,4 +414,20 @@ final class CensusTracingModule {
     return prefix + "." + fullMethodName.replace('/', '.');
   }
 
+  private static class B3Getter extends Getter<Metadata> {
+
+    @Nullable
+    @Override
+    public String get(Metadata carrier, String key) {
+      return carrier.get(Key.of(key, Metadata.ASCII_STRING_MARSHALLER));
+    }
+  }
+
+  private static class B3Setter extends Setter<Metadata> {
+
+    @Override
+    public void put(Metadata carrier, String key, String value) {
+      carrier.put(Key.of(key, Metadata.ASCII_STRING_MARSHALLER), value);
+    }
+  }
 }

We're not actually forking gRPC for this, rather we're making a new tracing module that has the above changes, which we then install into client / server pipelines:

// client
CensusB3TracingModule module = new CensusB3TracingModule(Tracing.getTracer());
Channel wrappedChannel = ClientInterceptors.intercept(channel, module.getClientInterceptor());

// server
CensusB3TracingModule censusB3TracingModule = new CensusB3TracingModule(Tracing.getTracer());
NettyServerBuilder.forPort(50051)
    .addStreamTracerFactory(censusB3TracingModule.getServerTracerFactory())
    .addService(...);

We then have to disable the out of the box-gRPC-tracing so that we're only using B3 for propagation:

// client
InternalNettyChannelBuilder.setTracingEnabled(channelBuilder, false);

// server
InternalNettyServerBuilder.setTracingEnabled(serverBuilder, false);

With these code changes, we get full end-to-end tracing with all clients, servers and sidecar Envoys adding their spans.

@codefromthecrypt
Copy link
Contributor

FYI in brave, we "check twice" inbound preferring grpc-trace-bin https://github.com/openzipkin/brave/blob/master/instrumentation/grpc/src/main/java/brave/grpc/GrpcPropagation.java#L125

Outbound we dual propagate by default as it isn't clear what the otherside will use for reasons you mention. In the future someone could gain special knowledge via backtrace or such, but anyway now we dual-propagate.

@nicktrav
Copy link
Contributor Author

cc: @zhangkun83 @ejona86 for thoughts / help triaging

@ejona86
Copy link
Member

ejona86 commented Sep 17, 2019

I'd like to propose / seek feedback on the idea of making the wire transport for tracing configurable / pluggable.

As you already pointed out, it is possible to do what you need with our existing API. But I guess what you're saying here is you want to be able to use a pluggable wire transport with the Census integration.

Dual propagation seems quite reasonable given the current state of things. Ideally there would just be one format, but b3 doesn't seem like a great one. It is simple, but I'm confident using 5 headers would bother some performance people and be noticeable in benchmarks.

Forking CensusTracingModule doesn't bother me too much. Short term, we could make some changes to it easier to fork. Maybe even including just two hard-coded booleans on whether to use text and/or binary format.

We've been looking at removing the Census dependency from core for quite a while (#5076 is tracking that). But I'm not sure that helps.

In this particular case, I wouldn't be wild about making the wire transport pluggable. Instead I'd prefer to just support both formats directly. That's less API and both formats are supported by opencensus directly. I'm not sure exactly how we would want to expose this API though...

@nicktrav
Copy link
Contributor Author

Thanks for the reply, @ejona86. Some thoughts:

But I guess what you're saying here is you want to be able to use a pluggable wire transport with the Census integration.

Yeah, this is accurate. More specifically, Open Census - but we're splitting hairs on the naming here (my understanding is that Open Census was born out of gRPC).

Dual propagation seems quite reasonable given the current state of things. Ideally there would just be one format, but b3 doesn't seem like a great one.

I'm not sure I agree with you on having dual propagation. Here's where it's breaking down in our deployment currently:

We have intermediary proxies (specifically Envoy, configured via Istio) that only know how to handle specific propagation formats (currently, this is limited to B3). In the case where a gRPC client sets both grpc-trace-bin, and B3 headers, Envoy will emit tracing information based on the B3 headers and update parent spans by mutating headers before proxying the request. When the request hits the gRPC server, if the server picks the grpc-trace-bin header to infer the parent span, the server will use the incorrect parent and traces start to run in "parallel", rather than being nested.

A preferable solution in our case would just be to allow operators to pick what their trace propagation format looks like.

Instead I'd prefer to just support both formats directly.

To this extent, I think that supporting more than one format would be great. Maybe the API for this is even such that if you want to propagate both, you can do that via your own implementation of some abstract class, interface, etc.

Therefore, to firm up my proposal some more: allow an operator to configure their gRPC clients and servers with whatever propagation format they'd like. By default, grpc-trace-bin is the format out of the box. Maybe there's also a concrete B3 "propagator" implementation that lives in the repo.

Open to thoughts on what the API should look like, especially if there's work being done to re-work the existing tracing infrastructure.

As a footnote:

It is simple, but I'm confident using 5 headers would bother some performance people and be noticeable in benchmarks.

I don't think it necessarily has to use 5 headers, based on what I've read here. This could probably appease the performance nuts to some extent (though it's still a text encoding).

Reading this lead me to discovering that theres's a W3 candidate for "Trace Context" that specifies a common format to use. The API above should be flexible enough to allow for a "propagator" to be defined that could adhere to this spec too, if one so wished.

@codefromthecrypt
Copy link
Contributor

codefromthecrypt commented Sep 17, 2019 via email

@ejona86
Copy link
Member

ejona86 commented Sep 18, 2019

The single-header version looks fair. I don't expect binary vs text would be a problem.

Having three is worrying, but they are still all explicitly supported in the PropagationComponent API. I really want census to define as much as this. Maybe we let someone pass a BinaryFormat or (at their choice) a TextFormat. Such an API would definitely need to be in its own artifact (not in grpc-api/grpc-core). (Looking at BinaryFormat more, it would also require someone to specify the Key; I'm not sure if that is too useful.)

For the record, we can't depend on io.opencensus.trace.propagation.TextFormat at the moment because it is unstable API. We'd need to work together to get that stabilized.

@bogdandrutu
Copy link
Contributor

@ejona86 in terms of "standard" headers I think you should look into using w3c standard headers that defined for this https://www.w3.org/TR/trace-context

@nicktrav I suggest you use Envoy with opencensus integration. If you look at the config for opencensus we do have something that will help you - we allow to set the incoming and outgoing formats, so you can set b3 as incoming and grpc-bin as outgoing, see:
https://github.com/envoyproxy/envoy/blob/master/api/envoy/config/trace/v2/trace.proto#L136

@nicktrav
Copy link
Contributor Author

we allow to set the incoming and outgoing formats, so you can set b3 as incoming and grpc-bin as outgoing

Thanks @bogdandrutu - this would indeed be useful. However, we're using Istio for the configuration of these Envoy's, so that would need to be exposed via Istio. Furthermore, for it to be useful for us, we'd need some way of being able to granularly define the mappings per service based on what each service speaks. It's not clear if that's possible.

It would be simpler for us to just adopt a single, standard tracing format. If we adopted the w3c headers, this issue is still relevant as gRPC would need a way to be able to configure that to be the trace propagation format.

@codefromthecrypt
Copy link
Contributor

@nicktrav in Brave, we've run across the need to handle things more granularly. For example, instead of working on the "Metadata" object, you can work on a "Request" object. This is tricky in google's gRPC library because there is no effective request type.. data you need to perform selections on what to propagate is spread out over many types in different scopes. We have a related issue to try to cobble them together so that you can, for example, change the propagation policy on a per-request basis, and without affecting user code openzipkin/brave#993

@zhangkun83 zhangkun83 added this to the Unscheduled milestone Nov 5, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants