gRPC Interceptors #378

Closed
wants to merge 2 commits into from

Jared-Prime
Contributor

This PR provides an implementation for distributed tracing for Ruby gRPC services and clients.

[Screenshot: 2018-03-20, 4:54 PM]

In gRPC nomenclature, an "interceptor" is similar to middleware. When running a service or a client, a programmer must declare the interceptors to use at runtime; as a result, the Datadog #patch implementation merely supplies the constants GRPC::DatadogClientInterceptor and GRPC::DatadogServerInterceptor for use in interceptor registries. Future work may embed tracing in the interceptor call chain itself, making the instrumentation more transparent to the application developer.
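To make the middleware analogy concrete, here is a minimal pure-Ruby sketch of an interceptor chain. It does not load the grpc gem: TracingInterceptor and run_with_interceptors are hypothetical stand-ins. The keyword arguments mirror the request_response hook that gRPC Ruby client interceptors implement, and, as with real interceptors, each one calls yield to continue the chain, so the first interceptor in the list ends up outermost.

```ruby
# Hypothetical interceptor that records when it starts and finishes a call.
class TracingInterceptor
  def initialize(name, events)
    @name = name
    @events = events
  end

  # Same keyword shape as a gRPC Ruby client interceptor's request_response hook.
  def request_response(request: nil, call: nil, method: nil, metadata: nil)
    @events << "#{@name}:start #{method}"
    result = yield # continue to the next interceptor, or to the real call
    @events << "#{@name}:finish #{method}"
    result
  end
end

# Hypothetical driver: wraps actual_call so the first interceptor is outermost.
def run_with_interceptors(interceptors, method:, request:, metadata: {}, &actual_call)
  chain = interceptors.reverse.reduce(actual_call) do |inner, interceptor|
    lambda do
      interceptor.request_response(request: request, method: method, metadata: metadata) do
        inner.call
      end
    end
  end
  chain.call
end

events = []
reply = run_with_interceptors([TracingInterceptor.new('datadog', events)],
                              method: '/helloworld.Greeter/SayHello', request: 'hi') { 'hello' }
# reply  => "hello"
# events => ["datadog:start /helloworld.Greeter/SayHello", "datadog:finish /helloworld.Greeter/SayHello"]
```

With the real gem, the interceptor instances would be handed to the stub or server when it is constructed (for example through an interceptors: array); check the grpc gem documentation for the version in use.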

The following types of gRPC calls have been implemented and tested:

  • synchronous request/reply
  • synchronous streaming requests
  • synchronous streaming replies

The following types of gRPC calls have not yet been implemented and/or tested:

  • synchronous bidirectional streams
  • asynchronous calls of any kind
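The grpc gem's interceptor base classes define one hook per call type: request_response, client_streamer, server_streamer, and bidi_streamer. Assuming those hook names, here is a sketch of routing all four through a single helper, so that extending coverage to bidirectional streams does not need separate tracing code. SpanRecordingInterceptor and record are hypothetical, and the sketch does not load the grpc gem.

```ruby
# Hypothetical interceptor: every gRPC call type funnels into one helper.
class SpanRecordingInterceptor
  # Interceptor hook names, mapped to a label for each call type.
  RPC_HOOKS = {
    request_response: 'unary',
    client_streamer: 'client_streaming',
    server_streamer: 'server_streaming',
    bidi_streamer: 'bidi_streaming'
  }.freeze

  attr_reader :spans

  def initialize
    @spans = []
  end

  RPC_HOOKS.each do |hook, kind|
    # Accept whatever keywords the hook receives (request:, requests:, call:, ...).
    define_method(hook) do |method: nil, **_rest, &block|
      record(kind, method, &block)
    end
  end

  private

  def record(kind, rpc_method)
    @spans << { kind: kind, method: rpc_method }
    yield # run the wrapped call and return its result
  end
end
```

Keeping the tracing logic in one place means the four hooks differ only in their labels.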

I've relied on Datadog's platform documentation as well as the documentation for the Datadog Ruby gem, the Datadog Python package, the OpenTracing standard (opentracing.io), and the existing Rack integration in this project. Thanks a million to those contributors, as I couldn't have figured out how to do this without their work.

I plan to use my patch going forward and am grateful for any feedback that will help get this code ready for general usage.

Jared-Prime mentioned this pull request Mar 20, 2018
palazzem requested a review from delner March 21, 2018 08:44
palazzem added the integrations (Involves tracing integrations) and community (Was opened by a community member) labels Mar 21, 2018
palazzem added this to the 0.12.0 milestone Mar 21, 2018
@Jared-Prime
Contributor Author

Looks like there's a RuboCop complaint.

Running RuboCop...
Inspecting 190 files
..............................................................................................................................................................................W...............

Offenses:

test/propagation/grpc_propagator_test.rb:30:5: W: Lint/NestedMethodDefinition: Method definitions must not be nested. Use lambda instead.
    def test_extract ...
    ^^^^^^^^^^^^^^^^

190 files inspected, 1 offense detected
RuboCop failed!

I'll submit a commit to fix that sometime today.
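For context on this offense: in Ruby, a def nested inside another method is not a local helper. Each time the outer method runs, it defines an instance method on the enclosing class. The sketch below, with hypothetical class names, shows that behavior next to the lambda alternative RuboCop suggests. In a test file, this warning can also point to a missing end that leaves the next test method nested inside the previous one.

```ruby
class NestedDefExample
  def outer
    # Running outer defines NestedDefExample#inner; it is not a local helper.
    def inner
      'inner'
    end
    inner
  end
end

class LambdaExample
  def outer
    # The lambda is a local variable that lives only for this method call.
    inner = -> { 'inner' }
    inner.call
  end
end
```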


A couple of questions for the reviewer(s):

  1. Do you prefer squashing the commits for the PR or having the full history?
  2. I have left the bidirectional support untested in this PR. Would you prefer that I complete testing/verification of that message flow on my end, or is it OK to submit a follow-up once it's ready?

Thanks,

Jared

delner added the feature (Involves a product feature) label Mar 21, 2018
@delner
Contributor

delner commented Mar 21, 2018

Hey @Jared-Prime thanks for submitting this. Pretty excited about this one!

In answer to your questions:

  1. We're not too particular about squashing or full history, as long as the history is meaningful.
  2. My first thought, in the interest of thoroughness, would be "yes". That said, I'm not yet very familiar with the gRPC world, with what you mean by bidirectional support, or with how important it is to test. Can you elaborate on what this entails, how prominent/important this particular implementation is to gRPC, and any risk factors you anticipate? Would the bidirectional testing overlap with what's been tested so far? (If so, how much?)

delner (Contributor) left a comment

Some general feedback, but overall off to a great start here.

I might need to do additional reviews later given there's a lot of rich contextual detail to be absorbed/understood. Hopefully this gets things rolling, though.

service: Datadog.configuration[:grpc][:service_name],
span_type: 'grpc'
}
span = ddtracer.trace(name, tracer_options)

Contributor

Question about name: it looks like this value would be server or client. Is there a more meaningful name that could be given, one that adds a little context about what it's doing?

Secondary consideration: in many of the ddtrace integrations, we prefix name with the name of the integration. Perhaps grpc.X would be helpful?
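A sketch of the naming convention suggested above, assuming the span name is the integration prefix plus the side of the call, and the fully qualified RPC method becomes the resource (one of this PR's later commits describes using the protocol method name as the resource). grpc_span is a hypothetical helper that returns the name and options in the shape ddtracer.trace(name, tracer_options) takes in the snippet above.

```ruby
# Hypothetical helper: builds the span name and tracer options for one call.
GRPC_SPAN_KINDS = %w[client server].freeze

def grpc_span(kind, rpc_method, service_name:)
  unless GRPC_SPAN_KINDS.include?(kind.to_s)
    raise ArgumentError, "unknown gRPC span kind: #{kind.inspect}"
  end

  name = "grpc.#{kind}" # integration prefix + side of the call
  options = { service: service_name, resource: rpc_method, span_type: 'grpc' }
  [name, options]
end

name, tracer_options = grpc_span(:client, '/helloworld.Greeter/SayHello', service_name: 'grpc')
# name                       => "grpc.client"
# tracer_options[:resource]  => "/helloworld.Greeter/SayHello"
```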

private

def ddtracer
@ddtracer ||= @options.fetch(:ddtracer, Datadog.tracer)

Contributor

There is a tracer option on the gRPC configuration. Should we replace Datadog.tracer with Datadog.configuration[:grpc][:tracer]?
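A sketch of the lookup order this points toward, using hypothetical stand-ins (DEFAULT_TRACER for Datadog.tracer and GRPC_SETTINGS for Datadog.configuration[:grpc]). Passing a block to Hash#fetch also means the fallback is evaluated only when no :ddtracer option was given, whereas fetch(:ddtracer, Datadog.tracer) evaluates the default eagerly.

```ruby
# Hypothetical stand-ins for Datadog.tracer and Datadog.configuration[:grpc].
DEFAULT_TRACER = Object.new
GRPC_SETTINGS = { tracer: nil, service_name: 'grpc' }

class TracedInterceptor
  def initialize(options = {})
    @options = options
  end

  private

  # Explicit :ddtracer option first, then the gRPC integration setting,
  # then the global default. The block only runs when :ddtracer is absent.
  def ddtracer
    @ddtracer ||= @options.fetch(:ddtracer) { GRPC_SETTINGS[:tracer] || DEFAULT_TRACER }
  end
end
```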

include Ext::DistributedTracing

def self.inject!(context, metadata)
metadata[HTTP_HEADER_TRACE_ID] = context.trace_id.to_s

Contributor

I appreciate the idea of being consistent with the other context propagation headers, but would it make sense to create a new set of constants for gRPC (e.g. GRPC_METADATA_TRACE_ID or GRPC_TRACE_ID), even if the values are roughly the same or equivalent?

It might be a good idea to avoid coupling to the HTTP implementation in case the two implementations diverge later.
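A sketch of what gRPC-specific constants could look like, following the naming suggestion above. The module and constant names are hypothetical, and the key values are assumed to reuse Datadog's x-datadog-* header names (gRPC metadata keys are lowercase, so these fit as-is). extract also keeps a missing sampling priority as nil: converting with to_i would turn a missing value into 0, which is the kind of bug a later commit in this PR describes.

```ruby
# Hypothetical gRPC-specific propagator. Constant names follow the review
# suggestion; the key values are assumed to reuse Datadog's header names.
module GRPCMetadataPropagator
  GRPC_METADATA_TRACE_ID = 'x-datadog-trace-id'.freeze
  GRPC_METADATA_PARENT_ID = 'x-datadog-parent-id'.freeze
  GRPC_METADATA_SAMPLING_PRIORITY = 'x-datadog-sampling-priority'.freeze

  # Minimal stand-in for a span context.
  Context = Struct.new(:trace_id, :span_id, :sampling_priority)

  module_function

  # Writes into the caller's metadata Hash; gRPC metadata values are strings.
  def inject!(context, metadata)
    metadata[GRPC_METADATA_TRACE_ID] = context.trace_id.to_s
    metadata[GRPC_METADATA_PARENT_ID] = context.span_id.to_s
    unless context.sampling_priority.nil?
      metadata[GRPC_METADATA_SAMPLING_PRIORITY] = context.sampling_priority.to_s
    end
    metadata
  end

  # Returns nil when the IDs are missing or malformed. A missing sampling
  # priority stays nil instead of becoming 0 (which nil.to_i would produce).
  def extract(metadata)
    trace_id = integer_or_nil(metadata[GRPC_METADATA_TRACE_ID])
    parent_id = integer_or_nil(metadata[GRPC_METADATA_PARENT_ID])
    return nil if trace_id.nil? || parent_id.nil?

    priority = integer_or_nil(metadata[GRPC_METADATA_SAMPLING_PRIORITY])
    Context.new(trace_id, parent_id, priority)
  end

  def integer_or_nil(value)
    Integer(value, 10)
  rescue ArgumentError, TypeError
    nil
  end
end
```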

span_type: 'grpc'
}
span = ddtracer.trace(name, tracer_options)
Datadog::GRPCPropagator.inject!(span.context, metadata)

Contributor

This might be a silly question, but in regards to injecting the context info into the metadata...

If I'm understanding the code correctly, gRPC is supposed to provide a metadata Hash via the request_response(request: nil, call: nil, method: nil, metadata: nil) keyword arguments, correct? Is there a case where it doesn't actually provide this metadata Hash?

I ask because, above, metadata ||= {} will replace a nil value with a new Hash, but inject! would then modify a local Hash that ends up discarded anyway, wouldn't it?

Contributor Author

Jared-Prime Mar 21, 2018

That's a good question, and I'm not sure about the answer yet. My suspicion is that the Hash is always supplied; I'm matching a method signature that I've seen elsewhere. I'll look it up and verify what's really needed here. Oh, I get it. Will improve.
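The concern in this thread can be shown in plain Ruby. intercept and inject_trace_id! are hypothetical: when the caller passes a Hash, inject_trace_id! mutates that same object, so the caller sees the new key; when the caller passes nil, metadata ||= {} binds a brand-new Hash to the local variable only, so the injected ID is visible only through the return value. Assuming gRPC sends the same Hash it handed to the interceptor, injection only reaches the wire in the first case.

```ruby
# Hypothetical injection helper: writes into whatever Hash it is given.
def inject_trace_id!(metadata, trace_id)
  metadata['x-datadog-trace-id'] = trace_id.to_s
  metadata
end

# Mirrors the pattern under discussion.
def intercept(metadata: nil)
  metadata ||= {} # with nil, this binds a new Hash to the local variable only
  inject_trace_id!(metadata, 42)
  metadata
end

shared = {}
intercept(metadata: shared)
# shared now includes 'x-datadog-trace-id' => '42': the caller's Hash was mutated.

passed = nil
local = intercept(metadata: passed)
# passed is still nil; the injected ID only exists in the returned Hash.
```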

span = ddtracer.trace(name, tracer_options)
Datadog::GRPCPropagator.inject!(span.context, metadata)

yield

Contributor

Since an interceptor is like middleware, does this yield correspond to the actual gRPC call? If so, could we wrap the trace method around it, like:

ddtracer.trace(name, tracer_options) do |span|
  yield
end

The benefit here is that the block form calls span.set_error and span.finish automatically, rendering the rescue and ensure blocks moot.
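A sketch of the block form proposed above. SketchTracer and SketchSpan are hypothetical stand-ins written to behave the way the comment describes ddtrace's block form: errors are recorded on the span and re-raised, and the span is finished in an ensure clause. With that in place, the interceptor body reduces to a trace block around yield.

```ruby
# Hypothetical stand-in for a span.
class SketchSpan
  attr_reader :name, :error

  def initialize(name)
    @name = name
    @error = nil
    @finished = false
  end

  def set_error(error)
    @error = error
  end

  def finish
    @finished = true
  end

  def finished?
    @finished
  end
end

# Hypothetical stand-in for a tracer whose block form manages the span.
class SketchTracer
  attr_reader :spans

  def initialize
    @spans = []
  end

  def trace(name, _options = {})
    span = SketchSpan.new(name)
    @spans << span
    yield span
  rescue StandardError => e
    span&.set_error(e) # record the failure on the span...
    raise              # ...but still let the caller see it
  ensure
    span&.finish       # always runs, success or failure
  end
end

# The interceptor body becomes a trace block around yield.
class SketchClientInterceptor
  def initialize(tracer)
    @tracer = tracer
  end

  def request_response(request: nil, call: nil, method: nil, metadata: nil)
    @tracer.trace('grpc.client', resource: method) { |_span| yield }
  end
end
```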

private

def ddtracer
@ddtracer ||= @options.fetch(:ddtracer, Datadog.tracer)

Contributor

Just a note that the same general feedback from DatadogClientInterceptor likely applies here, too.

@@ -0,0 +1,64 @@
require 'grpc'

def run_service

Contributor

Not a major thing, but would it be more appropriate to put this method into a module? The module could then be composed into the RSpec examples using include GRPCHelpers, with the benefit of not defining a global run_service method.
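A sketch of the suggested module, assuming the server object responds to run, running?, and stop. FakeServer is a hypothetical stand-in so the example runs without the grpc gem; the body of the real run_service in this PR is not shown here.

```ruby
# Helper methods live in a module instead of at the top level.
module GRPCHelpers
  # Starts the server in a background thread, yields once it is running,
  # and always stops it afterwards. Assumes run, running?, and stop.
  def run_service(server, timeout: 2)
    thread = Thread.new { server.run }
    deadline = Time.now + timeout
    sleep 0.01 until server.running? || Time.now > deadline
    raise 'server did not start in time' unless server.running?

    yield server
  ensure
    server.stop
    thread&.join
  end
end

# Hypothetical stand-in for a gRPC server: run blocks until stop is called.
class FakeServer
  def initialize
    @mutex = Mutex.new
    @running = false
    @stopped = false
  end

  def run
    @mutex.synchronize { @running = true }
    sleep 0.01 until @mutex.synchronize { @stopped }
    @mutex.synchronize { @running = false }
  end

  def running?
    @mutex.synchronize { @running }
  end

  def stop
    @mutex.synchronize { @stopped = true }
  end
end
```

In RSpec, the module can be mixed in per example group with include GRPCHelpers, or for the whole suite with config.include GRPCHelpers inside RSpec.configure.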

require 'ddtrace/tracer'
require 'ddtrace/propagation/grpc_propagator'

class GRPCPropagatorTest < Minitest::Test

Contributor

We've been working on moving away from Minitest towards RSpec. Does this test benefit from something in Minitest that isn't available or conducive to RSpec?

Contributor Author

No major benefit of Minitest over RSpec. I will rewrite this in RSpec instead.

@Jared-Prime
Contributor Author

Thanks @delner. I've refactored a bit according to your recommendations. I also found that my test for bidirectional (bidi) streaming was simply broken; I've fixed it and can confirm the code should work for bidi streams as well.

No rush on reviewing; I will be on vacation the rest of this week and next. I'll look forward to tying up any loose ends when I'm back.

palazzem removed this from the 0.12.0 milestone Mar 28, 2018
@delner
Contributor

delner commented Mar 28, 2018

General thought when looking at this implementation: would it be conceivable to use gRPC to connect to several different services? And if that's the case, would it be reasonable to want to trace those gRPC calls as different services?

If so, we might want to change how we tag traces with service_name. I think configuration[:grpc][:service_name] is a good start, but it would inevitably pool multiple services into one, which is probably not what most users would want.

To this end, we have the Datadog::Pin object, which lets us tag an individual object with configuration. Typically, we might "pin" service configuration onto each "client" object with Datadog.configure(client, service_name: 'client_1'). Then, for the trace call, we'd retrieve the configuration from the client via pin = Datadog::Pin.get_from(client) and use that to configure the trace.

Does gRPC have a "client" or something similar we could pin configuration on? If not, maybe we should think about some strategy to support gRPC calls as different services.
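A sketch of the pin idea under stated assumptions: SketchPin is a hypothetical stand-in for Datadog::Pin that stores configuration in an instance variable on the target object, and service_name_for is a hypothetical lookup that falls back to an integration-wide default. For gRPC, one plausible object to pin onto would be the generated client stub instance.

```ruby
# Hypothetical stand-in for the pin idea: configuration attached to an object.
class SketchPin
  attr_reader :service_name

  def initialize(service_name)
    @service_name = service_name
  end

  # Attach this pin to an object (for example, a gRPC client stub).
  def onto(obj)
    obj.instance_variable_set(:@__sketch_pin, self)
    obj
  end

  def self.get_from(obj)
    obj.instance_variable_get(:@__sketch_pin)
  end
end

# Hypothetical integration-wide default service name.
DEFAULT_GRPC_SERVICE = 'grpc'.freeze

# Per-client service name, falling back to the integration-wide setting.
def service_name_for(client)
  pin = SketchPin.get_from(client)
  pin ? pin.service_name : DEFAULT_GRPC_SERVICE
end

billing_stub = SketchPin.new('billing-grpc').onto(Object.new)
service_name_for(billing_stub) # => "billing-grpc"
service_name_for(Object.new)   # => "grpc"
```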

@delner
Contributor

delner commented Mar 28, 2018

Also:

@delner
Contributor

delner commented Mar 28, 2018

Just to give you a sense of what the plan is for this one, I think we'd ultimately like to target a 0.13.0 release for this PR. This is a really awesome but sophisticated feature that warrants additional performance testing, among other due-diligence measures. Our 0.12.0 release will be rolling out soon, and instead of rushing this in, it'd be good to give this one the time and attention it deserves.

We'll be opening a 0.13-dev branch soon, and I think this could be a good candidate for an early beta version of 0.13.0. Next steps between now and then would be to 1) address any outstanding feedback/issues in this PR, 2) develop a performance test plan (and perhaps a sample app) that we can use to verify that everything is in good shape, and finally 3) merge this PR and ship it in a beta release.

@Jared-Prime
Contributor Author

"I think we'd ultimately like to target a 0.13.0 release for this PR. This is a really awesome, but sophisticated feature that warrants some additional performance testing among other due diligence measures."

I agree with that choice of release target and your proposed steps towards shipping 👍. I'll be able to give this branch more love this week.

Jared Davis added 2 commits April 2, 2018 10:43
separate server and client interceptors into different files

create a separate propagator for grpc usage

test for bidirectional streaming support

earlier inability to test this was due to a typo
in the TestService class definition

fix bug with GRPCPropagator

A miswritten test masked a bug. The GRPCPropagator potentially supplied
a value of 0 for a sampling priority
when no sampling priority was declared

[REFACTOR] implement PR review suggestions

- different constants for gRPC metadata
- supply tracer object via configuration
- supply run_service helper method as part of a module
- prefix trace name with "grpc"
- use rspec rather than minitest

some additional changes include
- supply the protocol method name as the datadog resource
- supply the grpc peer information as datadog tags
- set metadata information as tags

fix some testing errors

use circleci tests, remove old minitest test
delner added this to the 0.13.0 milestone Apr 5, 2018
delner changed the base branch from master to 0.13-dev April 13, 2018 17:32
Jared-Prime mentioned this pull request Apr 13, 2018
@Jared-Prime
Contributor Author

Closing this in favor of an improved implementation at #403.

Jared-Prime deleted the grpc-interceptors branch April 13, 2018 19:17
delner removed this from the 0.13.0 milestone May 8, 2018
Labels
community (Was opened by a community member), feature (Involves a product feature), integrations (Involves tracing integrations)
3 participants