
Opentracing? #12

Open
tonyghita opened this issue Oct 30, 2017 · 11 comments

Comments

@tonyghita

Would you be open to adopting the OpenTracing standard? It's a vendor-neutral open standard for distributed tracing (kind of like GraphQL is to APIs).

There is a wealth of distributed tracing tooling that also implements the standard. Adopting OpenTracing would make it easier to integrate with this other tooling.

@martijnwalraven
Contributor

I'm pretty excited about integrating with OpenTracing, but I think it would be an addition rather than a replacement of this format.

The goal of Apollo Tracing is to include GraphQL-specific timing and schema information in-band, as part of a response. It isn't clear how OpenTracing could replace that. But I can imagine adding fields to the format to associate a request with a trace ID, or associate span IDs with individual resolver calls for example.
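For illustration only, that association might look something like this in the tracing extension (the traceId and spanId fields below are hypothetical, not part of the current format):

{
  "tracing": {
    "version": 1,
    "traceId": "0af7651916cd43dd8448eb211c80319c",
    "execution": {
      "resolvers": [
        {
          "path": ["hero", "name"],
          "parentType": "Droid",
          "fieldName": "name",
          "returnType": "String!",
          "startOffset": 1172456,
          "duration": 123456,
          "spanId": "b7ad6b7169203331"
        }
      ]
    }
  }
}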

@loganvolkers

I think there's a great opportunity for the Apollo team to suggest a standard for how to implement tracing with OpenTracing across GraphQL servers.

The Go implementation may provide a good starting point: https://github.com/neelance/graphql-go/blob/master/trace/trace.go

The Go implementation adds these tags to its spans (aka resolvers):

operationName: "GraphQL request"
tags:
 - graphql.query
 - graphql.operationName
 - graphql.variables
 - graphql.error

fields:
 - graphql.type
 - graphql.field
 - graphql.args.*
 - graphql.error

Here is what Apollo Tracing defines as tags on its resolvers (aka spans); a sketch of how both sets could show up on an OpenTracing span follows the snippet:

  "path": [
    "hero",
    "name"
  ],
  "parentType": "Droid",
  "fieldName": "name",
  "returnType": "String!",

I did some quick research and it seems like tracing in GraphQL is pretty fragmented:

@tonyghita
Author

The Go implementation is what I had in mind as well. It's working really well for me in production.

@martijnwalraven
Contributor

So what use cases are people looking at solving with OpenTracing integration? Is it mostly about being able to propagate context to downstream operations? How do you see yourself using that context?

For Engine, one of the feature ideas we've talked about is the ability to associate a spanId with a resolver call, so you could drill down to downstream operations when investigating performance issues or errors. Does that seem useful?

@yvann

yvann commented Jan 17, 2018

That's exactly what I use tracing for: my GraphQL API is only one component of my overall project, and I'd like to be able to follow a trace from the very beginning to the end, across all the services/components.

@nfisher

nfisher commented Feb 21, 2018

@martijnwalraven while I agree the output will be different, I think the underlying data structures and instrumentation could be shared, as @loganvolkers proposed.

I would speculate that everything required for the Apollo Tracing JSON can be stored in OpenTracing's data structures. Before the final response body is sent across the wire, the JSON trace data could be assembled from the OpenTracing trace tree.
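Purely as a hypothetical sketch (OpenTracing itself doesn't expose finished spans for reading back, so this assumes a small hand-rolled recorder that is filled in as each field span finishes):

package tracing

import (
	"sync"
	"time"
)

// resolverTrace mirrors one entry in Apollo Tracing's execution.resolvers array.
type resolverTrace struct {
	Path        []interface{} `json:"path"`
	ParentType  string        `json:"parentType"`
	FieldName   string        `json:"fieldName"`
	ReturnType  string        `json:"returnType"`
	StartOffset int64         `json:"startOffset"` // nanoseconds since request start
	Duration    int64         `json:"duration"`    // nanoseconds
}

// recorder is a hypothetical in-process collector, populated as each field
// span finishes and dumped into the tracing extension at the end of the request.
type recorder struct {
	mu        sync.Mutex
	start     time.Time
	resolvers []resolverTrace
}

func (r *recorder) finishField(path []interface{}, parentType, fieldName, returnType string, begin, end time.Time) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.resolvers = append(r.resolvers, resolverTrace{
		Path:        path,
		ParentType:  parentType,
		FieldName:   fieldName,
		ReturnType:  returnType,
		StartOffset: begin.Sub(r.start).Nanoseconds(),
		Duration:    end.Sub(begin).Nanoseconds(),
	})
}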

Naively, ObjectType feels like the best place to instrument the trace capture, but I'm fairly new to graphene, so there might be a better place to add the instrumentation.

ResolveInfo, or something else that's passed to the resolvers, would benefit from having these additional fields at a minimum (a sketch of the same idea follows this list):

  • tracer - the tracer client, so that custom annotations and additional spans can be added.
  • parent_trace - the parent span under which the current resolver is being called.
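In Go the request context can already carry both of those; the graphene equivalent would look different, and the resolver and field names below are made up for the sketch:

package resolvers

import (
	"context"

	opentracing "github.com/opentracing/opentracing-go"
)

type Resolver struct{ name string }

// Name shows a resolver that gets the "tracer" and "parent_trace" from the
// request context rather than from ResolveInfo.
func (r *Resolver) Name(ctx context.Context) (string, error) {
	tracer := opentracing.GlobalTracer()       // the tracer client
	parent := opentracing.SpanFromContext(ctx) // the parent span, if any

	var opts []opentracing.StartSpanOption
	if parent != nil {
		opts = append(opts, opentracing.ChildOf(parent.Context()))
	}
	span := tracer.StartSpan("resolve Person.name", opts...)
	defer span.Finish()

	span.LogKV("event", "custom annotation") // extra annotations on demand
	return r.name, nil
}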

There are some other fields I think ResolveInfo could benefit from having that would help query optimisation, but I won't conflate that with this issue.

Ideally, what I would want from it is all the performance metrics and annotations that are available. The aim would be to answer questions like:

  • how many round-trips are happening to my data store?
  • if I'm using memcached, what are the cache hit rates?
  • what's the latency associated with individual downstream requests?
  • if using multi-tiered microservices, which service(s) are making the request slow? (This implies the root and parent trace IDs are being forwarded correctly, but it's where distributed tracing really shines.)

I've instrumented some tracing using Jaeger, custom middleware, and OpenTracing in a spike with Star Wars data. It's not where I want it to be, but the screenshot below (which uses a Star Wars schema) illustrates the minimum I would want to see. Ideally I wouldn't need custom middleware to create these spans.

[Screenshot: Jaeger trace of a Star Wars schema query]

@pavelnikolov

pavelnikolov commented May 10, 2018

@martijnwalraven
I believe that OpenTracing is extremely important in a microservices architecture. The CNCF has adopted both OpenTracing and Jaeger, and many companies are switching to them. But OpenTracing is vendor-neutral and can be used with Zipkin or other tracing systems as well.

In my team, we have many services and we use a GraphQL server (the Go implementation mentioned by @tonyghita and @loganvolkers) in front of them as an API Gateway pattern. The GraphQL server is a very thin layer that contains (almost) no logic at all. Each resolver is fetching data from one or more other services using gRPC. In this setup, it is common to have GraphQL requests, which fetch data from multiple services, databases, Redis instances, Elasticsearch etc.
Imagine that you have a request that is taking >500ms and someone is complaining that it is too slow. Without OpenTracing it would be really hard to tell which resolver is slowing down the entire request. Even if you identify the resolver, you still have to find the real problem. Adding OpenTracing makes root cause analysis a piece of cake. You can immediately identify the bottleneck, even if your request spans many services and involves hundreds or even thousands of spans.
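Roughly, the gateway side of that wiring looks like the sketch below (illustrative only, not our actual code): the gRPC client connections are opened with an OpenTracing interceptor, so the span context started in a resolver is propagated to the downstream services automatically.

package gateway

import (
	"github.com/grpc-ecosystem/grpc-opentracing/go/otgrpc"
	opentracing "github.com/opentracing/opentracing-go"
	"google.golang.org/grpc"
)

// dialService opens a gRPC connection whose outgoing calls carry the current
// OpenTracing span context, so downstream services appear in the same trace.
func dialService(addr string) (*grpc.ClientConn, error) {
	return grpc.Dial(
		addr,
		grpc.WithInsecure(), // TLS omitted to keep the sketch short
		grpc.WithUnaryInterceptor(
			otgrpc.OpenTracingClientInterceptor(opentracing.GlobalTracer()),
		),
	)
}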

We have also been able to identify poor performance where some sub-requests were executed sequentially instead of in parallel. Things like parallelism and concurrency are immediately visible on the trace graph.

And finally, tracing solutions like Jaeger help you visualise service dependencies in your system. If every single service in your system implements OpenTracing, you can easily plot a dependency graph and see which service depends on which other service. Or which service (if any) uses a particular resolver.

I truly believe that OpenTracing is the 2nd most important feature in a GraphQL server (after being able to serve GraphQL).

@cliedeman

Hello,

I am also interested in this and have created a test project to see what is possible.

Note that this requires some context-extension changes that have been merged to master but not yet released, so I am patching apollo-server-core for now.

The results look promising so far
[Screenshot: Jaeger UI trace for the query below]

This is the query:

query PeopleQuery {
  people {
    name
  }
}

There is a 1 second delay on resolving people and a 200 millisecond delay for each name - contrived, I know, but it makes it easy to test what I am after.

The traces might look a bit weird because the person resolvers (the last 6 spans) are not nested inside the people span, so resolution of people appears to take 1 second when it actually takes 1.2 seconds. (The person resolver spans should be made children of the people resolver span.)

Resolution Sequence:

Resolving: people
Resolved: people
Resolving: people.[0].name
Resolving: people.[1].name
Resolving: people.[2].name
Resolved: people.[0].name
Resolved: people.[1].name
Resolved: people.[2].name

I plan to integrate tracing into knex next and see how that goes.

I also found this project

@nfisher

nfisher commented Sep 10, 2018

I would also give https://github.com/census-instrumentation a look. The Go API is at least a little more idiomatic and does a good job of abstracting the underlying tracers away.

@viebel

viebel commented Mar 23, 2020

Any updates on this issue?

@tvvignesh

Filed an issue relating to OpenTelemetry here, now that the projects have been merged: apollographql/apollo-tooling#1889
