
Opentracing? #12

Open
tonyghita opened this issue Oct 30, 2017 · 11 comments

Comments

@tonyghita

Would you be open to adopting the OpenTracing standard? It's a vendor-neutral open standard for distributed tracing (kind of like GraphQL is to APIs).

There is a wealth of distributed tracing tooling that also implements the standard. Adopting OpenTracing would make it easier to integrate with this other tooling.

@martijnwalraven
Contributor

I'm pretty excited about integrating with OpenTracing, but I think it would be an addition rather than a replacement of this format.

The goal of Apollo Tracing is to include GraphQL-specific timing and schema information in-band, as part of a response. It isn't clear how OpenTracing could replace that. But I can imagine adding fields to the format to associate a request with a trace ID, or associate span IDs with individual resolver calls for example.
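For illustration only, that association might look something like this in the tracing extension (the traceId and spanId fields below are hypothetical, not part of the current format):

{
  "tracing": {
    "version": 1,
    "traceId": "0af7651916cd43dd8448eb211c80319c",
    "execution": {
      "resolvers": [
        {
          "path": ["hero", "name"],
          "parentType": "Droid",
          "fieldName": "name",
          "returnType": "String!",
          "startOffset": 1172456,
          "duration": 123456,
          "spanId": "b7ad6b7169203331"
        }
      ]
    }
  }
}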

@loganvolkers

I think there's a great opportunity for the Apollo team to suggest a standard for how to implement tracing with OpenTracing across GraphQL servers.

The Go implementation may provide a good starting point: https://github.com/neelance/graphql-go/blob/master/trace/trace.go

The Go implementation adds these tags to its spans (aka resolvers):

operationName: "GraphQL request"
tags:
 - graphql.query
 - graphql.operationName
 - graphql.variables
 - graphql.error

fields:
 - graphql.type
 - graphql.field
 - graphql.args.*
 - graphql.error

Here is what Apollo Tracing defines as tags on its resolvers (aka spans); a sketch of how both sets could show up on an OpenTracing span follows the snippet:

  "path": [
    "hero",
    "name"
  ],
  "parentType": "Droid",
  "fieldName": "name",
  "returnType": "String!",

I did some quick research and it seems like tracing in GraphQL is pretty fragmented:

@tonyghita
Author

The Go implementation is what I had in mind as well. It's working really well for me in production.

@martijnwalraven
Contributor

So what use cases are people looking at solving with OpenTracing integration? Is it mostly about being able to propagate context to downstream operations? How do you see yourself using that context?

For Engine, one of the feature ideas we've talked about is the ability to associate a spanId with a resolver call, so you could drill down to downstream operations when investigating performance issues or errors. Does that seem useful?

@yvann

yvann commented Jan 17, 2018

That's exactly what I use tracing for: my GraphQL API is only one component of my overall project, and I'd like to be able to follow a trace from the very beginning to the end, across all the services/components.

@nfisher

nfisher commented Feb 21, 2018

@martijnwalraven while I agree the output will be different, I think the underlying data structures and instrumentation could be shared, as @loganvolkers proposed.

I would speculate that everything required for the Apollo Tracing JSON can be stored in OpenTracing's data structures. Before the final response body is sent across the wire, the JSON trace data could be assembled from the OpenTracing trace tree.
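Purely as a hypothetical sketch (OpenTracing itself doesn't expose finished spans for reading back, so this assumes a small hand-rolled recorder that is filled in as each field span finishes):

package tracing

import (
	"sync"
	"time"
)

// resolverTrace mirrors one entry in Apollo Tracing's execution.resolvers array.
type resolverTrace struct {
	Path        []interface{} `json:"path"`
	ParentType  string        `json:"parentType"`
	FieldName   string        `json:"fieldName"`
	ReturnType  string        `json:"returnType"`
	StartOffset int64         `json:"startOffset"` // nanoseconds since request start
	Duration    int64         `json:"duration"`    // nanoseconds
}

// recorder is a hypothetical in-process collector, populated as each field
// span finishes and dumped into the tracing extension at the end of the request.
type recorder struct {
	mu        sync.Mutex
	start     time.Time
	resolvers []resolverTrace
}

func (r *recorder) finishField(path []interface{}, parentType, fieldName, returnType string, begin, end time.Time) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.resolvers = append(r.resolvers, resolverTrace{
		Path:        path,
		ParentType:  parentType,
		FieldName:   fieldName,
		ReturnType:  returnType,
		StartOffset: begin.Sub(r.start).Nanoseconds(),
		Duration:    end.Sub(begin).Nanoseconds(),
	})
}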

Naively, ObjectType feels like the best place to instrument the trace capture, but I'm fairly new to graphene, so there might be a better place to add the instrumentation.

ResolveInfo, or something else that's passed to the resolvers, would benefit from having these additional fields at a minimum (a sketch of the same idea follows this list):

  • tracer - the tracer client, so that custom annotations and additional spans can be added.
  • parent_trace - the parent span under which the current resolver is being called.
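In Go the request context can already carry both of those; the graphene equivalent would look different, and the resolver and field names below are made up for the sketch:

package resolvers

import (
	"context"

	opentracing "github.com/opentracing/opentracing-go"
)

type Resolver struct{ name string }

// Name shows a resolver that gets the "tracer" and "parent_trace" from the
// request context rather than from ResolveInfo.
func (r *Resolver) Name(ctx context.Context) (string, error) {
	tracer := opentracing.GlobalTracer()       // the tracer client
	parent := opentracing.SpanFromContext(ctx) // the parent span, if any

	var opts []opentracing.StartSpanOption
	if parent != nil {
		opts = append(opts, opentracing.ChildOf(parent.Context()))
	}
	span := tracer.StartSpan("resolve Person.name", opts...)
	defer span.Finish()

	span.LogKV("event", "custom annotation") // extra annotations on demand
	return r.name, nil
}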

There are some other fields I think ResolveInfo could benefit from having that would help query optimisation, but I won't conflate that with this issue.

Ideally, what I would want from it is all the performance metrics and annotations that are available. The aim would be to answer questions like:

  • how many round-trips are happening to my data store?
  • if I'm using memcached, what are the cache hit rates?
  • what's the latency associated with individual downstream requests?
  • if using multi-tiered microservices, which service(s) are making the request slow? (This implies the root and parent trace IDs are being forwarded correctly, but it's where distributed tracing really shines.)

I've instrumented some tracing using Jaeger, custom middleware, and OpenTracing in a spike with Star Wars data. It's not where I want it to be, but the screenshot below (which uses a Star Wars schema) illustrates the minimum I would want to see. Ideally I wouldn't need custom middleware to create these spans.

[Screenshot: Jaeger trace of a Star Wars schema query]

@pavelnikolov

pavelnikolov commented May 10, 2018

@martijnwalraven
I believe that OpenTracing is extremely important in a microservices architecture. The CNCF has adopted both OpenTracing and Jaeger, and many companies are switching to them. But OpenTracing is vendor-neutral and can be used with Zipkin or other tracing systems as well.

In my team, we have many services and we use a GraphQL server (the Go implementation mentioned by @tonyghita and @loganvolkers) in front of them as an API Gateway pattern. The GraphQL server is a very thin layer that contains (almost) no logic at all. Each resolver is fetching data from one or more other services using gRPC. In this setup, it is common to have GraphQL requests, which fetch data from multiple services, databases, Redis instances, Elasticsearch etc.
Imagine that you have a request that is taking >500ms and someone is complaining that it is too slow. Without OpenTracing it would be really hard to tell which resolver is slowing down the entire request. Even if you identify the resolver, you still have to find the real problem. Adding OpenTracing makes root cause analysis a piece of cake. You can immediately identify the bottleneck, even if your request spans many services and involves hundreds or even thousands of spans.
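Roughly, the gateway side of that wiring looks like the sketch below (illustrative only, not our actual code): the gRPC client connections are opened with an OpenTracing interceptor, so the span context started in a resolver is propagated to the downstream services automatically.

package gateway

import (
	"github.com/grpc-ecosystem/grpc-opentracing/go/otgrpc"
	opentracing "github.com/opentracing/opentracing-go"
	"google.golang.org/grpc"
)

// dialService opens a gRPC connection whose outgoing calls carry the current
// OpenTracing span context, so downstream services appear in the same trace.
func dialService(addr string) (*grpc.ClientConn, error) {
	return grpc.Dial(
		addr,
		grpc.WithInsecure(), // TLS omitted to keep the sketch short
		grpc.WithUnaryInterceptor(
			otgrpc.OpenTracingClientInterceptor(opentracing.GlobalTracer()),
		),
	)
}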

We have also been able to identify poor performance where some sub-requests were executed sequentially instead of in parallel. Things like parallelism and concurrency are immediately visible on the trace graph.

And finally, tracing solutions like Jaeger help you visualise service dependencies in your system. If every single service in your system implements OpenTracing, you can easily plot a dependency graph and see which service depends on which other service. Or which service (if any) uses a particular resolver.

I truly believe that OpenTracing is the 2nd most important feature in a GraphQL server (after being able to serve GraphQL).

@cliedeman

Hello,

I am also interested in this and have created a test project to see what is possible.

Note that this requires some context-extension changes that have been merged to master but not yet released, so I am patching apollo-server-core for now.

The results look promising so far
[Screenshot: Jaeger UI trace for the query below]

This is the query:

query PeopleQuery {
  people {
    name
  }
}

There is a 1 second delay on resolving people and a 200 millisecond delay for each name - contrived, I know, but it makes it easy to test what I am after.

The traces might look a bit weird because the person resolvers (the last 6 spans) are not nested inside the people span, so resolution of people appears to take 1 second when it actually takes 1.2 seconds. (The person resolver spans should be made children of the people resolver span.)

Resolution Sequence:

Resolving: people
Resolved: people
Resolving: people.[0].name
Resolving: people.[1].name
Resolving: people.[2].name
Resolved: people.[0].name
Resolved: people.[1].name
Resolved: people.[2].name

I plan to integrate tracing into knex next and see how that goes.

I also found this project

@nfisher

nfisher commented Sep 10, 2018

I would also give https://github.com/census-instrumentation a look. The Go API is at least a little more idiomatic and does a good job of abstracting the underlying tracers away.

@viebel

viebel commented Mar 23, 2020

Any updates on this issue?

@tvvignesh

Filed an issue relating to OpenTelemetry here, now that the projects have been merged: apollographql/apollo-tooling#1889
