GraphQL Debugger Performance #322

Open
danstarns opened this issue May 28, 2024 · 0 comments
GraphQL Debugger Performance

This issue tracks the progress of reporting a potential performance issue with the standard OpenTelemetry (OTEL) libraries.

Related:

What is the issue?

Using GraphQL debugger introduces significant latency, primarily because it wraps a GraphQL resolver with logic that interacts with standard OpenTelemetry (OTEL) libraries.
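The resolver-wrapping pattern described above can be sketched as follows. This is a hypothetical illustration, not GraphQL Debugger's actual implementation; the tracer here is a no-op stub standing in for a real OTEL tracer so the example is self-contained.

```javascript
// Hypothetical sketch of the resolver-wrapping pattern described above.
// "tracer" is a stand-in for a real OTEL tracer; the span API is reduced
// to startSpan/end so the example runs without any dependencies.
function wrapResolver(tracer, name, resolver) {
  return (root, args, context, info) => {
    const span = tracer.startSpan(name);
    try {
      return resolver(root, args, context, info);
    } finally {
      span.end(); // span bookkeeping on every call is the added cost
    }
  };
}

// Minimal no-op tracer stub standing in for a real getTracer() result
const stubTracer = {
  startSpan: (name) => ({ name, end: () => {} }),
};

const hello = wrapResolver(stubTracer, "Query.hello", () => "world");
console.log(hello()); // "world"
```

Every resolver call now pays for span creation and teardown, which is where the overhead investigated below comes from.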

We investigated the potential overhead caused by this resolver wrapping and identified several ways to improve performance on our end, including:

  1. refactor: remove graphql variables, result, context attributes #289
  2. fix: remove legacy span creation #290
  3. fix: precompute schemahash #301
  4. fix: json the operation vs print #297
  5. feat: change to batch processor #326
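As an illustration of item 5, switching from a synchronous per-span export to OTEL's BatchSpanProcessor moves span export off the request path. This is a minimal configuration sketch assuming the @opentelemetry/sdk-trace-node and @opentelemetry/sdk-trace-base packages; the option values are illustrative, not necessarily what #326 shipped.

```javascript
// Sketch only: assumes @opentelemetry/sdk-trace-node and
// @opentelemetry/sdk-trace-base are installed; option values are illustrative.
const { NodeTracerProvider } = require("@opentelemetry/sdk-trace-node");
const {
  BatchSpanProcessor,
  ConsoleSpanExporter,
} = require("@opentelemetry/sdk-trace-base");

const provider = new NodeTracerProvider();

// BatchSpanProcessor buffers finished spans and exports them periodically,
// instead of exporting synchronously per span like SimpleSpanProcessor.
provider.addSpanProcessor(
  new BatchSpanProcessor(new ConsoleSpanExporter(), {
    maxQueueSize: 2048,         // spans buffered before new ones are dropped
    maxExportBatchSize: 512,    // spans sent per export call
    scheduledDelayMillis: 5000, // how often the queue is flushed
  })
);

provider.register();
```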

Despite these improvements, our benchmarks still show significant overhead when using standard OpenTelemetry, and even more so with our middleware.

How do we measure the performance?

In the process of debugging performance and assessing the impact of our work, we created a few benchmarks to demonstrate our case. Initially, we forked graphql-crystal/benchmarks into our own repository, rocket-connect/benchmarks, and modified it to target only the JS runtimes and GraphQL servers that came with it.

We first saw the impact of OpenTelemetry while implementing the yoga-otel benchmark: simply using the standard OTEL libraries 'raw' and creating a single span inside a GraphQL resolver reproduced the slowdown. This showed that the performance issue was not specific to GraphQL Debugger, or to how we wrap resolvers and store attributes, but came from the usage of the standard OTEL libraries themselves.

The benchmark used the standard OpenTelemetry libraries within the resolver to create a span:

const opentelemetry = require("@opentelemetry/api");

const resolvers = {
  Query: {
    hello: (root, args, context, info) => {
      const tracer = opentelemetry.trace.getTracer("example-tracer");
      const span = tracer.startSpan("say-hello");
      span.setAttribute("hello-to", "world");
      span.setAttribute("query", JSON.stringify(info.operation));
      span.addEvent("invoking resolvers");
      span.end();
      return "world";
    },
  },
};

This alone increased latency by up to 100%.

Given our findings, we first moved the benchmarks into the monorepo rocket-connect/graphql-debugger/benchmarks, where they are invoked on each commit to the main branch. Additionally, we created an isolated repository, rocket-connect/otel-js-server-benchmarks, to demonstrate the performance impact of using OTEL inside basic node http and express endpoints.

Extracts

Initial Finding

This extract comes from our initial fork rocket-connect/benchmarks, where we discovered that just using OTEL in isolation, without GraphQL Debugger, massively impacted the performance of yoga, increasing latency from 15.33ms to 35.39ms and reducing requests from 13kps to 5.7kps.

[Screenshot: benchmark results from the initial fork, 2024-05-28]

Move to monorepo

After these initial findings, we moved the benchmarks into the graphql-debugger monorepo rocket-connect/graphql-debugger/benchmarks, which gives a clearer view of all GraphQL JS runtimes with and without OpenTelemetry. This also enabled us to iterate on the performance impact we did have, reducing the latency of yoga-debugger from 92.52ms to 52.72ms and increasing requests from 2.1kps to 3.8kps.

[Screenshot: monorepo benchmark results, 2024-05-28]

Isolate OpenTelemetry benchmarks

Finally, given that our initial work indicated the problem was isolated to the OTEL libraries and merely propagated through our middleware, we decided to move beyond GraphQL and demonstrate the same effect using standard Node HTTP and Express in rocket-connect/otel-js-server-benchmarks. Our results show that adding just a few lines of OTEL code to an HTTP or Express handler significantly reduces the performance of the API. For example, a basic http endpoint operating at 6.26ms latency more than triples its average response time to 22.03ms when OTEL is added, rendering it unusable for any production setting.

[Screenshot: Node HTTP and Express benchmark results, 2024-05-28]