GraphQL Debugger Performance #322

Open
danstarns opened this issue May 28, 2024 · 0 comments
GraphQL Debugger Performance

This issue tracks the progress of reporting a potential performance issue with the standard OpenTelemetry (OTEL) libraries.

Related:

What is the issue?

Using GraphQL debugger introduces significant latency, primarily because it wraps a GraphQL resolver with logic that interacts with standard OpenTelemetry (OTEL) libraries.
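The resolver-wrapping pattern described above can be sketched as follows. This is a hypothetical illustration, not GraphQL Debugger's actual implementation; the tracer here is a no-op stub standing in for a real OTEL tracer so the example is self-contained.

```javascript
// Hypothetical sketch of the resolver-wrapping pattern described above.
// "tracer" is a stand-in for a real OTEL tracer; the span API is reduced
// to startSpan/end so the example runs without any dependencies.
function wrapResolver(tracer, name, resolver) {
  return (root, args, context, info) => {
    const span = tracer.startSpan(name);
    try {
      return resolver(root, args, context, info);
    } finally {
      span.end(); // span bookkeeping on every call is the added cost
    }
  };
}

// Minimal no-op tracer stub standing in for a real getTracer() result
const stubTracer = {
  startSpan: (name) => ({ name, end: () => {} }),
};

const hello = wrapResolver(stubTracer, "Query.hello", () => "world");
console.log(hello()); // "world"
```

Every resolver call now pays for span creation and teardown, which is where the overhead investigated below comes from.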

We investigated the potential overhead caused by this resolver wrapping and identified several ways to improve performance on our end, including:

  1. refactor: remove graphql variables, result, context attributes #289
  2. fix: remove legacy span creation #290
  3. fix: precompute schemahash #301
  4. fix: json the operation vs print #297
  5. feat: change to batch processor #326
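As an illustration of item 5, switching from a synchronous per-span export to OTEL's BatchSpanProcessor moves span export off the request path. This is a minimal configuration sketch assuming the @opentelemetry/sdk-trace-node and @opentelemetry/sdk-trace-base packages; the option values are illustrative, not necessarily what #326 shipped.

```javascript
// Sketch only: assumes @opentelemetry/sdk-trace-node and
// @opentelemetry/sdk-trace-base are installed; option values are illustrative.
const { NodeTracerProvider } = require("@opentelemetry/sdk-trace-node");
const {
  BatchSpanProcessor,
  ConsoleSpanExporter,
} = require("@opentelemetry/sdk-trace-base");

const provider = new NodeTracerProvider();

// BatchSpanProcessor buffers finished spans and exports them periodically,
// instead of exporting synchronously per span like SimpleSpanProcessor.
provider.addSpanProcessor(
  new BatchSpanProcessor(new ConsoleSpanExporter(), {
    maxQueueSize: 2048,         // spans buffered before new ones are dropped
    maxExportBatchSize: 512,    // spans sent per export call
    scheduledDelayMillis: 5000, // how often the queue is flushed
  })
);

provider.register();
```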

Despite these improvements, our benchmarks still show significant overhead when using standard OpenTelemetry, and even more so with our middleware.

How do we measure the performance?

In the process of debugging performance and assessing the impact of our work, we created a few benchmarks to demonstrate our case. Initially, we forked graphql-crystal/benchmarks into our own repository, rocket-connect/benchmarks, and modified it to target only the JS runtimes and GraphQL servers that came with it.

We first saw the impact of OpenTelemetry while implementing the yoga-otel benchmark: simply using the standard OTEL libraries 'raw' and creating a single span inside a GraphQL resolver reproduced the slowdown. This showed that the performance issue was not specific to GraphQL Debugger, or to how we wrap resolvers and store attributes, but came from the usage of the standard OTEL libraries themselves.

The benchmark used the standard OpenTelemetry libraries within the resolver to create a span:

const opentelemetry = require("@opentelemetry/api");

const resolvers = {
  Query: {
    hello: (root, args, context, info) => {
      const tracer = opentelemetry.trace.getTracer("example-tracer");
      const span = tracer.startSpan("say-hello");
      span.setAttribute("hello-to", "world");
      span.setAttribute("query", JSON.stringify(info.operation));
      span.addEvent("invoking resolvers");
      span.end();
      return "world";
    },
  },
};

This alone increased latency by up to 100%.

Given our findings, we first moved the benchmarks into the monorepo rocket-connect/graphql-debugger/benchmarks, where they are invoked on each commit to the main branch. Additionally, we created an isolated repository, rocket-connect/otel-js-server-benchmarks, to demonstrate the performance impact of using OTEL inside basic node http and express endpoints.

Extracts

Initial Finding

This extract comes from our initial fork rocket-connect/benchmarks, where we discovered that just using OTEL in isolation, without GraphQL Debugger, massively impacted the performance of yoga, increasing latency from 15.33ms to 35.39ms and reducing requests from 13kps to 5.7kps.

[Screenshot: benchmark results from the initial fork, 2024-05-28]

Move to monorepo

After these initial findings, we moved the benchmarks into the graphql-debugger monorepo rocket-connect/graphql-debugger/benchmarks, which gives a clearer view of all GraphQL JS runtimes with and without OpenTelemetry. This also enabled us to iterate on the performance impact we did have, reducing the latency of yoga-debugger from 92.52ms to 52.72ms and increasing requests from 2.1kps to 3.8kps.

[Screenshot: monorepo benchmark results, 2024-05-28]

Isolate OpenTelemetry benchmarks

Finally, given that our initial work indicated the problem was isolated to the OTEL libraries and merely propagated through our middleware, we decided to move beyond GraphQL and demonstrate the same effect using standard Node HTTP and Express in rocket-connect/otel-js-server-benchmarks. Our results show that adding just a few lines of OTEL code to an HTTP or Express handler significantly reduces the performance of the API. For example, a basic http endpoint operating at 6.26ms latency more than triples its average response time to 22.03ms when OTEL is added, rendering it unusable for any production setting.

[Screenshot: Node HTTP and Express benchmark results, 2024-05-28]