
performance: high latency when using otel libs inside common endpoints #4741

Closed
danstarns opened this issue May 29, 2024 · 5 comments
Labels: bug, triage

@danstarns (Contributor) commented May 29, 2024

What happened?

I am writing this issue to report performance problems I've encountered using your libraries within common Node.js APIs such as HTTP, Express, and GraphQL. I have some links, benchmarks, and examples to share below.

Related:

  1. GraphQL Debugger Performance rocket-connect/graphql-debugger#322
  2. Performance  rocket-connect/otel-js-server-benchmarks#1
  3. Performance slowdown using http-instrumentation #4372
  4. Add benchmark tests #4171

Background

We are building an observability platform that leverages OTEL JS on the users' client side. It was reported to us that using our libraries with their APIs resulted in massive latency issues. We did some investigation, broken down in full detail here:

Our implementation essentially wraps a GraphQL resolver: it picks up values from the resolver info argument tree, captures attributes, and starts a span. As described in our adjacent issue, we took several steps to rule out a problem in our own implementation. Additionally, we created a few benchmarks with and without GraphQL usage to compare performance impacts.
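
For illustration, here is a minimal sketch of the general shape of such a wrapper (the function and attribute names are hypothetical, not our actual implementation):

import opentelemetry, { SpanStatusCode } from "@opentelemetry/api";

// Hypothetical wrapper sketch: trace a single GraphQL resolver by deriving the
// span name and attributes from the resolver's info argument.
function traceResolver(resolve) {
  return async (parent, args, context, info) => {
    const tracer = opentelemetry.trace.getTracer("graphql-debugger");
    const span = tracer.startSpan(`${info.parentType.name}.${info.fieldName}`);
    span.setAttribute("graphql.field.name", info.fieldName);
    try {
      return await resolve(parent, args, context, info);
    } catch (err) {
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      span.end();
    }
  };
}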

Hypothesis

We see that the code added in the following diff causes a large increase in latency for this endpoint.

import express from "express";
import opentelemetry from "@opentelemetry/api";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";
import { Resource } from "@opentelemetry/resources";
import {
  BasicTracerProvider,
  SimpleSpanProcessor,
} from "@opentelemetry/sdk-trace-base";
import { SemanticResourceAttributes } from "@opentelemetry/semantic-conventions";

const app = express();

const provider = new BasicTracerProvider({
  resource: new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: "basic-service",
  }),
});

// SimpleSpanProcessor exports each ended span individually via the OTLP HTTP exporter.
const exporter = new OTLPTraceExporter({});
provider.addSpanProcessor(new SimpleSpanProcessor(exporter));
provider.register();

app.use("/hello", async (req, res) => {
+  const tracer = opentelemetry.trace.getTracer("hello-tracer");
+  const span = tracer.startSpan("hello");
+  span.setAttribute("value", "world");
+  span.end();

  res.json({ message: "Hello World" });
});

Alongside our benchmarks, detailed below, we saw the following issues as further confirmation of our hypothesis:

  1. Performance slowdown using http-instrumentation #4372
  2. Memory leak when using more than one MetricReader #4115
  3. [otlp-transformer] reduce public API surface #4583
  4. [otlp-transformer] stabilization plan #4582
  5. perf(tracer): reduce overhead of attribute sanitization #4558

Benchmarks

While investigating our hypothesis, we found existing benchmarks such as opentelemetry-java/benchmarks/ and #4171. These were not especially helpful: they did not let us compare the two languages or understand how OTEL JS affects performance inside an endpoint.

From this, we created a few types of benchmarks to test our findings and improve our performance.

Isolated OTEL JS Benchmarks

These benchmarks demonstrate using OTEL inside a basic endpoint. Our results show that adding just a few lines of OTEL code to a handler significantly reduces the API's performance. For example, a basic HTTP endpoint operating at 6.26ms latency more than triples its average time to 22.03ms when OTEL is added, rendering it unusable for any production setting.

[image: isolated benchmark results]

GraphQL Debugger Benchmarks

A GraphQL-specific fork of the isolated benchmarks:

@pichlermarc (Member) commented Jun 4, 2024

Hi, thanks for the detailed report!

I've had a quick look at the benchmark code, and it looks like the SimpleSpanProcessor is used there. It exports every ended span individually, causing the exporter to send an HTTP request for each of these spans. The SimpleSpanProcessor is best used for debugging and local development with the ConsoleSpanExporter, as it provides quick feedback.

Consider using the BatchSpanProcessor for production use; it batches spans together and only passes them on to the SpanExporter in batches, which may significantly improve performance by reducing the number of export requests made.
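
A minimal sketch of that change, based on the setup from the original report (the batching options shown are illustrative defaults; tune them for your workload):

import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";
import { Resource } from "@opentelemetry/resources";
import {
  BasicTracerProvider,
  BatchSpanProcessor,
} from "@opentelemetry/sdk-trace-base";
import { SemanticResourceAttributes } from "@opentelemetry/semantic-conventions";

const provider = new BasicTracerProvider({
  resource: new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: "basic-service",
  }),
});

// Queue ended spans and export them in batches instead of one HTTP request per span.
provider.addSpanProcessor(
  new BatchSpanProcessor(new OTLPTraceExporter({}), {
    maxExportBatchSize: 512,    // spans per export request (illustrative)
    scheduledDelayMillis: 5000, // how often the queue is flushed, in ms (illustrative)
  })
);
provider.register();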

@danstarns (Contributor, Author) commented:

Thank you for your response, @pichlermarc.

I added batch versions of the benchmarks:

Isolated OTEL JS Benchmarks

[Screenshot 2024-06-05 13:25:00: batch-processor benchmark results]

GraphQL Debugger Benchmarks

[Screenshot 2024-06-05 13:25:21: batch-processor GraphQL Debugger benchmark results]

Summary

The benchmarks indicate that using the BatchProcessor is significantly more performant than the SimpleProcessor. Thank you again for pointing this out. I have some questions:

Our benchmarks only produce a single span. Given there is nothing to batch, how does changing this processor affect the performance? Does this mean the SimpleProcessor is exporting the spans inline synchronously? Is this even possible?

Are the differences between the two processors documented anywhere?

Could the logic be wrapped in a setTimeout to avoid blocking the current event loop tick?

Would it be wise to add a log in the console when users use SimpleProcessor?

@pichlermarc (Member) commented:

The benchmarks indicate that using the BatchProcessor is significantly more performant than the SimpleProcessor. Thank you again for pointing this out. I have some questions:

Glad to hear that this improved things. 🙂 There's still some performance overhead, but we're hoping to gradually improve this in the future by optimizing the SDK and Exporters.

Our benchmarks only produce a single span. Given there is nothing to batch, how does changing this processor affect the performance?

It batches spans across multiple requests; once the batch is large enough, it then exports the spans.

Does this mean the SimpleProcessor is exporting the spans inline synchronously? Is this even possible?

It does some work synchronously, namely serializing the OTLP message and starting the HTTP request; that seems to be enough to block execution.

Are the differences between the two processors documented anywhere?

It is documented on opentelemetry.io, but I think it would also make sense to document this in the JSDoc for the processor.
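
Such a JSDoc note might look something like this (a sketch of the idea, not the SDK's actual wording):

/**
 * Sketch of a possible JSDoc note on SimpleSpanProcessor (illustrative only):
 *
 * Forwards every ended span to the configured SpanExporter immediately,
 * one export call per span. Best suited to debugging and local development,
 * e.g. together with the ConsoleSpanExporter. For production use, prefer the
 * BatchSpanProcessor, which batches spans and reduces export requests.
 */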

Could the logic be wrapped in a setTimeout to avoid blocking the current event loop tick?

setTimeout or setImmediate would be an option. However, AFAIK it was a conscious design choice that the SimpleSpanProcessor does not defer the execution of the export to a later time.

Would it be wise to add a log in the console when users use SimpleProcessor?

I'd go with a JSDoc addition first, then re-evaluate if we need the log. 🙂

@danstarns (Contributor, Author) commented:

Hi @pichlermarc.

There's still some performance overhead, but we're hoping to gradually improve this in the future by optimizing the SDK and Exporters.

I agree on the overhead. Even with the batch processor, the screenshot from our benchmarks shows that adding a few lines of OTEL JS increases latency from 5.99ms to 8.58ms, and it impacts kps as well.

Due to these findings, I made a PR to move the benchmarks I am using into this repo:

I have added these benchmarks to the OpenTelemetry source code, as they provide transparent insight into performance and allow the community to iterate on performance improvements.

setTimeout or setImmediate would be an option. However, AFAIK it was a conscious design choice that the SimpleSpanProcessor does not defer the execution of the export to a later time.

Once we have some benchmarks in place, we could experiment with ideas like this, as introducing a timeout and injecting the current timestamp should not affect people's applications.
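
A rough sketch of the kind of experiment I mean (hypothetical, not an existing SDK feature):

import { SimpleSpanProcessor } from "@opentelemetry/sdk-trace-base";

// Hypothetical experiment: defer the synchronous export work (serialization and
// starting the HTTP request) to a later event-loop turn so the handler can
// respond first. The span's end timestamp is captured before onEnd is called,
// so deferring the export should not change the reported timings.
class DeferredSpanProcessor extends SimpleSpanProcessor {
  onEnd(span) {
    setImmediate(() => super.onEnd(span));
  }
}

// usage: provider.addSpanProcessor(new DeferredSpanProcessor(exporter));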

I'd go with a JSDoc addition first, then re-evaluate if we need the log.

Added here:

@JamieDanielson (Member) commented:

It looks like at this point this became a documentation issue, which has since been updated (thank you!). We'll close this out and leave the other PR open for the benchmarks, as that is still being looked into.
