Performance benchmarking for general SDK overhead #3940
Adding our experience with moving our stack from logging to OpenTelemetry: we saw a large increase in the number of pods required to service requests and a significant rise in CPU usage. Using profiling and flame charts, we tracked the increase down to how often the garbage collector was being called once we had instrumented with OpenTelemetry. Our assumption is that the additional objects created for spans and attributes saturated the short-lived (young-generation) object pools in Node.js, causing objects to be copied between memory spaces frequently.

We managed to bring the CPU increase down by changing the amount of memory allocated to these young pools with the --max-semi-space-size CLI option. For the majority of our components, increasing max-semi-space-size from 16 MiB to 64 MiB resolved the increased GC activity. Some of our more highly utilised components required us to go further: increasing it to 128 MiB and tuning the batch span processor options to raise the max export batch size (OTEL_BSP_MAX_EXPORT_BATCH_SIZE=1024) and the max queue size (OTEL_BSP_MAX_QUEUE_SIZE=4096). We also had to increase the pods' CPU requests by 20% to prevent over-scaling. Advice on performance tuning would, I believe, be extremely useful to add to the OpenTelemetry JavaScript documentation.

Finally, we had initially chosen the gRPC exporter, expecting it to provide the best performance; we are now re-evaluating that decision and testing the HTTP/JSON exporter rather than making assumptions!
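For readers who configure the processor in code rather than through environment variables, a minimal sketch of the equivalent settings is below. Package names follow the OpenTelemetry JS SDK; treat versions and defaults as assumptions and adjust to your setup.

```ts
// Sketch of the batch-processor tuning described above, expressed in code.
// Adjust to your SDK version (newer releases accept spanProcessors via the provider constructor).
import { NodeTracerProvider } from '@opentelemetry/sdk-trace-node';
import { BatchSpanProcessor } from '@opentelemetry/sdk-trace-base';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';

const provider = new NodeTracerProvider();

provider.addSpanProcessor(
  new BatchSpanProcessor(new OTLPTraceExporter(), {
    // Equivalent to OTEL_BSP_MAX_EXPORT_BATCH_SIZE=1024 and OTEL_BSP_MAX_QUEUE_SIZE=4096
    maxExportBatchSize: 1024,
    maxQueueSize: 4096,
  })
);

provider.register();

// The young-generation size is a V8/Node flag, not an SDK setting, e.g.:
//   NODE_OPTIONS="--max-semi-space-size=64" node app.js
```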
@martinkuba I've finally found some time to test the difference that the HTTP/JSON exporter might have over gRPC. In our test system I changed the single highest-scaling component to use HTTP/JSON, and the result was: no difference.
Why doesn’t this library offload the transformation and log shipping to another thread, like Pino does?
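For context, the pattern the question refers to looks roughly like the sketch below: a worker thread owns serialization and shipping, and the main thread only posts messages to it. This is not how the OpenTelemetry JS SDK works today; the worker file and message shape here are hypothetical.

```ts
// Illustrative only: offloading export work to a worker thread (the approach Pino's
// transports take). The worker file name and message shape are made up for this sketch.
import { Worker } from 'node:worker_threads';

// Hypothetical worker that batches spans and ships them over the network off the main thread.
const exportWorker = new Worker('./export-worker.js');

function onSpanEnd(span: { name: string; durationMs: number; attributes: Record<string, unknown> }) {
  // Hand the raw data off; transformation to the wire format happens in the worker.
  exportWorker.postMessage(span);
}
```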
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days.
This issue was closed because it has been stale for 14 days with no activity.
This is not stale.
While every application is different and there are many factors to consider when measuring performance, it would be useful to give users some idea of the performance characteristics of the SDK. This issue is intended as a discussion of what types of benchmark tests would be useful, how often to run them, and so on.
This spec describes performance benchmark testing to measure the overhead of OTel SDKs. Specifically, it describes measuring:
The second one is of particular importance because it translates to scaling and computing costs of running a service in the cloud.
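As a rough illustration of the kind of measurement being discussed, here is a minimal micro-benchmark sketch that only times span creation and end with an in-memory exporter; the iteration count and harness shape are assumptions for illustration, not the spec's methodology.

```ts
// Minimal span-throughput micro-benchmark sketch (illustrative, not the spec's harness).
import {
  BasicTracerProvider,
  InMemorySpanExporter,
  SimpleSpanProcessor,
} from '@opentelemetry/sdk-trace-base';

const provider = new BasicTracerProvider();
provider.addSpanProcessor(new SimpleSpanProcessor(new InMemorySpanExporter()));
const tracer = provider.getTracer('sdk-overhead-benchmark');

const iterations = 100_000; // assumed; pick something representative of your workload
const start = process.hrtime.bigint();
for (let i = 0; i < iterations; i++) {
  tracer.startSpan('bench').end();
}
const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
console.log(
  `${iterations} spans in ${elapsedMs.toFixed(1)} ms ` +
    `(~${Math.round(iterations / (elapsedMs / 1000))} spans/s)`
);
```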
I am planning to do some testing based on the spec and provide the numbers here. Other outcomes that I think might be useful:
Looking back at the history of the JS SDK, I see that there used to be a basic benchmarking tool (#390). I am curious why it was removed.