OTel Cassandra/Elasticsearch Exporter queue defaults #2533
Conversation
Signed-off-by: Joe Elliott <number101010@gmail.com>
Codecov Report
@@            Coverage Diff             @@
##           master    #2533      +/-   ##
==========================================
+ Coverage   95.32%   95.33%   +0.01%
==========================================
  Files         208      208
  Lines        9248     9248
==========================================
+ Hits         8816     8817       +1
+ Misses        355      354       -1
  Partials       77       77
Continue to review full report at Codecov.
Worth noting that the size 50 is pulled completely out of thin air :-). There's probably some queueing theory approach one could use, but we haven't done any testing with that. It's actually a multi-variate optimization: while the worker count could be tuned for a single collector w.r.t. its CPU utilization, it's far less clear what impact this would have on the storage. For example, a Cassandra connection supports up to 128 (if not more) parallel streams, which could suggest a matching number of worker threads, but I am sure the Cassandra servers would not be happy with that many (times the number of collectors). Quite curious if someone has an idea what formula could be used here.
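Not something from this thread, but a common starting point for the formula question is Little's Law: the number of requests in flight equals the arrival rate times the average service time. A rough sizing sketch, where the throughput and latency numbers are purely illustrative:

$$
N_{\text{workers}} \approx \lambda \cdot W,
\qquad \text{e.g. } 5000\ \tfrac{\text{batches}}{\text{s}} \times 0.01\ \text{s} = 50
$$

This only bounds a single collector; the backend-side budget raised above would still cap the total, roughly the per-backend in-flight limit divided by the number of collectors.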
I am unsurprised that 50 isn't rigorously proven to give the best performance per collector :). The main reason I think this PR is important is that it makes the OTel collector/ingester layers perform roughly 1:1 with the current collector/ingester. Before, they were doing half the work or less, which may be a surprise to someone trying to swap to the new OTel-collector-based pipeline.
@joe-elliott Seems reasonable to have the same defaults as currently used in Jaeger. Just to confirm, these defaults can still be overridden using the CLI flags?
@objectiser These defaults can be overridden using the OTEL config, but not the Jaeger CLI. Currently the Jaeger CLI options
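For reference, a sketch of what such an override could look like in the OTel collector YAML. The exporter key (`jaeger_cassandra`) and the exact values shown are assumptions to be checked against the Jaeger OTel build in use; the `sending_queue` fields follow the collector exporterhelper convention:

```yaml
exporters:
  jaeger_cassandra:        # assumed exporter name; substitute the one from your build
    sending_queue:
      num_consumers: 50    # worker goroutines draining the queue (Jaeger collector's default worker count)
      queue_size: 2000     # max number of batches held in the queue (assumed to match Jaeger's default)
```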
Which problem is this PR solving?
While working on #2494 I discovered that the default OTel config was pushing far fewer spans per second to my Scylla backend. After some research it turned out this is because the default OTel collector config only has 10 workers, while the default Jaeger collector config has 50.
Short description of the changes
This PR applies the default Jaeger collector queue settings to both the Cassandra and Elasticsearch exporters. These are the two "production" exporters supported.
This does bring up another concern that might make sense to discuss as part of this issue. By default the ingester pulls from the Kafka queue and immediately pushes its spans into the configured backend, but an equivalent no-queue configuration of the OTel collector does not seem possible at the moment. The queue can technically be disabled, but performance is abysmal. This requires more research.
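For completeness, a hedged sketch of switching the exporterhelper queue off, which is the closest analogue to the ingester's pass-through behavior (same caveat about the exporter key name, and bearing in mind the poor throughput observed above):

```yaml
exporters:
  jaeger_cassandra:        # assumed exporter name
    sending_queue:
      enabled: false       # push spans synchronously instead of queueing them
```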
There are certainly other ways to approach "fixing" this, including leaving it alone and requiring the user to change the configuration.