apmbench: Define benchmark scenarios and topologies #7858
Comments
Building on top of what you already suggested, I'd appreciate having a few more scenarios covered:
The concrete number of events/second should be defined once the current per-size limits of the APM Server are known; it should be close to the maximum load each size can process.
We might need to tweak these concrete numbers eventually.
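To make the suggestion above concrete, here is a minimal sketch of deriving a benchmark target rate from a measured per-size maximum. The `targetEventRate` function and the 0.9 headroom factor are illustrative assumptions, not values from this issue:

```go
package main

import "fmt"

// targetEventRate derives a benchmark load target from a measured
// per-size maximum, so the benchmark runs close to, but under, the
// maximum the server can process. The 0.9 headroom factor is an
// illustrative choice, not a number taken from this issue.
func targetEventRate(measuredMaxEPS float64) float64 {
	return measuredMaxEPS * 0.9
}

func main() {
	// e.g. a size whose measured maximum is 10000 events/second
	fmt.Println(targetEventRate(10000))
}
```

The headroom factor would itself need tweaking per size, in line with the expectation above that the concrete numbers will change over time.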
@marclop and @lahsivjar for finishing this up: is there anything else required to finish this task? I expect these concrete cases need to be incorporated into the automation tooling that the engineering productivity team is currently building. Please note here if any more effort is required on the APM Server team's side.
For convenience we could maybe split the performance and hardware profiles into two.
#8275 added a new benchmark.
@simitt After my benchmarking efforts testing the gomaxprocs changes (#8278 (comment)), I think the number of agents we initially proposed in this issue may be too high. A good rule of thumb seemed to be incrementing the agent count by ~64, or doubling the previous size's count, for each size increment; that yielded close to optimal results with the default server settings. We could try other increments as well, such as 96 or 128. The table could look like:
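The doubling rule of thumb above can be sketched as a small helper. The function name, the base agent count of 64, and the number of size steps are illustrative assumptions, not figures fixed by this thread:

```go
package main

import "fmt"

// agentsForSize returns a suggested benchmark agent count for a given
// APM Server size step, following the rule of thumb discussed above:
// double the previous size's agent count at each size increment.
// baseAgents is a hypothetical starting point, not a value from the issue.
func agentsForSize(sizeStep, baseAgents int) int {
	agents := baseAgents
	for i := 1; i < sizeStep; i++ {
		agents *= 2
	}
	return agents
}

func main() {
	// Print a candidate table for four hypothetical size steps.
	for step := 1; step <= 4; step++ {
		fmt.Printf("size step %d: %d agents\n", step, agentsForSize(step, 64))
	}
}
```

Swapping the doubling for a fixed +64 (or +96/+128) increment would only change the loop body, which makes it easy to compare the proposed increments side by side.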
Perhaps we can separate the objectives that the different numbers of agents serve into different jobs, or at least analyze them differently, since many factors affect the server's throughput, not only the server size. As we are all aware in the team, the ultimate bottleneck to APM Server performance isn't the APM Server itself (it has some capacity to absorb peaks in its modelindexer buffer), but rather the rate at which Elasticsearch can index the documents that we are sending. For that reason, expecting linear scalability out of the APM Server wouldn't be reasonable without tuning that indexing path as well.
Quick update on this. We've moved forward with the different topologies defined in #7858 (comment) and have defined them in https://github.com/elastic/apm-server/tree/main/testing/benchmark/system-profiles. Different configurations outside of the topologies are not covered by these profiles yet. The remaining automation work is tracked in #7846.
@marclop your proposal makes sense; we can iterate on and fine-tune the scenarios over time. Please create a follow-up issue for the relevant config options that are currently out of scope of the tests.
Description
Benchmark scenarios
We currently have 4 benchmarks that leverage the new event handler piece in `apmbench` to load pre-recorded APM Agent events and replay them to a target APM Server. The current benchmarks are split by language agent, but that isn't necessarily the best way to benchmark the APM Server. The current generated data has been gathered from the existing opbeans applications, but that is also not necessarily the best type of application to use for our benchmarks.

We should discuss what kind of benchmark scenarios we'd like to include, to be run on a daily basis, and the purpose they serve.
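For readers unfamiliar with the shape of these benchmarks, here is a hedged sketch of what replaying pre-recorded events in a `testing.B` loop looks like. `sendEvents`, the inline event payloads, and the benchmark name are hypothetical stand-ins, not the real apmbench API:

```go
package main

import (
	"fmt"
	"testing"
)

// recordedEvents stands in for a file of pre-recorded agent events;
// the payloads here are placeholders, not real captured data.
var recordedEvents = []string{`{"transaction":{}}`, `{"span":{}}`, `{"error":{}}`}

// sendEvents stands in for replaying events to a target APM Server.
// A real benchmark would POST these to the intake endpoint; here we
// just count them so the sketch is self-contained.
func sendEvents(events []string) int {
	return len(events)
}

// BenchmarkReplayRecordedEvents replays the recorded events once per
// iteration, which is the overall shape of an apmbench-style benchmark.
func BenchmarkReplayRecordedEvents(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sendEvents(recordedEvents)
	}
}

func main() {
	res := testing.Benchmark(BenchmarkReplayRecordedEvents)
	fmt.Println("ran iterations:", res.N > 0)
}
```

Splitting benchmarks by scenario rather than by language agent would mean swapping out `recordedEvents` per scenario while keeping this loop structure the same.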
Benchmark topologies
Additionally, we should look into the benchmark size matrix we'd like to support, for example:
Objectives / Outcomes: