
windsock: message rate handling #7

Open
rukai opened this issue May 10, 2023 · 0 comments
Comments


rukai commented May 10, 2023

This is a particularly tricky part of windsock's design, but at the same time it feels super important, so it probably needs discussion before work begins on it.
Or maybe we'll just build a prototype that can be iterated on.

I don't expect to start working on this soon, as it's so complex and we can still get plenty of value from windsock without it.

The problem

When measuring the performance of databases we care about much more than just "how many messages can I yeet through this thing". We want to know the performance characteristics of the database at different loads: does latency skyrocket? Do responses start to fail?
I think a good starting point would be breaking these down into 4 load categories:

  • basic load - the highest load before negative characteristics are observed
  • medium load - let's define this as average(basic_load, maximum_load); it just gives us an idea of how the database performs at a middle ground
  • maximum load - the highest load before throughput stops increasing
  • stress load - the maximum number of messages the bencher can send; the goal is to see if the db falls over under extreme load
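The four categories above could be sketched roughly as follows. This is just an illustrative sketch, not windsock's actual API; the `LoadPoint` enum and `medium_load` function are hypothetical names.

```rust
// Hypothetical representation of the four load points described above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum LoadPoint {
    /// Highest load before negative characteristics are observed.
    Basic,
    /// Midpoint between basic and maximum load.
    Medium,
    /// Highest load before throughput stops increasing.
    Maximum,
    /// As many messages as the bencher can send.
    Stress,
}

/// Medium load is defined as average(basic_load, maximum_load), in OPS.
fn medium_load(basic_ops: u64, maximum_ops: u64) -> u64 {
    (basic_ops + maximum_ops) / 2
}
```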

These are the points at which it's useful to perform benchmarks, because they give meaning to the results.
If we just have benchmarks at various hardcoded OPS, say ops=4000 and ops=4000000, then from reading those numbers alone it's not clear what kind of load they represent.
Additionally, those numbers would need to be changed on every machine they run on in order to hit the load point they are supposed to be testing.

A proposed solution

windsock benches are not responsible for defining these load points.
Instead, windsock will drive the bench implementation at different loads to explore the database's performance and locate these points itself.
Maybe that could look like windsock starting at a load of, say, 10 OPS and then doubling the load every second; if the observed characteristics match one of the definitions of our 4 load points, that load is noted down.
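The doubling exploration could look something like the sketch below. The `Metrics` struct and the `measure` callback are assumptions for illustration; how degradation is actually detected would be one of the open heuristic questions.

```rust
// Hypothetical per-step measurement returned by the bench.
struct Metrics {
    achieved_ops: u64,      // throughput actually observed at this load
    latency_degraded: bool, // did negative characteristics appear?
}

// Start at 10 OPS and double until throughput stops increasing.
// Returns (basic_load, maximum_load) as defined above.
fn explore(measure: impl Fn(u64) -> Metrics) -> (u64, u64) {
    let mut ops = 10;
    let mut basic = 0;
    let mut maximum = 0;
    let mut prev_throughput = 0;
    loop {
        let m = measure(ops);
        // basic load: last rate before negative characteristics appeared
        if !m.latency_degraded {
            basic = ops;
        }
        // maximum load: last rate at which throughput still increased
        if m.achieved_ops > prev_throughput {
            maximum = ops;
        } else {
            return (basic, maximum);
        }
        prev_throughput = m.achieved_ops;
        ops *= 2; // double the load each step
    }
}
```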

We will need a whole bunch of heuristics to:

  • increase accuracy - maybe binary search between the current and previous load for a more accurate value
  • increase speed - jumping by more than double at a time, or starting at a larger base value
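The binary-search refinement mentioned above might look like this. `is_good` is a hypothetical predicate that runs the bench at the given OPS and checks the property that defines the load point being refined (e.g. "latency not degraded" for basic load).

```rust
// Once the doubling pass brackets a load point between `lo` (property holds)
// and `hi` (property fails), binary search the bracket for a tighter value.
fn refine(mut lo: u64, mut hi: u64, is_good: impl Fn(u64) -> bool) -> u64 {
    while hi - lo > 1 {
        let mid = lo + (hi - lo) / 2;
        if is_good(mid) {
            lo = mid; // property still holds, move the lower bound up
        } else {
            hi = mid; // property fails, move the upper bound down
        }
    }
    lo
}
```

Each probe is a full bench run, so the number of refinement steps would itself need to be capped; Latte's significance measurements could help decide when further refinement stops being meaningful.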

Latte's significance measurements could be really useful here.

But how do we go about actually integrating this into the windsock workflow?
If we are not careful we will end up with benches that cannot be compared properly because they are running at customized loads.
It would also be time consuming to find these values on every run, so that should be avoided.

A possible approach would be:

  • Benches have a configurable OPS. By default it's unlimited, but the user can pass a --operations-per-second 4000 flag to manually set the rate at which all benches run. This allows the user to manually bench at different rates depending on what is useful on their machine, while still being able to compare results sanely across different benches.
  • The user can generate 3 extra benches (for the basic, medium and maximum load points) by running --set-benches-at-load-points and --clear-benches-at-load-points, similar to the UX for windsock: implement --set-baseline and --clear-baseline shotover-proxy#1158
    • the user can turn it on and off when it makes sense
    • the result is cached across runs, which means we avoid rerunning this expensive operation every run AND we maintain rates across subsequent runs, making for more comparable benches
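Caching the discovered load points across runs could be as simple as the sketch below. The file path and key=value format are purely illustrative assumptions, not a proposal for windsock's actual cache format.

```rust
use std::fs;

// Persist the three discovered load points (in OPS) to a cache file.
fn save_load_points(path: &str, basic: u64, medium: u64, maximum: u64) -> std::io::Result<()> {
    fs::write(path, format!("basic={basic}\nmedium={medium}\nmaximum={maximum}\n"))
}

// Read the cached load points back; None if the cache is missing or malformed,
// in which case the expensive exploration would be rerun.
fn load_load_points(path: &str) -> Option<(u64, u64, u64)> {
    let text = fs::read_to_string(path).ok()?;
    let mut vals = text
        .lines()
        .filter_map(|l| l.split_once('=')?.1.parse::<u64>().ok());
    Some((vals.next()?, vals.next()?, vals.next()?))
}
```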
@conorbros conorbros transferred this issue from shotover/shotover-proxy Apr 10, 2024