windsock: message rate handling #7

rukai · 2023-05-10T23:12:14Z

This is a particularly tricky part of windsock's design, but at the same time it feels super important, so it probably needs discussion before work begins on it.
Or maybe we'll just build a prototype which can be iterated on.

I don't expect to start working on this soon, as it's so complex and we can still get plenty of value from windsock without it.

The problem

When measuring the performance of databases we care about much more than just how many messages we can yeet through this thing. We want to know the performance characteristics of the database at different loads: Does latency skyrocket? Do responses start to fail?
I think a good starting point would be breaking these down into 4 load categories:

- basic load - highest load before negative characteristics are observed
- medium load - let's define this as average(basic_load, maximum_load); it just gives us an idea of how the database performs at a middle ground
- maximum load - highest load before throughput stops increasing
- stress load - the maximum number of messages the bencher can send; the goal is to see if the db falls over under extreme load
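To make those definitions concrete, here is a rough sketch of how the first three load points could be derived from a series of measurements taken at increasing load. It is written in Python purely for brevity and is not windsock code. The `Sample` type, the `derive_load_points` function and both thresholds are assumptions made for illustration: "negative characteristics" is taken to mean any failed responses or p99 latency above `latency_factor` times the latency at the lowest load, and "throughput stops increasing" is taken to mean a gain of less than `min_gain` between consecutive samples.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    load_ops: int          # rate the bencher attempted to send
    throughput_ops: float  # rate the database actually completed
    p99_latency_ms: float
    error_rate: float      # fraction of failed responses, 0.0 to 1.0


def derive_load_points(samples, latency_factor=2.0, min_gain=0.05):
    """Derive basic/medium/maximum load from samples sorted by ascending load.

    samples must be non-empty. Both thresholds are illustrative guesses.
    """
    baseline = samples[0].p99_latency_ms

    # basic load: highest load before latency or errors degrade.
    # If even the first sample has errors, basic falls back to that first load.
    basic = samples[0].load_ops
    for s in samples:
        if s.error_rate > 0 or s.p99_latency_ms > baseline * latency_factor:
            break
        basic = s.load_ops

    # maximum load: highest load before throughput stops increasing.
    maximum = samples[0].load_ops
    for prev, cur in zip(samples, samples[1:]):
        if cur.throughput_ops < prev.throughput_ops * (1 + min_gain):
            break
        maximum = cur.load_ops

    # medium load: average(basic_load, maximum_load), as defined above.
    medium = (basic + maximum) / 2
    return {"basic": basic, "medium": medium, "maximum": maximum}
```

The stress load is left out on purpose: by definition it is whatever the bencher can send when unthrottled, so there is nothing to derive.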

These are the points at which it's useful to perform benchmarks because they give meaning to the results.
If we just have benchmarks at various hardcoded OPS, say ops=4000 and ops=4000000, then just from reading those numbers it's not clear what kind of load they represent.
Additionally, those numbers would need to be changed on every machine the benchmarks run on in order to hit the load points they are supposed to be testing.

A proposed solution

windsock benches are not responsible for defining these load points.
Instead, windsock will drive the bench implementation at different loads to explore the database's performance and locate these points itself.
Maybe that could look like windsock starting at a load of, say, 10 OPS and then doubling the load every second; whenever the observed characteristics match one of the definitions of our 4 load points, windsock notes that load down.

We will need a whole bunch of heuristics to:

- increase accuracy (maybe binary search between the previous and current load for a more accurate value)
- increase speed (jumping by more than double at a time, or starting at a larger base value)
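As a rough sketch of the doubling plus binary search idea (Python for brevity, not windsock code): `find_load_point` and its parameters are hypothetical, and `is_ok` stands in for "run the bench at this load and check that the load point's condition still holds". The sketch assumes `is_ok` is monotonic, i.e. once it fails at some load it fails at every higher load, which real measurements will only approximate.

```python
from typing import Callable


def find_load_point(
    is_ok: Callable[[int], bool],
    start_ops: int = 10,
    max_ops: int = 10_000_000,
    tolerance: int = 1,
) -> int:
    """Return the highest load (in OPS) for which is_ok holds, within tolerance.

    Returns 0 if even start_ops fails.
    """
    if not is_ok(start_ops):
        return 0

    # Phase 1: double the load until the condition fails or the cap is hit.
    good = start_ops
    bad = None
    while good < max_ops:
        candidate = min(good * 2, max_ops)
        if is_ok(candidate):
            good = candidate
        else:
            bad = candidate
            break
    if bad is None:
        return good  # never failed at or below the cap

    # Phase 2: binary search between the last passing and first failing load.
    while bad - good > tolerance:
        mid = (good + bad) // 2
        if is_ok(mid):
            good = mid
        else:
            bad = mid
    return good
```

Each call to `is_ok` is a full measurement, so in practice `tolerance` would probably be much larger than 1 (maybe a percentage of the current load) to keep the number of runs down.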

Latte's significance measurements could be really useful here.

But how do we go about actually integrating this into the windsock workflow?
If we are not careful we will end up with benches that do not compare properly because they are running at customized loads.
It would also be time consuming to find these values on every run, so that should be avoided.

A possible approach would be:

- Benches have a configurable OPS. By default it's unlimited, but the user can pass in a --operations-per-second 4000 flag to manually set the rate at which all benches run. This allows the user to bench at different rates depending on what is useful on their machine, while still being able to compare the results sanely across different benches.
- The user can generate 3 extra benches (for the basic, medium and maximum load points) by running --set-benches-at-load-points, and remove them again with --clear-benches-at-load-points, similar to the UX for "windsock: implement --set-baseline and --clear-baseline" (shotover-proxy#1158):
  - the user can turn it on and off when it makes sense
  - the result is cached across runs, which means we avoid rerunning this expensive operation every run AND we maintain rates across subsequent runs, making for more comparable benches
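The caching half of this could be as simple as a small file on disk. Below is a minimal sketch in Python for brevity, not windsock code. The function names, the JSON format and the file location are all assumptions made for illustration; the issue does not specify how or where the load points would be stored.

```python
import json
from pathlib import Path


def set_load_points(cache_path: Path, points: dict) -> None:
    """Persist discovered load points so later runs reuse the same rates."""
    cache_path.write_text(json.dumps(points))


def clear_load_points(cache_path: Path) -> None:
    """Forget cached load points; does nothing if none are cached."""
    cache_path.unlink(missing_ok=True)


def read_load_points(cache_path: Path):
    """Return the cached load points, or None if none have been set."""
    if not cache_path.exists():
        return None
    return json.loads(cache_path.read_text())
```

On a normal run, a `None` result would mean only the regular benches run; otherwise one extra bench per cached load point would be generated at the stored rate.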


rukai mentioned this issue May 15, 2023

windsock: add --operations-per-second flag shotover/shotover-proxy#1175

Merged

conorbros transferred this issue from shotover/shotover-proxy Apr 10, 2024
