This is a particularly tricky part of windsock's design, but it also feels super important, so it probably needs discussion before work begins on it.
Or maybe we'll just build a prototype that can be iterated on.
I don't expect to start working on this soon, as it's so complex and we can still get plenty of value from windsock without it.
The problem
When measuring the performance of databases we care about much more than just "how many messages can I yeet through this thing". We want to know the performance characteristics of the database at different loads: does latency skyrocket? Do responses start to fail?
I think a good starting point would be breaking these down into 4 load categories:
basic load - highest load before negative characteristics observed
medium load - let's define this as average(basic_load, maximum_load), which just gives us an idea of how it performs at a middle ground.
maximum load - highest load before throughput stops increasing
stress load - the maximum number of messages the bencher can send. The goal is to see if the db falls over under extreme load.
These are the points at which it's useful to perform benchmarks because they give meaning to the results.
If we just have benchmarks at various hardcoded OPS, say ops=4000 and ops=4000000, then just from reading those numbers it's not clear what kind of load they represent.
Additionally, those numbers will need to be changed on every machine they run on in order to hit the load point they are supposed to be testing.
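As a rough sketch, the four load categories above could be modeled like this. Every name here (LoadPoints, LoadCategory, Rate, rate_for) is hypothetical and not part of windsock; the only thing taken from the definitions above is that medium load is average(basic_load, maximum_load) and that stress load means sending as fast as the bencher can.

```rust
/// Load points discovered for one bench, in operations per second (OPS).
/// Hypothetical type, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct LoadPoints {
    /// Highest load before negative characteristics are observed.
    pub basic: u64,
    /// Highest load before throughput stops increasing.
    pub maximum: u64,
}

impl LoadPoints {
    /// Medium load, defined as average(basic_load, maximum_load).
    /// Halving each term first avoids overflow for very large rates.
    pub fn medium(&self) -> u64 {
        self.basic / 2 + self.maximum / 2 + (self.basic % 2 + self.maximum % 2) / 2
    }
}

/// The four load categories a bench could be run at.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LoadCategory {
    Basic,
    Medium,
    Maximum,
    Stress,
}

/// The rate the bencher should send at.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Rate {
    /// Send at a fixed number of operations per second.
    Limited(u64),
    /// Send as fast as the bencher can (stress load).
    Unlimited,
}

/// Map a load category to a concrete rate for this machine.
pub fn rate_for(points: &LoadPoints, category: LoadCategory) -> Rate {
    match category {
        LoadCategory::Basic => Rate::Limited(points.basic),
        LoadCategory::Medium => Rate::Limited(points.medium()),
        LoadCategory::Maximum => Rate::Limited(points.maximum),
        LoadCategory::Stress => Rate::Unlimited,
    }
}
```

The point of the indirection is that a bench result gets labelled with a category rather than a raw OPS number, so the label keeps its meaning even when the underlying rate differs per machine.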
A proposed solution
windsock benches are not responsible for defining these load points.
Instead, windsock will drive the bench implementation at different loads to explore the database's performance and locate these points itself.
Maybe that could look like windsock starting at a load of, say, 10 OPS and then doubling the load every second. If the characteristics match one of the definitions of our 4 load points, then note that load down.
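A minimal sketch of that doubling ramp, under some loud assumptions: the probe closure stands in for "drive the bench at this rate for one second and report what happened", Observation and Found are hypothetical types, and the 5% threshold for "throughput stopped increasing" is an arbitrary placeholder rather than a tuned value.

```rust
/// What was observed while driving the database at one fixed rate.
/// Hypothetical type, for illustration only.
#[derive(Debug, Clone, Copy)]
pub struct Observation {
    /// Operations per second that actually completed.
    pub achieved_ops: f64,
    /// True if latency spiked or responses failed at this rate.
    pub degraded: bool,
}

/// Load points located so far, in OPS.
#[derive(Debug, Default, PartialEq)]
pub struct Found {
    pub basic: Option<u64>,
    pub maximum: Option<u64>,
}

/// Start at `start_ops` and double the rate until both points are found
/// or the rate would exceed `limit_ops`.
pub fn ramp(
    start_ops: u64,
    limit_ops: u64,
    mut probe: impl FnMut(u64) -> Observation,
) -> Found {
    let mut found = Found::default();
    let mut previous: Option<(u64, Observation)> = None;
    let mut ops = start_ops.max(1);
    while ops <= limit_ops {
        let current = probe(ops);
        if let Some((prev_ops, prev)) = previous {
            // Basic load: the last rate before negative characteristics appeared.
            if found.basic.is_none() && current.degraded && !prev.degraded {
                found.basic = Some(prev_ops);
            }
            // Maximum load: the last rate before throughput stopped increasing.
            // Growth of less than 5% counts as "stopped" (assumed threshold).
            if found.maximum.is_none() && current.achieved_ops < prev.achieved_ops * 1.05 {
                found.maximum = Some(prev_ops);
            }
        }
        if found.basic.is_some() && found.maximum.is_some() {
            break;
        }
        previous = Some((ops, current));
        // Stop instead of overflowing if the rate cannot be doubled again.
        ops = match ops.checked_mul(2) {
            Some(next) => next,
            None => break,
        };
    }
    found
}
```

Because the rate doubles at each step, the values this returns are only accurate to within a factor of two, which is why the accuracy heuristics matter.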
We will need a whole bunch of heuristics to:
increase accuracy (maybe binary search between the previous and current load for a more accurate value)
increase speed (maybe jumping by more than double at a time or starting at a larger base value)
Latte's significance measurements could be really useful here.
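The accuracy heuristic could be sketched as a plain binary search between the last rate that looked fine and the first rate that did not. The function name refine and the is_ok closure are hypothetical. The sketch also assumes that each measurement is noise-free and that the database only gets worse as the rate goes up, and real measurements will not be that clean, which is where some kind of significance testing would come in.

```rust
/// Narrow down the highest rate for which `is_ok` holds, given a bracket
/// where the rate `good` was observed to be fine and the rate `bad` was not.
/// Stops once the bracket is no wider than `tolerance` OPS.
pub fn refine(
    mut good: u64,
    mut bad: u64,
    tolerance: u64,
    mut is_ok: impl FnMut(u64) -> bool,
) -> u64 {
    // A tolerance of zero would never terminate, so treat it as one.
    let tolerance = tolerance.max(1);
    while bad > good && bad - good > tolerance {
        // Midpoint written this way to avoid overflow on large rates.
        let mid = good + (bad - good) / 2;
        if is_ok(mid) {
            good = mid;
        } else {
            bad = mid;
        }
    }
    good
}
```

For example, if doubling found that 320 OPS was fine and 640 OPS was degraded, refine(320, 640, 1, ...) needs about eight extra probes to pin the boundary down to a single OPS, so a looser tolerance trades accuracy for speed.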
But how do we go about actually integrating this into the windsock workflow?
If we are not careful, we will end up with benches that do not compare properly because they are running at customized loads.
It would also be time-consuming to find these values every run, so that should be avoided.
A possible approach would be:
benches have a configurable OPS. By default it's unlimited, but the user can pass in a --operations-per-second 4000 flag to manually set the rate at which all benches run. This allows the user to manually bench at different rates depending on what is useful on their machine, while still being able to compare the results sanely across different benches.
running benches at the discovered load points is something the user can turn on and off when it makes sense
the result is cached across runs, which means we avoid rerunning this expensive operation every run AND we maintain rates across subsequent runs, making for more comparable benches
Running benches at the discovered load points could be toggled with --set-benches-at-load-points and --clear-benches-at-load-points, similar to the UX for "windsock: implement --set-baseline and --clear-baseline" (shotover-proxy#1158).
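To make the caching idea concrete, here is a small sketch of how discovered load points might be stored between runs and how a rate might be chosen. Everything in it is an assumption: the tab-separated format, the names CachedLoadPoints, serialize, deserialize and choose_rate, and especially the precedence rule that an explicit --operations-per-second value wins over a cached load point.

```rust
use std::collections::BTreeMap;

/// Load points discovered for one bench, in OPS. Hypothetical type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CachedLoadPoints {
    pub basic: u64,
    pub maximum: u64,
}

/// Write the cache as one `name<TAB>basic<TAB>maximum` line per bench.
/// Assumes bench names never contain tabs or newlines.
pub fn serialize(cache: &BTreeMap<String, CachedLoadPoints>) -> String {
    cache
        .iter()
        .map(|(name, p)| format!("{}\t{}\t{}\n", name, p.basic, p.maximum))
        .collect()
}

/// Read the cache back, skipping malformed lines rather than failing the run.
pub fn deserialize(text: &str) -> BTreeMap<String, CachedLoadPoints> {
    let mut cache = BTreeMap::new();
    for line in text.lines() {
        let fields: Vec<&str> = line.split('\t').collect();
        if let [name, basic, maximum] = fields.as_slice() {
            if let (Ok(basic), Ok(maximum)) = (basic.parse::<u64>(), maximum.parse::<u64>()) {
                cache.insert(name.to_string(), CachedLoadPoints { basic, maximum });
            }
        }
    }
    cache
}

/// Pick the rate for a bench. `None` means unlimited, which is the default.
/// Assumed precedence: explicit flag value, then cached load point.
pub fn choose_rate(cli_ops: Option<u64>, cached_ops: Option<u64>) -> Option<u64> {
    cli_ops.or(cached_ops)
}
```

With something like this, clearing the load points would amount to deleting the cached file, after which benches fall back to the unlimited default unless a rate is passed explicitly.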