Profiling is a bit of a mess right now.
Every profiler collects and stores its results in a slightly different way.
A possible refactor I'm thinking of is this:
Build a profiler binary that is downloaded to each instance.
This binary is then run over SSH. It collects metrics and/or profiles a running service until SIGTERM is issued, at which point it reports all collected metrics and profile data in bincode. bincode is used since we want to transfer large amounts of binary data efficiently.
This profiler binary would be generic and suitable for use with any project.
We then invoke the binary as needed for the bench and process the results in a shotover-specific way on the windsock side.
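As a rough illustration of the "collect until told to stop, then report everything at once" lifecycle, here is a minimal dependency-free sketch. Everything in it is an assumption for illustration: the names `collect_until_stopped` and `encode_report` are hypothetical, the `AtomicBool` stands in for SIGTERM handling (which would need a signal-handling crate in the real binary), and the hand-rolled little-endian encoding is only a placeholder for the bincode step, not bincode's wire format.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;
use std::time::Duration;

/// Take one sample per `interval` until `stop` is set, then return every sample.
/// In the real binary, `stop` would be flipped by a SIGTERM handler (assumption).
pub fn collect_until_stopped<F>(stop: &AtomicBool, interval: Duration, mut sample: F) -> Vec<f64>
where
    F: FnMut() -> f64,
{
    let mut samples = Vec::new();
    while !stop.load(Ordering::Relaxed) {
        samples.push(sample());
        thread::sleep(interval);
    }
    samples
}

/// Placeholder for the bincode step: a u64 little-endian count followed by
/// each sample as a little-endian f64. This is NOT bincode's format.
pub fn encode_report(samples: &[f64]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + samples.len() * 8);
    out.extend_from_slice(&(samples.len() as u64).to_le_bytes());
    for s in samples {
        out.extend_from_slice(&s.to_le_bytes());
    }
    out
}
```

The point of buffering in memory and emitting a single report at shutdown is that the windsock side only has to read one blob per instance after the bench finishes.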
Depending on which flags the binary is invoked with, it will:
- measure all system metrics that we currently get from sar
- collect metrics from a Prometheus endpoint (as a bonus, this ensures accurate timing of the collection interval, as network latency is avoided)
- perform profiling of a PID via samply (keep in mind this would very rarely be useful for cloud benches, as we would need bare metal to run a sampling profiler)
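To make the flag-driven behaviour concrete, here is a small sketch of how the three modes above could map onto a config struct. The flag names (`--system-metrics`, `--prometheus`, `--samply-pid`) and the `ProfilerConfig` type are hypothetical; nothing here has been decided for the real binary, and a real implementation would likely use an argument-parsing crate instead.

```rust
/// Which collectors to run. All flag and field names are hypothetical.
#[derive(Debug, Default, PartialEq)]
pub struct ProfilerConfig {
    /// Collect the system metrics we currently get from sar.
    pub system_metrics: bool,
    /// Prometheus endpoint to scrape, if any.
    pub prometheus_url: Option<String>,
    /// PID to profile via samply, if any.
    pub samply_pid: Option<u32>,
}

/// Parse command-line arguments (excluding the program name) into a config.
pub fn parse_flags<I>(args: I) -> Result<ProfilerConfig, String>
where
    I: IntoIterator<Item = String>,
{
    let mut config = ProfilerConfig::default();
    let mut args = args.into_iter();
    while let Some(arg) = args.next() {
        match arg.as_str() {
            "--system-metrics" => config.system_metrics = true,
            "--prometheus" => {
                let url = args.next().ok_or("--prometheus requires a URL")?;
                config.prometheus_url = Some(url);
            }
            "--samply-pid" => {
                let raw = args.next().ok_or("--samply-pid requires a PID")?;
                let pid = raw
                    .parse::<u32>()
                    .map_err(|e| format!("invalid PID {:?}: {}", raw, e))?;
                config.samply_pid = Some(pid);
            }
            other => return Err(format!("unknown flag: {}", other)),
        }
    }
    Ok(config)
}
```

In the binary's `main`, this would be called as `parse_flags(std::env::args().skip(1))`. Any combination of the flags can be enabled in a single invocation.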
The binary should go in its own repo and be deployed via cargo-dist. In the shotover repo we then just pin a specific download URL, which we bump when we need a newer version.
This approach should allow for more accurate readings of Prometheus metrics, as we can poll exactly as the second ticks over without the variable latency of a network hop.
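Polling "as the second ticks over" could be done by sleeping for the remainder of the current wall-clock second before each scrape. The sketch below shows only that timing calculation; the function names are hypothetical and the scrape itself is left out. Note that `thread::sleep` may oversleep slightly, so this gives approximate rather than exact alignment.

```rust
use std::thread;
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Given the time elapsed since the Unix epoch, return how long to wait
/// until the next whole second. Exactly on a boundary, this waits a full
/// second so that two scrapes never land on the same tick.
pub fn delay_until_next_second(since_epoch: Duration) -> Duration {
    Duration::from_secs(1) - Duration::from_nanos(u64::from(since_epoch.subsec_nanos()))
}

/// Block until the wall clock reaches the next whole second.
pub fn sleep_until_next_second() {
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is set before the Unix epoch");
    thread::sleep(delay_until_next_second(now));
}
```

A collection loop would call `sleep_until_next_second()` and then scrape the local Prometheus endpoint, so each sample is taken close to a second boundary regardless of how long the previous scrape took.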
I don't intend to do this anytime soon, as we have other priorities, but if we find ourselves needing to implement yet another way to collect metrics, we should probably perform this refactor first.