add a benchmark and profile script and hook into CI #1028
Conversation
bin/benchmark
Outdated
TERMINATOR = "\r\n"
puts "yjit: #{RubyVM::YJIT.enabled?}"

client = Dalli::Client.new('localhost', serializer: StringSerializer, compress: false, raw: true)
Standard client vs multi-client could be another param
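A minimal sketch of that idea, choosing the benchmarked client from an env var. The BENCH_CLIENT variable and the builder table are assumptions for illustration, not part of the actual bin/benchmark script:

```ruby
# Hypothetical: map an env var to the server list handed to Dalli::Client.
CLIENT_BUILDERS = {
  # single-server client, as constructed in the script today
  'standard' => -> { ['localhost'] },
  # a multi-server ring, exercising Dalli's server-selection path
  'multi'    => -> { ['localhost:11211', 'localhost:11212'] }
}.freeze

def client_servers(kind = ENV.fetch('BENCH_CLIENT', 'standard'))
  CLIENT_BUILDERS.fetch(kind) { raise ArgumentError, "unknown client: #{kind}" }.call
end

# e.g. Dalli::Client.new(client_servers, serializer: StringSerializer,
#                        compress: false, raw: true)
```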
yeah true, I will get this hooked back into CI soon once I port the set_multi feature.
sock.setsockopt(Socket::SOL_SOCKET, Socket::SO_KEEPALIVE, true)
# Benchmarks didn't see any performance gains from increasing the SO_RCVBUF buffer size
# sock.setsockopt(Socket::SOL_SOCKET, ::Socket::SO_RCVBUF, 1024 * 1024 * 8)
# Benchamrks did see an improvement in performance when increasing the SO_SNDBUF buffer size
We should use the same buffer size that is in Dalli proper.
OK, yeah, dalli can also take these adjustments, but you are correct: we should default to the same size, and only adjust the socket when folks pass in options that also adjust dalli.
for now I'm just dropping this setting, but as we look at tweaking better defaults we will want to try a few things out
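A minimal sketch of that resolution: only touch SO_SNDBUF when the caller explicitly opts in, so the benchmark socket otherwise matches Dalli's defaults. The :sndbuf keyword is a name I made up for this sketch, not the script's actual interface:

```ruby
require 'socket'

# Apply the benchmark's socket tuning. SO_SNDBUF is only changed when the
# caller opts in; by default the socket keeps the system/Dalli defaults.
def apply_socket_options(sock, sndbuf: nil)
  sock.setsockopt(Socket::SOL_SOCKET, Socket::SO_KEEPALIVE, true)
  sock.setsockopt(Socket::SOL_SOCKET, Socket::SO_SNDBUF, sndbuf) if sndbuf
  sock
end
```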
Suggested change:
- # Benchamrks did see an improvement in performance when increasing the SO_SNDBUF buffer size
+ # Benchmarks did see an improvement in performance when increasing the SO_SNDBUF buffer size
sock.setsockopt(Socket::SOL_SOCKET, Socket::SO_SNDBUF, 1024 * 1024 * 8)

# ensure the clients are all connected and working
client.set('key', payload)
Should we do a get after to confirm the key was set?
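A sketch of that suggestion, using a minimal in-memory stand-in for the client (an assumption, purely so the snippet runs without a memcached server): read the key back right after the warm-up set so a broken connection fails fast.

```ruby
# FakeClient stands in for Dalli::Client for illustration only.
class FakeClient
  def initialize
    @store = {}
  end

  def set(key, value)
    @store[key] = value
  end

  def get(key)
    @store[key]
  end
end

client = FakeClient.new
payload = 'f' * 1024
client.set('key', payload)
# the suggested verification: confirm the key round-trips before benchmarking
raise 'memcached did not return the expected value' unless client.get('key') == payload
```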
sock.flush
sock.readline # clear the buffer

# ensure we have basic data for the benchmarks and get calls
I don't quite get why we are doing this... And why is the payload so much smaller/not configurable?
so these are the defaults for get_multi... I can make the payload adjustable, but we don't typically see get_multi with 1MB values, so I picked something more in the normal range. I will make it configurable and have it default to 1/10th of the full get/set size.
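A sketch of that configurable default. The env var names and the 1/10th fallback mirror the comment but are assumptions, not the actual script's interface:

```ruby
# Payload for single get/set benchmarks, overridable via env; the multi
# payload defaults to a tenth of it, matching typical get_multi value sizes.
payload_size = Integer(ENV.fetch('PAYLOAD_SIZE', 1_000_000))
multi_payload_size = Integer(ENV.fetch('MULTI_PAYLOAD_SIZE', payload_size / 10))
payload = 'f' * payload_size
multi_payload = 'f' * multi_payload_size
```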
I'm good with this conceptually. You all can determine the details and merge when you think that it's ready.
I'm just getting around to reviewing this now, after it's been merged, but in general I find the code in bin/benchmark and bin/profile hard to parse; it could be DRY'd up.
@@ -0,0 +1,38 @@
name: Profiles

on: [push, pull_request]
Do we need these on every push / PR?
I think it is good to have on any PR, as we can review the profile if we have any concerns that a change might impact performance.
- name: Run Profiles
  run: RUBY_YJIT_ENABLE=1 BENCH_TARGET=all bundle exec bin/profile
- name: Upload profile results
  uses: actions/upload-artifact@v4
Where do these get uploaded to? Any instructions on how to pull them down?
Good call, added documentation in the action file for how to get them.
count -= 1
tail = count.zero? ? '' : 'q'
sock.write(String.new("ms #{key} #{value_bytesize} c F0 T#{ttl} MS #{tail}\r\n",
                      capacity: key.size + value_bytesize + 40) << value << TERMINATOR)
What's 40?
Just needed to cover all the characters like ms, ' ', c, F0, etc. I just picked a number as I was modifying this command a few times; a few extra unused bytes in the buffer didn't matter.
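To sanity-check that slack, the fixed parts of the meta-set line can be counted. A quick sketch (my own arithmetic, not code from the script):

```ruby
# Everything in the buffer except the key and the value itself: the "ms "
# verb, the value-size digits, the flags, the header's \r\n, and the
# trailing TERMINATOR appended after the value.
def ms_overhead(key, value_bytesize, ttl)
  header = "ms #{key} #{value_bytesize} c F0 T#{ttl} MS q\r\n"
  header.bytesize - key.bytesize + "\r\n".bytesize
end

# Even with a 7-digit payload size and a 4-digit TTL the overhead is only
# 31 bytes, so capacity: key.size + value_bytesize + 40 never reallocates.
```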
  pairs.each do |key, value|
    client.set(key, value, 3600, raw: true)
  end
end
Should there be a corresponding get call to make sure the keys were set successfully?
the benchmark verifies during the get bench that they are there, or it will raise an exception. So I don't think we need to check before the script; checking would fail faster but also duplicate more code.
so it's handled with: raise 'mismatch' unless result == payload
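An illustrative sketch of that check (not the actual bin/benchmark code): the get benchmark compares each fetched value against the expected payload and raises on any mismatch, so a failed warm-up set surfaces during the bench itself.

```ruby
expected = { 'key1' => 'a' * 16, 'key2' => 'b' * 16 }
store = expected.dup # stands in for memcached after the warm-up sets
results = expected.keys.to_h { |k| [k, store[k]] } # stands in for get_multi
results.each do |key, result|
  raise 'mismatch' unless result == expected[key]
end
```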
I will open a PR to fix up some more of this, but it is a bit hard to fully DRY up: I didn't want to create classes or other files just to leverage in the benchmark and profile, so there is definitely duplication between the two. Like tests, I often find fully drying up profiles and benchmarks makes it harder to focus in and change only the piece you care about digging into, so I don't want to get too fancy with abstractions that might leak and impact the code we are trying to examine.
@nickamorim since this was closed, and my other PR was already updating profile/benchmark to support the meta protocol, I addressed your feedback as part of that PR #1029
This will make it easier for us to start upstreaming some of the performance- and latency-related PRs we have been testing out. We should be able to compare profiles before and after applying a change, and post any benchmark differences as well...
current benchmark:
while a profile can be viewed with https://vernier.prof/