grpc faster than UDS #10
Wanted to leave a note - I'm currently traveling, so it will be a bit before I get a chance to look at this. There are multiple possibilities, including new Go versions, OS changes, etc. As for the "kernel buffer space over the defaults": I vaguely remember this being some setting you can make for how much buffer a Unix socket gets. It's a terrible note, since I didn't clarify what the command was to change it. I'd smack my 2020 self in the head if I could.
Well, apparently I can't sleep, so I decided to run the benchmarks from my machine. This Mac is newer than the one in the docs that I ran the published benchmarks on: a 2021 MacBook Pro M1 Max, 64GB.

To be clear, I'm running benchmark: ipc/uds/highlevel/proto/rpc/benchmark

In this case, I didn't make any change to the IPC buffer. I think what I had been adjusting before was the IPC buffer size; by default it is set to:

kern.ipc.maxsockbuf: 8388608

I can change this with `sysctl`. Here's the results:

Test Results(uds):
[Speed]
[10 Users][10000 Requests][1.0 kB Bytes] - min 12.709µs/sec, max 19.393583ms/sec, avg 157.727µs/sec, rps 61861.73
[10 Users][10000 Requests][10 kB Bytes] - min 17.75µs/sec, max 14.697584ms/sec, avg 300.277µs/sec, rps 32853.74
[10 Users][10000 Requests][102 kB Bytes] - min 88.666µs/sec, max 11.183584ms/sec, avg 615.613µs/sec, rps 16175.53
[10 Users][10000 Requests][1.0 MB Bytes] - min 1.343291ms/sec, max 24.624583ms/sec, avg 3.244991ms/sec, rps 3077.23
[Allocs]
[10000 Requests][1.0 kB Bytes] - allocs 290,623
[10000 Requests][10 kB Bytes] - allocs 302,992
[10000 Requests][102 kB Bytes] - allocs 321,978
[10000 Requests][1.0 MB Bytes] - allocs 334,421

Test Results(grpc):
[Speed]
[10 Users][10000 Requests][1.0 kB Bytes] - min 33.458µs/sec, max 3.61625ms/sec, avg 192.079µs/sec, rps 50615.82
[10 Users][10000 Requests][10 kB Bytes] - min 74.167µs/sec, max 5.155416ms/sec, avg 470.981µs/sec, rps 21074.70
[10 Users][10000 Requests][102 kB Bytes] - min 1.24125ms/sec, max 7.500666ms/sec, avg 2.412776ms/sec, rps 4140.90
[10 Users][10000 Requests][1.0 MB Bytes] - min 11.356542ms/sec, max 47.805ms/sec, avg 19.213821ms/sec, rps 520.33
[Allocs]
[10000 Requests][1.0 kB Bytes] - allocs 1,524,532
[10000 Requests][10 kB Bytes] - allocs 1,691,127
[10000 Requests][102 kB Bytes] - allocs 1,958,763
[10000 Requests][1.0 MB Bytes] - allocs 3,262,682

I have UDS beating GRPC by 5.9x in speed in the 1 MB category. So I'm not sure what the discrepancy is with your run of the test.
I adjusted kern.ipc.maxsockbuf, doubling it and then quadrupling it, but funnily enough this only had significant effects on the lower end. It might be that I was referring to much larger buffer sizes that I had tested but removed from the benchmark. I really wish I had been clearer there. It may be that with more information I'll be able to figure out what the difference is, but with what I've got at the moment, I'm not sure what the issue is.
Thanks for taking time out of your night to take a look. I think my current Mac is similar to your prior Mac. I noticed that you are now running macOS Monterey. Based on that, I decided to upgrade. After the upgrade I see slightly different results:

Test Results(uds):
[Speed]
[12 Users][10000 Requests][1.0 kB Bytes] - min 46.394µs/sec, max 1.356804ms/sec, avg 214.1µs/sec, rps 55233.59
[12 Users][10000 Requests][10 kB Bytes] - min 75.537µs/sec, max 2.037849ms/sec, avg 391.558µs/sec, rps 30499.86
[12 Users][10000 Requests][102 kB Bytes] - min 706.411µs/sec, max 4.307707ms/sec, avg 1.29047ms/sec, rps 9272.03
[12 Users][10000 Requests][1.0 MB Bytes] - min 7.306673ms/sec, max 39.574033ms/sec, avg 14.377907ms/sec, rps 834.06
[Allocs]
[10000 Requests][1.0 kB Bytes] - allocs 290,383
[10000 Requests][10 kB Bytes] - allocs 302,228
[10000 Requests][102 kB Bytes] - allocs 321,473
[10000 Requests][1.0 MB Bytes] - allocs 338,836

Test Results(grpc):
[Speed]
[12 Users][10000 Requests][1.0 kB Bytes] - min 41.153µs/sec, max 1.780725ms/sec, avg 224.803µs/sec, rps 52141.44
[12 Users][10000 Requests][10 kB Bytes] - min 102.856µs/sec, max 3.374822ms/sec, avg 523.397µs/sec, rps 22839.44
[12 Users][10000 Requests][102 kB Bytes] - min 1.187308ms/sec, max 7.811926ms/sec, avg 2.588879ms/sec, rps 4632.78
[12 Users][10000 Requests][1.0 MB Bytes] - min 11.411182ms/sec, max 33.643369ms/sec, avg 21.227748ms/sec, rps 565.16
[Allocs]
[10000 Requests][1.0 kB Bytes] - allocs 1,525,499
[10000 Requests][10 kB Bytes] - allocs 1,691,013
[10000 Requests][102 kB Bytes] - allocs 1,957,932
[10000 Requests][1.0 MB Bytes] - allocs 3,270,446

Now UDS is slightly faster than grpc. Increasing kern.ipc.maxsockbuf doesn't seem to make much of a difference.
But if I set net.local.stream.recvspace=1280000 and net.local.stream.sendspace=1280000 and rerun, I get the following:

==========================================================================
Test Results(uds):
[Speed]
[12 Users][10000 Requests][1.0 kB Bytes] - min 37.969µs/sec, max 1.447162ms/sec, avg 203.445µs/sec, rps 58190.32
[12 Users][10000 Requests][10 kB Bytes] - min 61.718µs/sec, max 1.478607ms/sec, avg 392.791µs/sec, rps 30416.75
[12 Users][10000 Requests][102 kB Bytes] - min 667.759µs/sec, max 6.745718ms/sec, avg 1.258853ms/sec, rps 9504.93
[12 Users][10000 Requests][1.0 MB Bytes] - min 7.148514ms/sec, max 40.472526ms/sec, avg 14.250775ms/sec, rps 841.51
[Allocs]
[10000 Requests][1.0 kB Bytes] - allocs 290,385
[10000 Requests][10 kB Bytes] - allocs 302,219
[10000 Requests][102 kB Bytes] - allocs 321,025
[10000 Requests][1.0 MB Bytes] - allocs 338,544

Test Results(grpc):
[Speed]
[12 Users][10000 Requests][1.0 kB Bytes] - min 51.922µs/sec, max 2.425716ms/sec, avg 231.266µs/sec, rps 50616.78
[12 Users][10000 Requests][10 kB Bytes] - min 73.416µs/sec, max 2.124342ms/sec, avg 365.51µs/sec, rps 32077.50
[12 Users][10000 Requests][102 kB Bytes] - min 217.308µs/sec, max 5.464793ms/sec, avg 1.005719ms/sec, rps 11883.91
[12 Users][10000 Requests][1.0 MB Bytes] - min 5.468018ms/sec, max 15.044432ms/sec, avg 8.622932ms/sec, rps 1390.03
[Allocs]
[10000 Requests][1.0 kB Bytes] - allocs 1,518,328
[10000 Requests][10 kB Bytes] - allocs 1,701,023
[10000 Requests][102 kB Bytes] - allocs 1,913,110
[10000 Requests][1.0 MB Bytes] - allocs 1,941,978

Now, GRPC is faster again. I don't know if others will get the same boost in GRPC by setting net.local.stream.recvspace and net.local.stream.sendspace.
Interesting that it does anything. I wonder what they are doing under the hood, because that is only for IP localhost stuff. We are doing domain sockets. I went back to check my code, because I thought I must have made some terrible mistake. But I am telling grpc to use a unix socket, so I have no idea why that would have any effect.
I made an adjustment as such:
sudo sysctl -w kern.ipc.maxsockbuf=8000000
sudo sysctl -w net.local.stream.sendspace=4000000
sudo sysctl -w net.local.stream.recvspace=4000000
On my machine, UDS wins:
UDS
[10 Users][10000 Requests][1.0 MB Bytes] - min 2.112416ms/sec, max 31.491333ms/sec, avg 3.277516ms/sec, rps 3046.22
GRPC
[10 Users][10000 Requests][1.0 MB Bytes] - min 1.375334ms/sec, max 20.924041ms/sec, avg 5.288225ms/sec, rps 1890.16
But setting those values does give a massive increase for GRPC while making no change to UDS. The previous value for gRPC was 519.
So we have at least found values that seem to change how GRPC works. It may be that we have to adjust different values for UDS, or that my code needs to increase the buffer sizes it sends based on looking up what the current size is.
I'll try to have a look at that.
Cheers Tony,
John
When I run the benchmark test on my MacBook I observe that for 1MB payload GRPC is almost twice as fast as UDS. It was mentioned that "To get better performance in large sizes, I had to add some kernel buffer space over the defaults, which lead to close to double performance". Anyone know how to add "some kernel buffer space over the defaults"?
Results below:
2022/10/04 17:37:17 benchmark.go:119: Running tests for: uds
2022/10/04 17:37:17 benchmark.go:122: Test Params: main.testParms{Concurrency:12, Amount:10000, PacketSize:1024, Alloc:false}
2022/10/04 17:37:18 benchmark.go:122: Test Params: main.testParms{Concurrency:12, Amount:10000, PacketSize:10240, Alloc:false}
2022/10/04 17:37:20 benchmark.go:122: Test Params: main.testParms{Concurrency:12, Amount:10000, PacketSize:102400, Alloc:false}
2022/10/04 17:37:22 benchmark.go:122: Test Params: main.testParms{Concurrency:12, Amount:10000, PacketSize:1024000, Alloc:false}
2022/10/04 17:37:38 benchmark.go:122: Test Params: main.testParms{Concurrency:1, Amount:10000, PacketSize:1024, Alloc:true}
2022/10/04 17:37:42 benchmark.go:122: Test Params: main.testParms{Concurrency:1, Amount:10000, PacketSize:10240, Alloc:true}
2022/10/04 17:37:46 benchmark.go:122: Test Params: main.testParms{Concurrency:1, Amount:10000, PacketSize:102400, Alloc:true}
2022/10/04 17:37:57 benchmark.go:122: Test Params: main.testParms{Concurrency:1, Amount:10000, PacketSize:1024000, Alloc:true}
Test Results(uds):
[Speed]
[12 Users][10000 Requests][1.0 kB Bytes] - min 44.145µs/sec, max 1.626258ms/sec, avg 233.776µs/sec, rps 50624.37
[12 Users][10000 Requests][10 kB Bytes] - min 91.93µs/sec, max 1.305938ms/sec, avg 396.359µs/sec, rps 30119.14
[12 Users][10000 Requests][102 kB Bytes] - min 751.183µs/sec, max 7.131249ms/sec, avg 1.284381ms/sec, rps 9317.00
[12 Users][10000 Requests][1.0 MB Bytes] - min 8.975998ms/sec, max 183.318841ms/sec, avg 17.873009ms/sec, rps 670.42
[Allocs]
[10000 Requests][1.0 kB Bytes] - allocs 290,385
[10000 Requests][10 kB Bytes] - allocs 302,219
[10000 Requests][102 kB Bytes] - allocs 321,047
[10000 Requests][1.0 MB Bytes] - allocs 336,841
2022/10/04 17:38:59 benchmark.go:119: Running tests for: grpc
2022/10/04 17:38:59 benchmark.go:122: Test Params: main.testParms{Concurrency:12, Amount:10000, PacketSize:1024, Alloc:false}
2022/10/04 17:39:01 benchmark.go:122: Test Params: main.testParms{Concurrency:12, Amount:10000, PacketSize:10240, Alloc:false}
2022/10/04 17:39:02 benchmark.go:122: Test Params: main.testParms{Concurrency:12, Amount:10000, PacketSize:102400, Alloc:false}
2022/10/04 17:39:04 benchmark.go:122: Test Params: main.testParms{Concurrency:12, Amount:10000, PacketSize:1024000, Alloc:false}
2022/10/04 17:39:13 benchmark.go:122: Test Params: main.testParms{Concurrency:12, Amount:10000, PacketSize:1024, Alloc:true}
2022/10/04 17:39:17 benchmark.go:122: Test Params: main.testParms{Concurrency:1, Amount:10000, PacketSize:10240, Alloc:true}
2022/10/04 17:39:22 benchmark.go:122: Test Params: main.testParms{Concurrency:1, Amount:10000, PacketSize:102400, Alloc:true}
2022/10/04 17:39:30 benchmark.go:122: Test Params: main.testParms{Concurrency:1, Amount:10000, PacketSize:1024000, Alloc:true}
Test Results(grpc):
[Speed]
[12 Users][10000 Requests][1.0 kB Bytes] - min 41.172µs/sec, max 3.198751ms/sec, avg 231.151µs/sec, rps 50555.10
[12 Users][10000 Requests][10 kB Bytes] - min 67.013µs/sec, max 3.159617ms/sec, avg 381.454µs/sec, rps 30822.76
[12 Users][10000 Requests][102 kB Bytes] - min 215.603µs/sec, max 5.728779ms/sec, avg 1.115849ms/sec, rps 10716.17
[12 Users][10000 Requests][1.0 MB Bytes] - min 3.849356ms/sec, max 17.255924ms/sec, avg 9.977688ms/sec, rps 1201.41
[Allocs]
[10000 Requests][1.0 kB Bytes] - allocs 1,518,299
[10000 Requests][10 kB Bytes] - allocs 1,701,021
[10000 Requests][102 kB Bytes] - allocs 1,838,238
[10000 Requests][1.0 MB Bytes] - allocs 2,036,568