Library does not scale with multiple cores #531
A couple of things to try:
A bigger effort would be to replace the single massive lock with per-stream locks. There might still need to be a large lock, but the goal would be to avoid keeping it locked for long or for most operations. It's not exactly the same situation, but grpc-go made a similar change a couple of years ago to reduce contention: grpc/grpc-go#1962
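A minimal sketch of that layering, using hypothetical `Store` and `StreamState` types rather than h2's actual internals: the store-wide lock is held only long enough to look a stream up, and the longer per-stream work happens under that stream's own lock.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Hypothetical per-stream state; in h2 this would be the send/recv
// state that currently lives behind the one connection-wide lock.
struct StreamState {
    window: i32, // e.g. a flow-control window
}

struct Store {
    // The store itself keeps one lock, but it is only taken briefly:
    // to insert, remove, or look up a stream.
    streams: Mutex<HashMap<u32, Arc<Mutex<StreamState>>>>,
}

impl Store {
    fn with_stream<R>(&self, id: u32, f: impl FnOnce(&mut StreamState) -> R) -> Option<R> {
        // Short critical section: clone the Arc, then drop the store lock.
        let stream = self.streams.lock().unwrap().get(&id).cloned()?;
        // Longer critical section: only this one stream is blocked.
        let mut state = stream.lock().unwrap();
        Some(f(&mut state))
    }
}
```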
Some initial measurements from replacing the lock in `streams.rs` with one from `parking_lot`: https://gist.github.com/bIgBV/4d6d76773a948734ebef1367ef5221d5
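For context, the swap itself is mostly mechanical. A minimal sketch with an illustrative `SendBuffer` type (not h2's real one); `parking_lot` locks are not poisoned, so the `.unwrap()` calls disappear:

```rust
use parking_lot::Mutex; // swapped in for std::sync::Mutex

// Illustrative buffer, not h2's real SendBuffer.
struct SendBuffer<T> {
    inner: Mutex<Vec<T>>,
}

impl<T> SendBuffer<T> {
    fn push(&self, frame: T) {
        // parking_lot mutexes are not poisoned, so lock() returns the
        // guard directly instead of a Result.
        self.inner.lock().push(frame);
    }

    fn pop(&self) -> Option<T> {
        self.inner.lock().pop()
    }
}
```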
@bIgBV It seems that the comparison results of `parking_lot` and the original implementation are similar?
The libstd `Mutex` was recently replaced with a new implementation that is both much smaller and significantly faster. There is much less to lose now with per-stream locking.
@jeffutter thanks for the excellent write-up! A way forward would be to do what I suggested: make per-stream locks so we only need to lock the stream store infrequently, i.e. when adding or removing a stream.
@seanmonstar Yeah. I think that would help my specific use case greatly, since I create all of the streams up-front and re-use them for many requests, so the global locks wouldn't occur mid-work. I might try to take a stab at making that change in my free time, although it'll probably take me a while to get up to speed on h2 internals.
@seanmonstar I’ve been reading through the h2 internals.

My understanding is that ultimately only one `Frame` can be written to the underlying IO at one time. So there needs to be either a single buffer of `Frame`s to send, or I suppose a set of buffers and some mechanism to choose which one to take a frame from next. Currently all of the `Frame`s get put in the `SendBuffer` on the connection.

Let me know if that's making any sense 🙃 or if you have any other suggestions as to how you'd go about implementing this. Also, if you have any general resources for understanding HTTP/2 streams and flow control beyond the spec, I'd love to read up more there too. Thanks again for any help here. Hopefully with a bit of guidance I can help find a solution.
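One way to picture the "set of buffers and some mechanism to choose" idea is per-stream queues plus a round-robin list of ready streams. This is purely an illustrative sketch, not h2's actual design:

```rust
use std::collections::{HashMap, VecDeque};

type StreamId = u32;
struct Frame; // stand-in for a real HTTP/2 frame

#[derive(Default)]
struct Sender {
    // One queue per stream instead of a single shared send buffer.
    per_stream: HashMap<StreamId, VecDeque<Frame>>,
    // Streams that have at least one queued frame, in round-robin order.
    ready: VecDeque<StreamId>,
}

impl Sender {
    fn enqueue(&mut self, id: StreamId, frame: Frame) {
        let buf = self.per_stream.entry(id).or_default();
        if buf.is_empty() {
            // First frame for this stream: mark it ready.
            self.ready.push_back(id);
        }
        buf.push_back(frame);
    }

    // Choosing the next frame is the only step that must be serialized,
    // since a single frame is written to the underlying IO at a time.
    fn next_frame(&mut self) -> Option<Frame> {
        let id = self.ready.pop_front()?;
        let buf = self.per_stream.get_mut(&id)?;
        let frame = buf.pop_front();
        if !buf.is_empty() {
            // Rotate the stream to the back so others get a turn.
            self.ready.push_back(id);
        }
        frame
    }
}
```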
This PR adds a simple benchmark to measure the impact of performance changes, e.g. a potential fix for this issue: #531

The benchmark is simple: have a client send `100_000` requests to a server and wait for a response. Output:

```
cargo bench

H2 running in current-thread runtime at 127.0.0.1:5928:
Overall: 353ms.
Fastest: 91ms
Slowest: 315ms
Avg    : 249ms

H2 running in multi-thread runtime at 127.0.0.1:5929:
Overall: 533ms.
Fastest: 88ms
Slowest: 511ms
Avg    : 456ms
```
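For reference, the shape of such a measurement might look like the sketch below (hypothetical; the PR's actual harness may differ, e.g. by running clients concurrently):

```rust
use std::time::{Duration, Instant};

// Hypothetical measurement loop: run `n` requests (n > 0), record each
// round-trip, then report overall/fastest/slowest/average times.
fn measure(n: usize, mut do_request: impl FnMut()) {
    let start = Instant::now();
    let mut times = Vec::with_capacity(n);
    for _ in 0..n {
        let t = Instant::now();
        do_request();
        times.push(t.elapsed());
    }
    let overall = start.elapsed();
    let fastest = times.iter().min().copied().unwrap_or_default();
    let slowest = times.iter().max().copied().unwrap_or_default();
    let avg = times.iter().sum::<Duration>() / n as u32;
    println!("Overall: {overall:?}. Fastest: {fastest:?} Slowest: {slowest:?} Avg: {avg:?}");
}
```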
@seanmonstar FYI I'm working on this now
As demonstrated by the benchmarks in this reddit post, you can see the `rust_tonic_mt` benchmark falling behind in performance as the number of threads is increased. The likely cause for this could be that a big portion of the shared state is behind this `Mutex`.