Avoid allocations in HTTP/1.1 connection pool #102108
Conversation
Tagging subscribers to this area: @dotnet/ncl
.../System.Net.Http/src/System/Net/Http/SocketsHttpHandler/ConnectionPool/HttpConnectionPool.cs
```csharp
{
    /// <summary>
    /// A <see cref="ConcurrentStack{T}"/> with an extra head pointer to opportunistically avoid allocations on pushes.
    /// In situations where Push/Pop operations frequently occur in pairs (common under steady load), this avoids most allocations.
```
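For illustration, here is a minimal sketch of that idea (the type name and exact shape are assumptions, not the PR's actual code): a single head field sits in front of a `ConcurrentStack<T>` fallback, with `null` meaning "empty".

```csharp
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical sketch of the wrapper described above; not the code from this PR.
internal sealed class HeadCachingStack<T> where T : class
{
    private T? _head;                                    // opportunistic single-item slot; null == empty
    private readonly ConcurrentStack<T> _stack = new();  // fallback store; each Push here allocates a node

    public void Push(T item)
    {
        // Try the head slot first; this path allocates nothing.
        if (Interlocked.CompareExchange(ref _head, item, null) != null)
        {
            // Head slot was already occupied; fall back to the ConcurrentStack.
            _stack.Push(item);
        }
    }

    public bool TryPop(out T? item)
    {
        // Take whatever is in the head slot, if anything.
        item = Interlocked.Exchange(ref _head, null);
        if (item is not null)
        {
            return true;
        }

        // Head slot was empty; fall back to the ConcurrentStack.
        return _stack.TryPop(out item);
    }
}
```

Ordering between the head slot and the stack isn't strictly LIFO, which is fine for a pool of interchangeable connections; alternating Push/Pop pairs are served entirely by the head slot and never allocate.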
Would we be better off, then, having one (or more) such dedicated fields and then falling back to a locked stack?
It's way better than just a locked stack, but it is a bit worse than the current approach: about -2% on the HttpClient benchmark with a lock, -1.5% with a SpinLock and extra padding to split the lock and the head.
And for the 5-core stress I mentioned in the original description:
- main: 3046k
- pr: 3569k
- lockedStack: 3123k
- spinlockedStack: 3083k
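For comparison, the `lockedStack` variant above presumably looks something like the following sketch (my assumption, not the benchmarked code): the same opportunistic head field, but with a plain `Stack<T>` behind a lock as the fallback (swap the `lock` for a `SpinLock` for the `spinlockedStack` flavor; the padding between the head and the lock mentioned above is omitted for brevity).

```csharp
using System.Collections.Generic;
using System.Threading;

// Hypothetical sketch of the locked-stack alternative; not the benchmarked code.
internal sealed class LockedStackWithHead<T> where T : class
{
    private T? _head;                          // opportunistic slot, used without taking the lock
    private readonly object _lock = new();     // guards _stack
    private readonly Stack<T> _stack = new();  // fallback store; pushes here don't allocate nodes

    public void Push(T item)
    {
        if (Interlocked.CompareExchange(ref _head, item, null) == null)
        {
            return; // stored in the head slot, no lock taken
        }

        lock (_lock)
        {
            _stack.Push(item);
        }
    }

    public bool TryPop(out T? item)
    {
        item = Interlocked.Exchange(ref _head, null);
        if (item is not null)
        {
            return true;
        }

        lock (_lock)
        {
            return _stack.TryPop(out item);
        }
    }
}
```

The fallback pushes here don't allocate a node, but the numbers above suggest the lock-free `ConcurrentStack` fallback still wins under contention.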
> pr: 3569k
> lockedStack: 3123k
That suggests we're still frequently hitting the fallback case. How deep does the stack get?
Looks like the rate of going to the fallback is (roughly):
- stress: 30%
- httpclient: 45%
- yarp: 70%
With two head slots before the fallback instead of one, it's:
- stress: 14%
- httpclient: 25%
- yarp: 60%
YARP numbers aren't great here; there is a much bigger gap between returning a connection and renting it back, so there's often more than one connection in the pool.
This might not be worth the extra logic.
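For reference, fallback rates like the ones above could be collected with some throwaway instrumentation along these lines (purely hypothetical, not part of the PR, and the counters themselves would add contention):

```csharp
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical instrumented variant of the earlier sketch; TryPop omitted for brevity.
internal sealed class InstrumentedHeadCachingStack<T> where T : class
{
    private T? _head;
    private readonly ConcurrentStack<T> _stack = new();

    private long _pushes;     // total Push calls
    private long _fallbacks;  // Push calls that missed the head slot

    public void Push(T item)
    {
        Interlocked.Increment(ref _pushes);

        if (Interlocked.CompareExchange(ref _head, item, null) != null)
        {
            Interlocked.Increment(ref _fallbacks);
            _stack.Push(item);
        }
    }

    // Roughly ~30% (stress), ~45% (httpclient), ~70% (yarp) for the single-slot variant, per the numbers above.
    public double FallbackRate
    {
        get
        {
            long pushes = Volatile.Read(ref _pushes);
            return pushes == 0 ? 0 : (double)Volatile.Read(ref _fallbacks) / pushes;
        }
    }
}
```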
#99364 switched the H1 connection pool to using a `ConcurrentStack` as the store of connections. While that change significantly improved throughput (particularly under contention), it does mean we're allocating an extra 32 bytes every time we push to the connection stack (lack of DCAS).
This PR tweaks the backing store to use an extra `head` pointer we hit first before falling back to the `ConcurrentStack`. Because we're never storing `null` connections, we can use `null` as a sentinel for unset and skip the allocations on the happy path. That makes alternating push/pops (which are quite common) alloc-free.
Throughput-wise it's within noise for both YARP and HttpClient crank scenarios.
Under very high contention (5 cores doing the in-memory connection pool stress (`SendAsync` above)), results vary greatly between runs, but averaged out the updated variant does come out on top. Average over 50 runs each: main=3046k, pr=3569k.
My guess is that this is happening because we're splitting the contention over two memory locations (some push/pops would be served by the new head without going into spin loops on the stack).
While this isn't that much code, we'd likely delete it whenever #31911 gets implemented and we start using it in `ConcurrentStack`.