-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
x/net/http2: pool deadlock #32388
Comments
Change https://golang.org/cl/179938 mentions this issue: |
Whelp....
I haven't reviewed CL 179938 but it seems a bit long/invasive. I wonder why the mutex needs to be held during that write. Rather than change everything so CanTakeRequest can work without taking the lock, can we instead not hold the lock during the write and not modify CanTakeRequest? |
I tried that but because It may be that adding a lock just around the update of |
There seems to be a mix of ways the locks are acquired but it is always mu then wmu. I can see value in acquiring wmu under mu and then releasing if we are trying to guarantee some ordering. |
The test only talks to one endpoint so the http/2 code will try to make these requests over the same TCP connection; it's designed to validate that the first request doesn't block the second request due to a stalled write; in this situation, it should just create a new connection but as the You are correct in that |
Change https://golang.org/cl/181457 mentions this issue: |
Change https://golang.org/cl/224797 mentions this issue: |
Change https://golang.org/cl/240338 mentions this issue: |
@fraenkel Thanks so much for helping fix this bug! We've run into this quite a few times on some service to service calls under specific conditions |
HTTP/2 support is golang has many problematic cornercases where dead connections would be kept. golang/go#32388 golang/go#39337 golang/go#39750 I suggest we disable HTTP/2 for now and enable it manually on the blackbox exporter. Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
@fraenkel Thanks for this patch https://golang.org/cl/240338, I've manually applied the patch and the issue is gone. Can we have this merged so I can remove the manual patch? |
Any update on this @fraenkel? This patch: https://golang.org/cl/240338 has been working great for me in production for the last 2 months |
I believe this may be tangential which was resolved in 1.15.3 #42113 |
Given the description of that bug says "will cause all subsequent write requests to fail blindly" I think this is very different as its a definite deadlock. |
@fraenkel Thanks for this patch , will this patch merge into master ? |
Change https://golang.org/cl/349594 mentions this issue: |
Thank you for fixing this issue. It has have been affecting us for quite some time. Is it possible to merge the solution to a 1.17 point release? Or we have to wait for 1.18? |
@gopherbot Please open backport issues. This appears to be an occasional deadlock in http. Should be considered for backporting depending on what @neild thinks. |
Backport issue(s) opened: #48649 (for 1.16), #48650 (for 1.17). Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases. |
I believe there is a slight issue when sending a new stream. There is a gap between obtaining the new streamID and sending it via encodeHeaders. If streamID + N is sent before any previous streamID, they will close when received. |
Operations which examine the state of a ClientConn--notably, the connection pool's check to see if a conn is available to take a new request--need to acquire mu. Blocking while holding mu, such as when writing to the network, blocks these operations. Remove blocking operations from the mutex. Perform network writes with only ClientConn.wmu held. Clarify that wmu guards the per-conn HPACK encoder and buffer. Add a new mutex guarding request creation, covering the critical section starting with allocating a new stream ID and continuing until the stream is created. Fix a locking issue where trailers were written from the HPACK buffer with only wmu held, but headers were encoded into the buffer with only mu held. (Now both encoding and writes occur with wmu held.) Fixes golang/go#32388. Fixes golang/go#48340. Change-Id: Ibb313424ed2f32c1aeac4645b76aedf227b597a3 Reviewed-on: https://go-review.googlesource.com/c/net/+/349594 Trust: Damien Neil <dneil@google.com> Run-TryBot: Damien Neil <dneil@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
What version of Go are you using (
go version
)?Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (
go env
)?go env
OutputWhat did you do?
Use the net/http client to make requests to an http2 supporting server.
What did you expect to see?
Requests should continue to succeed if a single connection becomes blocked in a write.
What did you see instead?
A single blocked connection prevented new http2 connections from being established due to the http2 pool requiring the connection lock when testing to see if an existing connection can handle new requests. This results in the entire application hanging indefinitely.
The following trace was created with an older go version 1.9 from our production app, but I have proved it's still possible to reproduce in the latest version as detailed above, as the locking between connections and pool haven't been changed.
This is similar to #23559 however that bug is specifically about certain areas of the http2 connections not paying attention to timeouts whereas this bug is caused by the interaction of individual connection locks and the pool lock.
The text was updated successfully, but these errors were encountered: