Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/net/http2: the first write error on a connection will cause all subsequent write requests to fail blindly #39337

Closed
jaricftw opened this issue Jun 1, 2020 · 12 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@jaricftw
Copy link

jaricftw commented Jun 1, 2020

What version of Go are you using (go version)?

$ go version
go version go1.14.3 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/jzhan/.cache/go-build"
GOENV="/home/jzhan/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/jzhan/gocode"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/jzhan/bin/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/jzhan/bin/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build713868028=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Here is a test that reproduces the issue: https://github.com/jaricftw/go-http2-stuck-transport/blob/master/conn_write_timeout_test.go#L19

It tests the TCP write deadline behavior with Go HTTP2 library.

Server setup: a blocking TCP server, i.e. connection can be established, but never read data from the connection.
Client setup: An http2 client makes calls to the blocking TCP server. TCP write timeout (using https://golang.org/pkg/net/#TCPConn.SetWriteDeadline) is set for each Write operation of a connection. TCP write buffer is also set to be small to quickly fill up the write buffer of the client.

What did you expect to see?

Each request will get stuck and eventually time out after the specified TCP write timeout (i.e. 2 seconds)

What did you see instead?

First request times out after the specified 2 seconds, as expected

--- FAIL: TestConnWriteTimeout (2.04s)
    conn_write_timeout_test.go:37: got err: Post http://127.0.0.1:44417: write tcp 127.0.0.1:47152->127.0.0.1:44417: i/o timeout

However, any subsequent request returns immediately with the same error:

 conn_write_timeout_test.go:43: got err: Post http://127.0.0.1:44417: write tcp 127.0.0.1:47152->127.0.0.1:44417: i/o timeout
    conn_write_timeout_test.go:45:
                Error Trace:    conn_write_timeout_test.go:45
                Error:          Should be true
                Test:           TestConnWriteTimeout
                Messages:       expect response time to be larger than 2s, actual: 52.696µs

This indicates:

  1. The same connection (source port doesn't change from the above logs across different requests) is reused for requests to the server, even after the first connection write time out.
  2. Even though a new TCP write deadline is set for the connection before each write, the deadline is not respected after the first time out.

Dug into the http2 transport implementation, this is likely because the first write error on the connection will make all future writeHeaders to fail with the same error immediately:

type ClientConn struct {
	// other fields

	werr error      // first write error that has occurred
}

func (cc *ClientConn) writeHeaders(streamID uint32, endStream bool, maxFrameSize int, hdrs []byte) error {
	first := true // first frame written (HEADERS is first, then CONTINUATION)
	for len(hdrs) > 0 && cc.werr == nil {
		// actual work 
	}
	cc.bw.Flush()
	return cc.werr
}

ref: https://github.com/golang/net/blob/master/http2/transport.go#L1245

@dmitshur
Copy link
Contributor

dmitshur commented Jun 2, 2020

/cc @bradfitz @tombergan @rsc

@dmitshur dmitshur added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jun 2, 2020
@dmitshur dmitshur added this to the Backlog milestone Jun 2, 2020
@networkimprov
Copy link

cc @fraenkel

@fraenkel
Copy link
Contributor

fraenkel commented Jun 7, 2020

There is a missing MarkDead on the connection. The current code only does cleanup if we start the readLoop which never happens.

@fraenkel
Copy link
Contributor

fraenkel commented Jun 7, 2020

Sorry, I did some further digging. It seems the readLoop is entered but never exits which is why the cleanup never occurs as expected.

@fraenkel
Copy link
Contributor

fraenkel commented Jun 7, 2020

@bradfitz It doesn't seem like the http2 tranport handles write errors properly. I see that the error is recorded and it flows up to the client, but I don't see any place that would tear the connection down. Is the assumption that a read failure must occur? It seems a bit difficult if the server side is mishandling requests like this test case does.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/240337 mentions this issue: x/net/http2: Close connection on write errors

roidelapluie added a commit to roidelapluie/common that referenced this issue Aug 3, 2020
HTTP/2 support is golang has many problematic cornercases where dead
connections would be kept.

golang/go#32388
golang/go#39337
golang/go#39750

I suggest we disable HTTP/2 for now and enable it manually on the
blackbox exporter.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
@bradfitz
Copy link
Contributor

@gopherbot, please backport. I think this is pretty bad. (It bit us and there's no great workaround, short of disabling HTTP/2)

@gopherbot
Copy link
Contributor

Backport issue(s) opened: #42112 (for 1.14), #42113 (for 1.15).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

bradfitz added a commit to tailscale/go that referenced this issue Oct 21, 2020
roidelapluie added a commit to roidelapluie/common that referenced this issue Oct 25, 2020
Given that golang/go#39337 and
golang/go#40201 are fixed:

- We now have a function to configure HTTP/2 transport,
  therefore we have a "ping" every 5 minutes on the line
  if not traffic is doing on
- Dead HTTP/2 connections will be removed from the pool

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/266157 mentions this issue: [release-branch.go1.14] http2: close Transport connection on write errors

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/266158 mentions this issue: [release-branch.go1.15-bundle] http2: close Transport connection on write errors

gopherbot pushed a commit to golang/net that referenced this issue Oct 29, 2020
…rors

When a new connection is created, and a write error occurs during the
initial exchange, the connection must be closed. There is no guarantee
that the caller will close the connection.

When a connection with an existing write error is used or being used, it
will stay in use until its read loop completes. Requests will continue
to use this connection and fail when writing its header. These
connections should be closed to force the cleanup in its readLoop.

Updates golang/go#39337.
For golang/go#42112.

Change-Id: I45e1293309e40629531f4cbb69387864f4f71bc2
Reviewed-on: https://go-review.googlesource.com/c/net/+/240337
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Brad Fitzpatrick <bradfitz@golang.org>
Trust: Damien Neil <dneil@google.com>
(cherry picked from commit f585440)
Reviewed-on: https://go-review.googlesource.com/c/net/+/266157
Trust: Dmitri Shuralyov <dmitshur@golang.org>
Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Michael Fraenkel <michael.fraenkel@gmail.com>
Reviewed-by: Alexander Rakoczy <alex@golang.org>
gopherbot pushed a commit to golang/net that referenced this issue Oct 29, 2020
…rite errors

When a new connection is created, and a write error occurs during the
initial exchange, the connection must be closed. There is no guarantee
that the caller will close the connection.

When a connection with an existing write error is used or being used, it
will stay in use until its read loop completes. Requests will continue
to use this connection and fail when writing its header. These
connections should be closed to force the cleanup in its readLoop.

Updates golang/go#39337.
For golang/go#42113.

Change-Id: I45e1293309e40629531f4cbb69387864f4f71bc2
Reviewed-on: https://go-review.googlesource.com/c/net/+/240337
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Brad Fitzpatrick <bradfitz@golang.org>
Trust: Damien Neil <dneil@google.com>
(cherry picked from commit f585440)
Reviewed-on: https://go-review.googlesource.com/c/net/+/266158
Trust: Dmitri Shuralyov <dmitshur@golang.org>
Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Emmanuel Odeke <emmanuel@orijtech.com>
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/266374 mentions this issue: [release-branch.go1.14] net/http: update bundled x/net/http2

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/266375 mentions this issue: [release-branch.go1.15] net/http: update bundled x/net/http2

gopherbot pushed a commit that referenced this issue Oct 30, 2020
Bring in the change in CL 266157 with:

	go get -d golang.org/x/net@release-branch.go1.14
	go mod tidy
	go mod vendor
	go generate -run=bundle std

Updates #39337.
Fixes #42112.

Change-Id: Iefd0012369c7f0c58201256e29d21210cb9f2f7a
Reviewed-on: https://go-review.googlesource.com/c/go/+/266374
Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Emmanuel Odeke <emmanuel@orijtech.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Dmitri Shuralyov <dmitshur@golang.org>
gopherbot pushed a commit that referenced this issue Oct 30, 2020
Bring in the change in CL 266158 with:

	go mod edit -replace=golang.org/x/net=golang.org/x/net@release-branch.go1.15-bundle
	GOFLAGS='-mod=mod' go generate -run=bundle std
	go mod edit -dropreplace=golang.org/x/net
	go get -d golang.org/x/net@release-branch.go1.15
	go mod tidy
	go mod vendor

Updates #39337.
Fixes #42113.

Change-Id: I3ebef4b90c11ad271b7a3031aafd80c423c2c241
Reviewed-on: https://go-review.googlesource.com/c/go/+/266375
Trust: Dmitri Shuralyov <dmitshur@golang.org>
Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Emmanuel Odeke <emmanuel@orijtech.com>
@golang golang locked and limited conversation to collaborators Oct 29, 2021
dteh pushed a commit to dteh/fhttp that referenced this issue Jun 22, 2022
When a new connection is created, and a write error occurs during the
initial exchange, the connection must be closed. There is no guarantee
that the caller will close the connection.

When a connection with an existing write error is used or being used, it
will stay in use until its read loop completes. Requests will continue
to use this connection and fail when writing its header. These
connections should be closed to force the cleanup in its readLoop.

Fixes golang/go#39337

Change-Id: I45e1293309e40629531f4cbb69387864f4f71bc2
Reviewed-on: https://go-review.googlesource.com/c/net/+/240337
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Brad Fitzpatrick <bradfitz@golang.org>
Trust: Damien Neil <dneil@google.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

6 participants