Description
Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (go version)?
go version go1.8.3 linux/amd64
What operating system and processor architecture are you using (go env)?
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/rgeronimi/sabrina"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build401940531=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
What did you do?
I deployed on port 80 a standard http.Server S, plugged into a standard httputil.ReverseProxy P, reverse proxying traffic through a standard http.Transport T to another standard golang web server S2 in a separate unix process. P is configured to use keep-alive.
What did you expect to see?
When used with any browser, S should behave normally, displaying the full website of S2 correctly.
What did you see instead?
It worked with Safari, Chrome and Firefox. But with torBrowser (vanilla MacOS version 7.0, nothing customized), some page resources were randomly broken, returning a "502 Bad Gateway" status with an empty body (the Content-Length header is 0). Headers and torBrowser logs show nothing special beyond the 502 error.
I also tried with all other browsers (outside of torBrowser) but over the Tor network, and everything works. So the issue is triggered by the torBrowser itself and not by the Tor network.
Only the first (and sometimes second) resource is correctly downloaded, and then all the following ones fail with a 502. This is inconsistent, suggesting a race condition.
Suspecting a keep-alive connection management issue somewhere in the chain, I checked the chronology. It perfectly matches a scenario where a connection C is always deemed "broken" by http.Transport after the first resource request Req1 is answered, compromising all the following resource requests Req2, Req3, ... that were pipelined on that same connection C.
In the rare cases where torBrowser correctly received 2 resources instead of 1, there were 2 parallel connections to pipeline the requests on.
I checked the httputil.ReverseProxy source code and saw that it sends back 502 errors when the request context is canceled, which matches the torBrowser status code received.
So I made the hypothesis that, for some reason:
(1) torBrowser systematically has a specific behavior that pushes the http.Server S to declare the request Req1 "canceled" slightly before T considers its own clone TReq1 of Req1 completed
(2) Consequently T propagates that cancellation to all the following requests TReq2, TReq3, ... that it had pipelined on the same connection C (established between T and S2)
(3) Which forces P to generate the 502 errors on each of their original requests Req2, Req3, ...
To test this hypothesis, I implemented this workaround in the httputil.ReverseProxy.Director to remove the influence of contexts on the reverse proxy P and its transport T:
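func director(req *http.Request) {
	// Swap in a background context so that cancellation of the incoming
	// request is no longer visible to the proxy and its transport.
	quietReq := req.WithContext(context.Background())
	*req = *quietReq
	...other code...
}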
This workaround worked perfectly: the bug symptoms disappeared, and torBrowser now shows the website correctly.
The missing part is the exact explanation for (1). Tests show that the breakages are inconsistent: sometimes all requests go through correctly. So it looks like a subtle race condition between http.Server, httputil.ReverseProxy and http.Transport.RoundTrip.