Drain Response.Body to enable TCP/TLS connection reuse (4x speedup) #317

FiloSottile · 2016-03-27T18:37:06Z

On successful queries (on errors ReadAll runs in CheckResponse), the
Response.Body is not read all the way to the EOF as json.Decoder is used,
which stops at the end of the object, which is probably followed by a \n.

Then, Response.Body.Close() is called. When that happens with a non-drained
Body, the TCP/TLS connection is closed (which makes sense, in case there was
still a lot to read from the network), stopping the Transport from reusing
it. Also, a library user can't drain the Body themselves because Close() is
called before passing the Response to the user.

Since we know GitHub API responses only carry a single JSON object, draining
before closing is free, and saving all those handshakes causes huge savings
in latency, bandwidth and CPU for both the client and the poor GitHub
servers.

Here's the result of running the benchmark below on a ADSL connection:

before: 2m3.001599093s
after: 33.753543928s

func main() {
    ts := oauth2.StaticTokenSource(
        &oauth2.Token{AccessToken: os.Getenv("GITHUB_TOKEN")},
    )
    tc := oauth2.NewClient(oauth2.NoContext, ts)
    gh := github.NewClient(tc)
    start := time.Now()
    for i := 0; i < 100; i++ {
        gh.Repositories.Get("FiloSottile", "gvt")
        fmt.Print(".")
    }
    fmt.Printf("\n%s\n", time.Now().Sub(start))
}

On successful queries (on errors ReadAll runs in CheckResponse), the Response.Body is not read all the way to the EOF as json.Decoder is used, which stops at the end of the object, which is probably followed by a \n. Then, Response.Body.Close() is called. When that happens with a non-drained Body, the TCP/TLS connection is closed (which makes sense, in case there was still a lot to read from the network), stopping the Transport from reusing it. Also, a library user can't drain the Body themselves because Close() is called before passing the Response to the user. Since we know GitHub API responses only carry a single JSON object, draining before closing is free, and saving all those handshakes causes huge savings in latency, bandwidth and CPU for both the client and the poor GitHub servers. Here's the result of running the benchmark below on a ADSL connection: before: 2m3.001599093s after: 33.753543928s func main() { ts := oauth2.StaticTokenSource( &oauth2.Token{AccessToken: os.Getenv("GITHUB_TOKEN")}, ) tc := oauth2.NewClient(oauth2.NoContext, ts) gh := github.NewClient(tc) start := time.Now() for i := 0; i < 100; i++ { gh.Repositories.Get("FiloSottile", "gvt") fmt.Print(".") } fmt.Printf("\n%s\n", time.Now().Sub(start)) }

dmitshur · 2016-03-27T20:52:11Z

Hey Filippo, this is really cool! I have one question, I'll post it inline.

dmitshur · 2016-03-27T20:53:28Z

github/github.go

-	defer resp.Body.Close()
+	defer func() {
+		// Drain and close the body to let the Transport reuse the connection
+		io.Copy(ioutil.Discard, resp.Body)


On successful queries (on errors ReadAll runs in CheckResponse), the
Response.Body is not read all the way to the EOF as json.Decoder is used,
which stops at the end of the object, which is probably followed by a \n.

Since we know GitHub API responses only carry a single JSON object, draining
before closing is free

I'm just curious, but have you tried doing something like n, _ := io.Copy(ioutil.Discard, resp.Body) here and dump the value of n? I'm curious if it's indeed 1 and contains a '\n', or something else.

Good question. Just tried and it turns out it's zero!

Maybe explicitly hitting io.EOF flips a flag somewhere, this looks like something the runtime could improve on.

Cool, thanks for checking!

I think that's probably normal, and not something runtime could do better.

In a general case, readers deal with streams where the EOF may not be known in advance.

The read method is documented to either return nil error first time and 0, EOF the next, but it's also valid to return EOF right away. It's up to the implementation which approach to use.

When Read encounters an error or end-of-file condition after successfully reading n > 0 bytes, it returns the number of bytes read. It may return the (non-nil) error from the same call or return the error (and n == 0) from a subsequent call. An instance of this general case is that a Reader returning a non-zero number of bytes at the end of the input stream may return either err == EOF or err == nil. The next Read should return 0, EOF.

When reading from a network stream or some other unknown source where you can't cheat and look ahead, you may not know it's EOF until you actively try to read more bytes.

Also see this snippet.

interestingly, I just tried this, and it's apparently discarding 0 bytes, yet I'm still seeing the same speed improvements @FiloSottile is reporting. Maybe it's just the io.EOF that is left to be read which is preventing the transport from being reused?

My idea of runtime improvement would be a Read of, say, 512 bytes with a 50ms timeout happening on close, which would probably be good enough for cases like this one. But I'm getting OT, I'll bring this to golang/go :)

psh. stupid GitHub not updating this comment thread so I didn't see your identical comments :)

marcosnils · 2016-03-27T23:09:55Z

Maybe it's a good idea to add a test for this?

FiloSottile · 2016-03-27T23:38:51Z

I don't have time to write one today (1:30am 😲), but I think one can be written like this

make one request
replace Dial in the Transport with panic()
make another request

But it might be flappy, as no behavior is guaranteed here AFAICT.

Can't think of a simpler one.

dmitshur · 2016-03-27T23:43:15Z

I don't think it's necessary to have a test for this, unless someone can come up with a reliable one. This is just normal behavior, and it's explained well in the comment in code. This change is not about correctness (the current code is also correct), it's about an optimization.

willnorris · 2016-03-27T23:54:16Z

Thanks for the patch @FiloSottile!

marcosnils · 2016-03-28T00:40:55Z

make one request
replace Dial in the Transport with panic()
make another request

Why not just mocking with an httptest server any of the github endpoints and use go-github to make the request?. I'm not familiar with the library (I came here through the gophers slack) but I believe it would be fairy simple to do this and assert that the connection is being reused (I've done this in the past with some other projects. )

Thoughts?

bradfitz · 2016-03-29T04:30:04Z

Hi, author of net/http here. I cannot reproduce this result.

The underlying reader (the one beneath the oauth2.Transport's onEOFReader) is returning (83, io.EOF), which is successfully recycling the TCP connection, even without your patch.

Are your measurements actually from that code, or similar-ish code?

Which version of Go?
Which version of oauth2?

dmitshur · 2016-03-29T07:13:46Z

@bradfitz, I tried to reproduce this just now, and I was able to reliably see a significant performance difference. Exact program that I ran:

package main

import (
    "fmt"
    "log"
    "os"
    "time"

    "github.com/google/go-github/github"
    "golang.org/x/oauth2"
)

func main() {
    ts := oauth2.StaticTokenSource(
        &oauth2.Token{AccessToken: os.Getenv("GITHUB_TOKEN")},
    )
    tc := oauth2.NewClient(oauth2.NoContext, ts)
    gh := github.NewClient(tc)
    start := time.Now()
    for i := 0; i < 100; i++ {
        _, _, err := gh.Repositories.Get("FiloSottile", "gvt")
        if err != nil {
            log.Fatalln(err)
        }
        fmt.Print(".")
    }
    fmt.Printf("\n%s\n", time.Now().Sub(start))
}

I'm on go version go1.6 darwin/amd64, OS X 10.11.4 (15E65), and using latest current versions of all dependent repos:

github.com/google/go-github/... - commit 0c9db70
golang.org/x/oauth2/... - commit golang/oauth2@33fa30f
golang.org/x/net/... - commit golang/net@31df19d
github.com/google/go-querystring/... - commit google/go-querystring@6bb77fe

Here are 4 runs with this change and 4 runs without, interleaved.

$ go list -f '{{join .Deps "\n"}}' . | gostatus -stdin -v
     github.com/google/go-querystring/...
     github.com/google/go-github/...
     golang.org/x/net/...
     golang.org/x/oauth2/...
$ go run main.go
....................................................................................................
15.141031633s
$ go run main.go
....................................................................................................
42.322153704s
$ go run main.go
....................................................................................................
14.219332967s
$ go run main.go
....................................................................................................
1m12.91558721s
$ go run main.go
....................................................................................................
15.287252642s
$ go run main.go
....................................................................................................
53.149206247s
$ go run main.go
....................................................................................................
18.676132354s
$ go run main.go
....................................................................................................
52.90852069s

The only difference between the alternate runs are whether io.Copy(ioutil.Discard, resp.Body) was commented out or not. As you can see, it makes a significant difference each time.

The fast runs (14-18 seconds) are with this change applied, and the slow runs (42-73 seconds) were with //io.Copy(ioutil.Discard, resp.Body) commented out.

Appendix

I've also tried adding error checking to the draining and body closing calls, just to see if there were any unexpected values there. There were no panics with either of:

defer func() {
    // Drain and close the body to let the Transport reuse the connection
    n, err := io.Copy(ioutil.Discard, resp.Body)
    if err != nil {
        panic(err)
    }
    if n != 0 {
        panic(fmt.Errorf("unexpected n: %v", n))
    }
    err = resp.Body.Close()
    if err != nil {
        panic(err)
    }
}()

Nor:

defer func() {
    // Drain and close the body to let the Transport reuse the connection
    /*n, err := io.Copy(ioutil.Discard, resp.Body)
    if err != nil {
        panic(err)
    }
    if n != 0 {
        panic(fmt.Errorf("unexpected n: %v", n))
    }*/
    err := resp.Body.Close()
    if err != nil {
        panic(err)
    }
}()

The performance difference remained as before.

dsymonds · 2016-03-29T07:22:50Z

Any chance you could try that experiment using Go tip? Brad was testing with (I believe) a version pretty close to tip. I wonder if there's something that changed in net/http since go1.6.

dmitshur · 2016-03-29T07:27:00Z

Sure, I can try with tip. It'll be a few moments because I'll have to get it and build it.

dsymonds · 2016-03-29T07:27:55Z

Actually, never mind. I think I can reproduce it now at tip, on darwin/amd64 too.

dsymonds · 2016-03-29T07:29:45Z

Here's me running your program in the same configurations you did:

$ go run x.go
....................................................................................................
30.027496143s
$ go run x.go
....................................................................................................
1m51.14892534s

I guess @bradfitz and I will poke into this some more tomorrow now that we have a reproduction case. Thanks @shurcooL!

dmitshur · 2016-03-29T08:06:52Z

FWIW, I had already started, and I was curious, so I tried 1.6 and tip in a Linux VM (Ubuntu 14.04), and I was able to reproduce it there too. Both with 1.6 and current tip (golang/go@45d334e). See screenshot.

FiloSottile · 2016-03-29T10:58:54Z

@shurcooL and @dsymonds answered most questions already, but just to confirm I tested with 1.6 and tip on OS X, with and without oauth2, with and without using DefaultTransport.

JimmyPettersson85 · 2016-03-29T11:49:52Z

Sorry for being slightly OT but why doesn't Close() drain the body itself?

kofalt · 2016-03-29T13:46:24Z

@Slimmy: because closing a connection would then potentially block for a long time - or forever - if the other end of a connection has a lot of / infinite bytes still to be sent. Imagine calling Close only to block on receiving the rest of a 10 GB file you didn't want.

As mentioned in the PR:

Response.Body.Close() is called. When that happens with a non-drained
Body, the TCP/TLS connection is closed (which makes sense, in case there was
still a lot to read from the network)

JimmyPettersson85 · 2016-03-29T13:50:34Z

@kofalt: That makes sense, but it doesn't seem like something that would actually happen a lot in practice. Maybe it should be configurable to make Close() drain the body.

devlo · 2016-03-29T14:14:01Z

@Slimmy

Because of bad actors in internet that would send infinite number of bytes to your site.

JimmyPettersson85 · 2016-03-29T14:18:26Z

@devlo: I get that, but everyone using io.Copy(ioutil.Discard, resp.Body) to enable the reuse of the TCP connection would be susceptible that as well no?

devlo · 2016-03-29T14:22:51Z

@Slimmy

You can use io.LimitedReader in this case to mitigate that.

JimmyPettersson85 · 2016-03-29T14:31:46Z

@devlo

I'm just trying to understand it all in the context of this PR. To achieve maximum performance you have to make sure you read all of the response body (so that the TCP connection can be reused). But in doing so you are risking large (possibly infinite) streams of data. So the solution is to only utilize this "trick" when you are fetching data from trusted sources (i.e GitHub as in the case of this PR)?

kevin-cantwell · 2016-03-29T17:13:25Z

I'm trying to reproduce this issue in its purest form (ie: Without any dependencies beyond the standard packages), but so far am unable to. I want to fully appreciate if this is a problem for anyone using net/http or if it's limited to this project.

Here is my attempt (results in the readme): https://github.com/kevin-cantwell/tcpdraintest

It appears the performance differences are greatest when the number of requests is low. I'm reusing the default http client and my json responses are empty objects with a few extra newline characters to drain. Am I doing it wrong?

kevin-cantwell · 2016-03-29T17:32:27Z

Ok, I think I understand a bit more about this. Seems like this change makes the biggest difference when there is a large number of bytes remaining in the body after json parsing. A few extra bytes doesn't seem to matter much, even at high request volumes. But if you've got hundreds or thousands of extra bytes after json decoding is done, then yes, this could seriously affect performance. My question then is: without committing to draining the entire remaining bytes (which as @kofalt pointed out, could be GB of data), how far should one read before just closing the response? Because unless you read quite far, the performance gains might not be visible.

Another question I might raise is: Is it even reasonable to expect there to be lots of bytes after a terminating json character? There could easily be a newline or two, but hundreds or thousands of them? I'm skeptical that any api would serve that.

Again, if I'm missing the point somehow, please illuminate me! :)

willnorris · 2016-03-29T17:36:48Z

Ok, I think I understand a bit more about this. Seems like this change makes the biggest difference when there is a large number of bytes remaining in the body after json parsing.

In the GitHub cases above, there are actually no remaining bytes after json parsing, so I don't think your statement here is correct. One current working guess is that the io.EOF hasn't been reached, and that's what is causing the connection to not be reused.

dmitshur · 2016-03-29T17:36:58Z

But in doing so you are risking large (possibly infinite) streams of data. So the solution is to only utilize this "trick" when you are fetching data from trusted sources (i.e GitHub as in the case of this PR)?

@Slimmy, have you seen this inline #317 (comment)? A proposed suggestion there is to "read, say, 512 bytes with a 50ms timeout happening on close, which would probably be good enough for cases like this one."

I'm trying to reproduce this issue in its purest form (ie: Without any dependencies beyond the standard packages), but so far am unable to. I want to fully appreciate if this is a problem for anyone using net/http or if it's limited to this project.

@kevin-cantwell, an extremely important aspect of this is the fact that these are TLS connections being made, since the protocol used is https, not http. TLS connection handshake is quite expensive, so reusing TLS connections made to a remote server will offer great savings. Reusing non-TLS connections to localhost will have very little gains, and what's what your test appears to be trying to do from what I can tell.

Flip around the composition order of the http.Response.Body's gzip.Reader vs. the reader which keeps track of waiting to see the end of the HTTP/1 response framing (whether that's a Content-Length or HTTP/1.1 chunking). Previously: user -> http.Response.Body -> bodyEOFSignal -> gzipReader -> gzip.Reader -> bufio.Reader [ -> http/1.1 de-chunking reader ] optional -> http1 framing *body But because bodyEOFSignal was waiting to see an EOF from the underlying gzip.Reader before reusing the connection, and gzip.Reader (or more specifically: the flate.Reader) wasn't returning an early io.EOF with the final chunk, the bodyEOfSignal was never releasing the connection, because the EOF from the http1 framing was read by a party who didn't care about it yet: the helper bufio.Reader created to do byte-at-a-time reading in the flate.Reader. Flip the read composition around to: user -> http.Response.Body -> gzipReader -> gzip.Reader -> bufio.Reader -> bodyEOFSignal [ -> http/1.1 de-chunking reader ] optional -> http1 framing *body Now when gzip.Reader does its byte-at-a-time reading via the bufio.Reader, the bufio.Reader will do its big reads against the bodyEOFSignal reader instead, which will then see the underlying http1 framing EOF, and be able to reuse the connection. Updates google/go-github#317 Updates #14867 And related abandoned fix to flate.Reader: https://golang.org/cl/21290 Change-Id: I3729dfdffe832ad943b84f4734b0f59b0e834749 Reviewed-on: https://go-review.googlesource.com/21291 Reviewed-by: David Symonds <dsymonds@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>

Julio-Guerra · 2018-12-13T13:27:49Z

It's still unclear to me if we should drain res.Body until io.EOF or not. IMO, the current doc implicitly says we should (cf. Body struct field): "The default HTTP client's Transport may not reuse HTTP/1.x "keep-alive" TCP connections if the Body is not read to completion and closed."

pschultz · 2018-12-20T11:47:47Z

However, the docs on http.Client.Do only mention Close, not Read:

If the returned error is nil, the Response will contain a non-nil Body which the user is expected to close. If the Body is not closed, the Client's underlying RoundTripper (typically Transport) may not be able to re-use a persistent TCP connection to the server for a subsequent "keep-alive" request.

refs: https://pkg.go.dev/net/http#Client.Do https://forum.golangbridge.org/t/do-i-need-to-read-the-body-before-close-it/5594 google/go-github#317

karelbilek · 2024-04-08T09:22:15Z

Just to clear things up for others that ended up here - @pschultz the linked go doc have been updated

The response body SHOULD be drained/read to completion! The new docs https://pkg.go.dev/net/http#Client.Do make it more clear.

If the Body is not both read to EOF and closed, the Client's underlying RoundTripper (typically Transport) may not be able to re-use a persistent TCP connection to the server for a subsequent "keep-alive" request.

meblum · 2024-08-27T18:54:42Z

@karelbilek It seems that Close() will drain the body automatically.

dmitshur reviewed Mar 27, 2016
View reviewed changes

willnorris merged commit e0ff711 into google:master Mar 27, 2016

gyuho mentioned this pull request Mar 28, 2016

*: drain http.Response.Body before closing etcd-io/etcd#4880

Merged

olivere mentioned this pull request Mar 29, 2016

JSON serialization/deserialization performance olivere/elastic#254

Closed

paulmey mentioned this pull request Jun 17, 2016

drain request body for connection reuse Azure/go-ntlmssp#6

Merged

Joshua-Anderson mentioned this pull request Jun 21, 2016

fix(errors): return actionable errors from API deis/controller-sdk-go#13

Merged

DmitriyMV mentioned this pull request Jun 23, 2016

Drain Response.Body yanatan16/golang-instagram#3

Closed

dmitshur mentioned this pull request Jun 28, 2016

Optimization opportunity to reuse service struct allocations in github.Client #389

Closed

dropwhile mentioned this pull request Jan 17, 2017

Potential resource leak by not closing the response body cactus/go-camo#14

Merged

jvshahid mentioned this pull request Jun 16, 2017

net/http: connection reuse does not work happily with normal use of json package golang/go#20528

Open

at15 mentioned this pull request Aug 16, 2017

Client, Request, Response at15/go-solr#2

Open

2 tasks

vvatanabe mentioned this pull request May 7, 2018

Remove the code that drains the response body nulab/go-typetalk#9

Closed

drekle mentioned this pull request Aug 23, 2018

Feature/windows support cisco-sso/kdk#35

Closed

casualjim mentioned this pull request Mar 30, 2020

Reuse http client Netflix/spectator-go#47

Merged

dmarkhas mentioned this pull request Jun 28, 2020

Why limit draining response body? hashicorp/go-retryablehttp#82

Closed

itsksaurabh mentioned this pull request Jul 20, 2020

determine if there are situations when io.CopyN is needed to reuse underlying TCP connection #1575

Open

AlexanderYastrebov mentioned this pull request Aug 25, 2020

Drains tokeninfo response body to enable TCP connection reuse zalando/skipper#1511

Merged

bingoohuang mentioned this pull request Aug 28, 2020

压力测试下的golang http client造成的连接数过多 bingoohuang/blog#168

Open

Fiery-Fenix mentioned this pull request Feb 4, 2021

Drain response body on exec call to enable keep-alive connection reuse mailru/go-clickhouse#111

Merged

XuHaoJun mentioned this pull request Aug 19, 2021

fix: should drain resp.Body if not readAll or dump tiaguinho/gosoap#78

Merged

willnorris mentioned this pull request Oct 28, 2021

feat: try HEAD request before GET willnorris/webmention#16

Merged

ZhengzheYang mentioned this pull request Dec 1, 2021

drain before response.Body.Close() istio/tools#1772

Merged

OliverMautschke mentioned this pull request Aug 11, 2022

Improves reusing tcp connections by reading all bytes from the response body #814 cloudfoundry/app-autoscaler-release#819

Merged

benhoyt mentioned this pull request Aug 4, 2023

feat(logfwd): support forwarding logs to Loki canonical/pebble#267

Merged

mitar mentioned this pull request Oct 4, 2023

Make sure response is fully consumed xanzy/go-gitlab#1813

Merged

rajivnathan mentioned this pull request Oct 26, 2023

fix: proxy first request slowness issue codeready-toolchain/registration-service#366

Merged

Drain Response.Body to enable TCP/TLS connection reuse (4x speedup) #317

Drain Response.Body to enable TCP/TLS connection reuse (4x speedup) #317

Conversation

FiloSottile commented Mar 27, 2016

dmitshur commented Mar 27, 2016

dmitshur Mar 27, 2016

Choose a reason for hiding this comment

FiloSottile Mar 27, 2016

Choose a reason for hiding this comment

dmitshur Mar 27, 2016

Choose a reason for hiding this comment

willnorris Mar 27, 2016

Choose a reason for hiding this comment

FiloSottile Mar 27, 2016

Choose a reason for hiding this comment

willnorris Mar 27, 2016

Choose a reason for hiding this comment

marcosnils commented Mar 27, 2016

FiloSottile commented Mar 27, 2016

dmitshur commented Mar 27, 2016

willnorris commented Mar 27, 2016

marcosnils commented Mar 28, 2016

bradfitz commented Mar 29, 2016

dmitshur commented Mar 29, 2016

Appendix

dsymonds commented Mar 29, 2016

dmitshur commented Mar 29, 2016

dsymonds commented Mar 29, 2016

dsymonds commented Mar 29, 2016

dmitshur commented Mar 29, 2016

FiloSottile commented Mar 29, 2016

JimmyPettersson85 commented Mar 29, 2016

kofalt commented Mar 29, 2016

JimmyPettersson85 commented Mar 29, 2016

devlo commented Mar 29, 2016

JimmyPettersson85 commented Mar 29, 2016

devlo commented Mar 29, 2016

JimmyPettersson85 commented Mar 29, 2016

kevin-cantwell commented Mar 29, 2016

kevin-cantwell commented Mar 29, 2016

willnorris commented Mar 29, 2016

dmitshur commented Mar 29, 2016

Julio-Guerra commented Dec 13, 2018

pschultz commented Dec 20, 2018

karelbilek commented Apr 8, 2024 • edited Loading

meblum commented Aug 27, 2024

karelbilek commented Apr 8, 2024 •

edited

Loading