Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: new HTTP client package #23707

Open
bradfitz opened this issue Feb 5, 2018 · 37 comments
Open

net/http: new HTTP client package #23707

bradfitz opened this issue Feb 5, 2018 · 37 comments
Assignees
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Thinking
Milestone

Comments

@bradfitz
Copy link
Contributor

bradfitz commented Feb 5, 2018

Tracking bug for higher-level more usable HTTP client functionality.

  • take contexts
  • more timeout control per-request, rather than per-transport/client
  • ...Options pattern?
  • decode into JSON/etc easily?
  • treat non-2xx as error (by default?).
  • slurp entire response from server (by default?)
  • control over max response size (some sane default limit when slurping is enabled)

I thought we had a bug for this, but maybe not.

@bradfitz bradfitz added Thinking NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Feb 5, 2018
@bradfitz bradfitz added this to the Go1.11 milestone Feb 5, 2018
@bradfitz bradfitz self-assigned this Feb 5, 2018
@pciet
Copy link
Contributor

pciet commented Feb 5, 2018

This is a pattern in a test client I wrote:

// or Post, PostForm
r, err := client.Get(addr)
if err != nil {
    panic(err)
}
// do some parsing here then discard the rest
_, err = io.Copy(ioutil.Discard, r.Body)
if err != nil {
    panic(err)
}
err = r.Body.Close()
if err != nil {
    panic(err)
}
if r.StatusCode != http.StatusOK {
    panic(r)
}

Are you looking at adding new identifiers?

type BodyParser func(io.Reader) error

// http.ErrBadStatus for non-2xx, discards unread response body, and closes the connection.
func (c *Client) ClosingGet(url string, resp BodyParser) error

// repeated for the other methods
var t MyStruct
err := c.ClosingGet(addr, http.JSONBodyParser(&t))

@bradfitz
Copy link
Contributor Author

bradfitz commented Feb 5, 2018

Are you looking at adding new identifiers?

There would be new API for this, yes. I don't have a concrete proposal yet.

@awreece
Copy link
Contributor

awreece commented Mar 26, 2018

We wrote a wrapper for the http client internally that uses the ...Options pattern and achieves a lot of these goals:

https://github.com/AlphaFlow/gohttp

GET with parameters:

cli := http.NewClient()

var ret map[string]interface{}
err := cli.Get(context.Background(),
    "http://example.com",
    http.WithParam("debug", "1),
    http.WithJSONResponse(&ret))

POST with body:

ctx, cancel := context.WithTimeout(context.Background(), 1*time.Second)
defer cancel()

body = map[string]string{
   "hello": "world",
}
err := cli.Post(ctx, "http://example.com",
    http.WithJSONBody(body),
    http.WithHeader("Authorization", "Bearer my-token"))
if bse, ok := err.(*http.BadStatusError); ok && bse.Code == 404 {
    fmt.Println(bse)
}

Mocking the HTTP request:

	return http.NewMockClient(func(ctx context.Context, req *http.Request) error {
		if e != nil {
			return e
		}
		if len(body) == 0 {
			return nil
		}
		if req.Output != nil {
			_, err := req.Output.Write(body)
			return err
		} else if req.JSONOutput != nil {
			return json.Unmarshal(body, req.JSONOutput)
		} else {
			panic("no place to write response")
		}
	})

This has greatly simplified our error handling/json marshaling.

Quirks

  • Mocking the client using this interface is a bit weird -- the mock needs to know an awful lot about how the code works. This could be simplified a bit, maybe.
  • We inherit standard go http client "follow redirect" idiom, when it would be nice for that to be an option.

@bradfitz bradfitz modified the milestones: Go1.11, inpl, Unplanned May 23, 2018
@bradfitz bradfitz changed the title net/http: add high-level HTTP client operations net/http: new HTTP client package Dec 3, 2018
@bradfitz
Copy link
Contributor Author

I've started working on this.

See:

I'll move a lot of that to golang.org/x/exp/http* at some point here.

@danp
Copy link
Contributor

danp commented Dec 18, 2018

Any problems with or changes to tracing to consider? Is https://golang.org/pkg/net/http/httptrace/ still the desired interface for tracing if there's a chance for change?

@bradfitz
Copy link
Contributor Author

@danp, thanks. I'll include a section about that. It's mostly okay as-is AFAIK and should ideally continue to work but if you have issues with it, do share.

@creker
Copy link

creker commented Dec 18, 2018

Maybe that would be a good opportunity to fix one thing that really bugs me. Right now if request fails due to context being cancelled, you get url.Error that wraps context.Cancelled error. It's very annoying when that error is deep inside the call stack wrapped multiple times with something like github.com/pkg/errors. I don't know what is the convention here but it looks to me that context.Cancelled should be returned directly in every call that accepts context without any wrappers.

And in general, it's weird that http package uses url.Error and why the latter is even in the url package. The package description says "Package url parses URLs and implements query escaping" and everything apart from url.Error does relate to URL parsing. Maybe http client should provide it's own error type, even if it's similar to already existing url.Error.

@joonas-fi
Copy link

joonas-fi commented Dec 19, 2018

Just a while ago I made a small facade w/ sane defaults whose varargs design may be of interest: https://github.com/function61/gokit/tree/master/ezhttp

edit: my idea seems to be quite similar to what @awreece already posted

Example:

res := MyResponseType{};
_, err := ezhttp.Get(
    ctx,
    "https://example.com/api/getstuff",
    ezhttp.AuthBearer("myToken"), // this and below are varargs "config mutators"
    ezhttp.RespondsJson(&res),
    ezhttp.TolerateNon2xxResponse)

More examples in test file

Mutators my current ezhttp implements:

  • Header(key, val)
  • Cookie(cookie)
  • AuthBasic()
  • AuthBearer() // this is common enough use case to justify presence in HTTP package IMO
  • SendJson(ref interface{})
  • SendBody(body io.Reader, contentType string) // for raw data
  • RespondsJson(ref interface{}, allowUnknownFields bool)
  • TolerateNon2xxResponse // by default errors if response not 2xx

@lpar
Copy link

lpar commented Dec 19, 2018

I feel like Context as a mechanism for cancelling, and Context as a way to pass data along with a request, probably ought to be different things, not the same object/parameter. I'm not sure whether that's what is meant by noting that Request.WithContext is slow?

@bradfitz
Copy link
Contributor Author

@lpar, redesigning context is out of scope for this bug. In practice, you end up always needing both in big systems (for tracing), and passing two context-like things around everywhere is even grosser than passing one. So one it is. That doesn't affect the performance. WithContext is slow because of the defensive garbage it makes. If you could run a request with a context to begin with (Go 1 didn't accept one), then we wouldn't need WithContext.

@fgrosse
Copy link

fgrosse commented Dec 20, 2018

While I see some examples for using options when sending the request, I often find that in my programs I want to configure those once at startup directly on the client.

Thats why we wrote https://github.com/classmarkets/cmhttp.

Example:

client := cmhttp.Decorate(
    http.DefaultClient,
    cmhttp.Scoped("https://api.example.com"),
    cmhttp.Typed("application/json"), // or cmhttp.JSON()
)

req, err := http.NewRequest("GET", "/v1/places/de.berlin", nil)
if err != nil {
    panic(err)
}

resp, err := client.Do(req)

Like the examples of @awreece and @joonas-fi this uses varargs to let the user pick the options they want but instead of doing this on a per request basis this decorates the client. Just thought I might share that here for more inspiration for your redesign :)

@choonkeat
Copy link
Contributor

Are we open to consider respecting proxy env by default?

@bradfitz
Copy link
Contributor Author

Are we open to consider respecting proxy env by default?

Seems reasonable.

@bcmills
Copy link
Contributor

bcmills commented Dec 20, 2018

The magic-method interfaces (Flusher, Hijacker, Pusher, etc.) are pretty unpleasant: they make the documentation a lot harder to navigate and understand, and make semantically-correct wrappers nearly impossible (you have to write O(2ⁿ) wrapper types for n extension interfaces).

(See also #21904 and #16474.)

Ideally we should not only fold in the existing magic-method interfaces, but also have a plan to avoid the need for them going forward. (Perhaps use structs of functions instead?)

@djui
Copy link

djui commented Dec 20, 2018

I share many points enumerated in the Problems document and would like to drill a bit more into the point Client vs. Transport distinction:

The current separation between Request and Client regularly leads to modifying per-request scope properties (e.g. query parameters, headers, body, ...) (using a builder pattern with late/lazy error handling would be nice) but also modifying client scope properties (e.g. base url for single host API endpoints, user-agent, timeouts, outgoing request and incoming response logging, request id or tracing span tracking, API token and refresh logic, retry logic, ...). All of these client modifications have to be done in a nested way, similar to a server's middleware chaining, using the Transport while being careful to not mix Transport struct and RoundTripper interface.

Is the idea of the Handler to solve these client modifications or maybe even combining request and client modifications?

As a real-world example (Experience Reports?), we use the following pattern for client/Transport middleware:

func NewClient(logger *log.Logger, opts ...TransportOption) *http.Client {
	c := DefaultClient()

	c.Transport = NewTransport(opts...)
	c.Transport = NewLoggingRoundTripper(c.Transport, logger)
	c.Transport = NewSpanIDRoundTripper(c.Transport)

	return c
}

// DefaultClient returns a new http.Client with similar default values to
// http.Client, but with a non-shared Transport, idle connections disabled, and
// keepalives disabled.
func DefaultClient() *http.Client {
	return &http.Client{
		Transport: DefaultTransport(),
	}
}

And the following pattern for Request building:

func NewClient(httpClient *http.Client, opts ...RequestOption) *Client {
	return &Client{
		HTTPClient:  httpClient,
		RequestOpts: opts,
	}
}

func (c *Client) WithOptions(opts ...RequestOption) *Client {
	c.RequestOpts = append(c.RequestOpts, opts...)
	return c
}

func (c *Client) WithContext(ctx context.Context) *Client {
	c.RequestOpts = append(c.RequestOpts, WithContext(ctx))
	return c
}

func (c *Client) WithBaseURL(u string) *Client {
	c.RequestOpts = append(c.RequestOpts, WithBaseURL(u))
	return c
}

func (c *Client) WithHeader(header, value string) *Client {
	c.RequestOpts = append(c.RequestOpts, WithHeader(header, value))
	return c
}

func (c *Client) Do(req *Request, opts ...RequestOption) (*http.Response, error) {
	if c.err != nil { // abort if already in error state
		return nil, c.err
	}

	oo := c.RequestOpts
	oo = append(oo, req.Opts...)
	oo = append(oo, opts...)

	for _, o := range oo {
		if req.err != nil { // abort if already in error state
			return nil, req.err
		}

		o(req)
	}

	if req.err != nil { // abort if already in error state
		return nil, req.err
	}

	return c.HTTPClient.Do(req.Request)
}

@EricFortin
Copy link

We also use a lot the customisation of Transport much like @djui. We also use this to implement caching and client instrumentation for example. I do agree that better control over single request is needed, especially for timeout but I think having a way to configure client(I am not necessarily attached to RoundTripper though) should be kept in the final design.

@bradfitz
Copy link
Contributor Author

@bcmills, I agree, but this is a bug about the HTTP client, not server. I've pushed back on optional interfaces for some time now in all packages, though, and don't plan on using them here.

@szuecs
Copy link

szuecs commented Dec 21, 2018

Dear @bradfitz my wishlist:

  1. current internal net and net/http should expose an API to be able to build your own connection pool and connection handling and put into a new client/server as configuration similar to the current transport for example. Use case: replacing the requests queue with a stack for example
  2. exported error types, instead of return fmt.Errorf(..)
  3. document the net error Temporary() and Timeout(), such that you can be sure how to find out a connection error on connection establishment from one that happened after you wrote HTTP data to the socket. This is quite important for retries in proxy servers.
  4. a working h2c implementation, that can be easily enabled, because often you internally don't need the complexity of a rotation TLS infrastructure which is a beast in itself. I tried to use internal/directory to hack around this, but it's quite complex to do if even possible.

I know some of them might not be httpclient related, but if you can fix or enable some of them while changing httpclient would be really nice.

@earthboundkid
Copy link
Contributor

A feature of Python's requests and some Ruby http clients I have seen is to set up authentication once and specify a base URL to combine with paths (e.g. for different host endpoints for stage vs. prod). I don't think these should be direct features of this library, but it should be designed so that it's easy to wrap the client with something modifies the requests before sending them. With that, a lot of API client libraries out there could be even thinner wrappers than they are now, and just provide a set of JSON structs and a function that takes authentication keys and returns a working httpclient.

@shuLhan
Copy link
Contributor

shuLhan commented Dec 22, 2018

What I don't understand from http client is why each request must have full URL with domain name?

Why not,

cl, err := http.NewClient("http://example.com")

rawsock, err := cl.Connect(addr)
res, err := cl.Get("/api", qparams)
res, err = cl.Post("/api/post?k=v", nil, body)

// And so on...

?

@djui
Copy link

djui commented Dec 22, 2018

@shuLhan Your example seems to hint at a single-host client, which I would consider a specialization, to which optimizations can be done, especially regarding connection pooling. The provided HTTP should be generic and stateless between requests, imho.

@szuecs
Copy link

szuecs commented Dec 22, 2018

@djui true, but it could also be working on the return value of Connect() and hint to some general optimization that could be done in protocol aware connections, for example Keep-Alive in case of http/1.1. Not sure if it makes sense or I miss something else here.

@anacrolix
Copy link
Contributor

Difficulty with magic methods and lack of h2c are big thorns with the existing implementation. Cleaner interfaces and support for server handler middleware would also be good.

@ghost
Copy link

ghost commented Jan 8, 2019

Please provide a way to write to the request body. One example of where this is useful is writing a multipart request body using large files to stored on disk. It's possible to get a writer on the request body using io.Pipe and a goroutine, but it would be nice of the client handled these details or if the client plumbed a writer more directly to the network.

Here's a proposed API for the bradfitz/exp-httpclient:

// BodyFunc specifies the request body using a function.
// The HTTP client calls the function to write the request
// body to an io.Writer. The function may be called 0, 1
// or multiple times.
func (r *Request) BodyFunc(fn func(io.Writer) error) *Request {
	panic("TODO")
	return r
}

@abdullah2993
Copy link

better multipart support out of the box

@shuLhan
Copy link
Contributor

shuLhan commented Jan 8, 2019

@shuLhan Your example seems to hint at a single-host client, which I would consider a specialization, to which optimizations can be done, especially regarding connection pooling.

Please forgive my ignorance, but is not that what a client should do? Unless we are writing a browser, I can't think of any use case where one client can or should handle connection to several hosts (except redirection).

From my point of view, which I am not familiar with the underlying HTTP client code, there are two layer of abstraction that the current HTTP client trying to handle: client layer and connection pooling/management layer; and IMO they should be separated. Case in example: in database, there is an API to create single client connection and there is an API to create connection pooling. Which one user want, its depends on the use case.

The provided HTTP should be generic and stateless between requests, imho.

generic. There are eight HTTP methods since RFC7231 (without PATCH) and I don't think it will be changed in the next ten years. The only thinks that dynamic in client is request query parameter, headers, and body; but, as bradfitz already mentioned, the Request object contains both fields required by client and server which cost allocated struct size when working only on client.

stateless between requests. I am not sure what does that means, since HTTP is build on top of TCP and current HTTP client is synchronous.

@djui
Copy link

djui commented Jan 8, 2019

@shuLhan I guess I was thinking of other use cases such as scrapers or simpler programs/scripts.

@earthboundkid
Copy link
Contributor

Not sure if the httpclient is the right way to address this, but if you just spawn a bunch of goroutines to do a bunch of HTTP requests, you can easily eat through your file descriptor limit and crash your machine. Would a concurrency limit be appropriate in a the new httpclient? If not, how could the client make it easy to add a semaphore wrapper?

@creker
Copy link

creker commented Jan 8, 2019

@carlmjohnson isn't existing http client already gives you all the necessary dials? You can limit keep-alive and concurrent connections to prevent file descriptor exhaustion.

@earthboundkid
Copy link
Contributor

You can set DefaultTransport.MaxConnsPerHost, but AFAIK, there's no way to set a limit if you're scraping multiple hosts. I could be wrong.

@szuecs
Copy link

szuecs commented Jan 10, 2019

@shuLhan Your example seems to hint at a single-host client, which I would consider a specialization, to which optimizations can be done, especially regarding connection pooling.

Please forgive my ignorance, but is not that what a client should do? Unless we are writing a browser, I can't think of any use case where one client can or should handle connection to several hosts (except redirection).

@shuLhan
Think about a reverse proxy case that handles multiple backends.
Of course you don't want to have a client for each backend and if it would make sense, then you would need to fix #23427 or we would have to have for each different http client a goroutine to make DNS requests happen after a while. I am also not sure about the memory footprint of a http client, which could increase of running a reverse proxy in Go.

@szuecs
Copy link

szuecs commented Feb 20, 2019

I just had another idea for the http client usage.
What about a usage like this:

response, err := http.Config({}).Client({}).Request({}).Do({})
cfg := http.Config({Version: http.Version11, Transport: {}, ConnectionPool: {},...})
// or use a default and override
cfg2 := http.DefaultConfig()
cfg2.Transport = ...
cfg2.Client({})

http.Client({ResponseTimeout: 3*time.Second, ...})
cli := http.DefaultClient()
cli.ResponseTimeout = 5*time.Second

Do(
     OnData: func() {},
      ...handlers...
) Response, error 

Maybe handlers to help pipelining large response bodies, that would be efficient in memory handling. Think of a list of Pods and you need only a fraction of data, in your data collection to process in your program. We could easily enable users to use json Encode/Decode, and not read the full Body and use Marshal/Unmarshal.

Error handling would need to be done with returning nil and check nil on itself and do some noop if it's nil:

// http.Config({}).Client({}).Request({}).Do({})
func Config(cfg *Cfg) {} *Cfg {
    if cfg == nil {
        return &DefaultConfig
   }
}

func (cfg *Cfg) Client(cli *Client) *Client {
   if cfg == nil { // this would be done for all other following func receivers to do "error checking"
        return nil
   }
   ... // default cli..
}

...

@szuecs
Copy link

szuecs commented Mar 19, 2019

One more thing, I realized today:
I would like to have client connection pool data accessible for observability. For example:
How many idle connections do we have for a host and in general?

@olekukonko
Copy link

@bradfitz any update or timeline on this?

@reezer
Copy link

reezer commented Dec 18, 2021

treat non-2xx as error (by default?).

Please don't.

Reasons:

  • it's application dependent
  • 3xx also aren't errors (should redirects be treated?)
  • 4xx might still be intended responses and not errors
  • 1xx certainly shouldn't be an error, even though I assume that wasn't intended anyways
  • In many applications you want to have a clear distinction between network errors and status codes
  • It's usually not a problem in other languages/ecosystems, when checking that separately. It's not unusual.
  • Adding more error categories moves complexity to error handling and when imagining a pipeline status code might be something you want to handle later

I think one should avoid making assumptions that are very dependent on applications and I also don't think it would be so much of a boiler plate, if needed. It feels like it sometimes might be an error and might make an if condition easier, but in others be just bothersome. While I can understand that small projects, sample code (which often skips error handling anyways though) it's simple I think in bigger projects where you are likely to wrap code anyways (to change things like user agent, etc.) it's easy enough to add one simple if condition and not check which HTTP status code is an error now.

There's various scenarios where the status code is the main thing you check for, so it's convenient to know right away whether something on the transport side (and related) hasn't failed, vs. having to get the information by querying the error.

Since it's related, esp. to 3xx: Any thoughts on redirects? Should they be handled? If so, how many? Should any data from redirects pages still be made available? Should caching, ETags, CookieJar, etc. be handled per default?

@zamicol
Copy link
Contributor

zamicol commented Oct 13, 2022

[Unrelated thought]: What about typing the status codes? Currently they are just type int.

Something like:

type Status int
const 	StatusContinue     Status      = 100 

Function signatures would be more expressive:

func Redirect(w ResponseWriter, r *Request, url string, code Status)

@earthboundkid
Copy link
Contributor

Brad Fitz's experimental new client package had typed status codes. It was a good idea. But I think this issue is dead now. If it ever comes back to life, probably someone will need to have a multipart strategy to make a new external API first using the existing internals, then rewire the old API to use the new one under the hood. It would be a lot of work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Thinking
Projects
None yet
Development

No branches or pull requests