take std.http in a different direction #18955
Conversation
Regarding the redirect code in Lines 535 to 537 in 2176a73: it uses request.uri to determine where the final location of the repository is and provide a better error message to the user. All the other HTTP requests (after capabilities are discovered) in git.zig pass .handle_redirects = false because at that point it isn't necessary.

For example,

Edit: (since I also confused myself when looking at this again) the
I've personally used tuples for headers:

```zig
pub const HeaderKv = struct { []const u8, []const u8 };
pub const Headers = []const HeaderKv;
```

Then for the dynamic header building scenario I have a HeadersBuilder 😅
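For illustration, building a static header list with these definitions might look like this (a sketch based on the types above; the header values are made up):

```zig
const headers: Headers = &.{
    .{ "content-type", "text/plain" },
    .{ "cache-control", "no-cache" },
};
```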
Re-enabling Zstandard compression for the server (with a non-allocating Zstandard API) is simple if we can either:

Would either of these be a suitable solution?
Thanks @ianprime0509 - for some reason I thought this was the request that was passing "unhandled" - i.e. giving the redirect directly to the git code rather than handling up to 3 redirects. Now it all makes sense to me. Thanks for the explanation!
Adding error.OutOfMemory to the error set is no good. Think about it: you're waiting for the client to send the HTTP headers, and you already have a big enough buffer to store all of them. It shouldn't be possible to run out of memory.
8 MiB is a steep price to pay per request. It starts to limit how many simultaneous clients are supported. For this reason I think it makes sense to default that buffer to a smaller size.

Anyway, I definitely would like to see the Allocator requirement entirely removed from zstd, and instead accept a user-supplied buffer. Then it can be added back to std.http.

Edit: hmm, I have another question about zstd. Does the size of the payload affect the needed window buffer size? For example, if you set the maximum HTTP headers buffer to 16K, that means that the uncompressed data is maximum 16K, and therefore the needed window size would also be <= 16K. Is that how it works? Because if so, then users of
Yes, this is how it works.
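To make the sizing implication concrete, here is a sketch (the decompressor mentioned in the comment is hypothetical; the thread above only proposes a zstd API that takes a caller-supplied buffer instead of an allocator):

```zig
// If the server caps uncompressed request data at 16K, the zstd window
// never needs to exceed that cap, so a fixed buffer suffices.
const max_uncompressed_bytes = 16 * 1024;
var window_buffer: [max_uncompressed_bytes]u8 = undefined;
// window_buffer would then be handed to a non-allocating zstd
// decompressor, e.g. something shaped like
// `zstd.decompressor(reader, &window_buffer)`.
```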
I've already got this done in a branch; I only haven't PR'd because I'm unsure how to integrate the changes with std.http.
If you like you can open the PR against this branch. Note that I just pushed a relevant commit moments ago, and that deinit() is no longer being called. I would expect this function to be entirely deleted:

```zig
pub fn deinit(self: *Self) void {
    if (self.state == .NewFrame) return;
    self.allocator.free(self.decode_state.literal_fse_buffer);
    self.allocator.free(self.decode_state.match_fse_buffer);
    self.allocator.free(self.decode_state.offset_fse_buffer);
    self.allocator.free(self.literals_buffer);
    self.allocator.free(self.sequence_buffer);
    self.buffer.deinit(self.allocator);
}
```
If a server is expecting 100% GET requests, it doesn't need to have any buffer for zstd because GET requests are not allowed to have a payload, so nothing can be compressed.
Oops, yes, my mistake. 2 mistakes actually.

After accept(), the URL is available, which is when "routing" happens, passing control flow to a request handler. That request handler likely knows whether it will need to possibly accept uploads, and whether there is an expected upper bound on the payload size of those uploads, so that's the right place to allocate and pass a zstd buffer, or not. That means that accept() is too early to be the place where a possible zstd compression buffer is passed. However, wait() is also not the best place, because the handler could have a chance to notice that some form of compression was requested, and then choose where to get the buffer from, if any. So I think the

To @dweiller - don't stress too much about how to integrate into std.http, I'm going to rework send, wait, init, deinit, etc.

Edit: the above text incorrectly implies that the URL is available after
Ahh, that's better:

```zig
return res.sendAll(.{
    .status = status,
    .extra_headers = &.{
        .{ .name = "content-type", .value = @tagName(file.mime_type) },
    },
    .content = s.bytes.items[file.contents_start..][0..file.contents_len],
});
```
Let the user add that if they wish to. It's not strictly necessary, and arguably a harmful default.
It incorrectly had NotWriteable and MessageTooLong in it.
The buffer for HTTP headers is now always provided via a static buffer. As a consequence, OutOfMemory is no longer a member of the read() error set, and the API and implementation of Client and Server are simplified. error.HttpHeadersExceededSizeLimit is renamed to error.HttpHeadersOversize.
companion commit to 919a3ba
Reads the stream until the end, ignoring all the data. Returns the number of bytes discarded.
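A sketch of what such a helper can look like (the function name and scratch size are illustrative, not the actual implementation):

```zig
/// Reads the stream until the end, ignoring all the data.
/// Returns the number of bytes discarded.
fn discardAll(reader: anytype) !u64 {
    var scratch: [4096]u8 = undefined;
    var total: u64 = 0;
    while (true) {
        const n = try reader.read(&scratch);
        if (n == 0) return total; // end of stream
        total += n;
    }
}
```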
* Uncouple std.http.ChunkParser from protocol.zig
* Fix receiveHead not passing leftover buffer through the header parser.
* Fix content-length read streaming. This implementation handles the final chunk length correctly rather than "hoping" that the buffer already contains \r\n.
A writer that appends to the list, returning error.OutOfMemory rather than attempting to increase capacity.
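For example, usage might look like this (a sketch; it assumes the writer is obtained from an ArrayList whose capacity was reserved up front):

```zig
var list = std.ArrayList(u8).init(gpa);
defer list.deinit();
try list.ensureTotalCapacity(256);
const w = list.fixedWriter();
try w.writeAll("fits in existing capacity");
// A write that would exceed the reserved capacity returns
// error.OutOfMemory instead of reallocating.
```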
Before, I mistakenly thought that a missing content-length meant zero, when it actually means to stream until the connection is closed. Now the respond() function accepts transfer_encoding, which can be left as default (use content.len for the content-length), set to none, which makes it omit the content-length, or set to chunked, which makes it format the response as a chunked transfer even though the server has the entire contents already buffered. The echo-content tests are moved from test/standalone/http.zig to the standard library, where they are actually run.
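A sketch of the three cases described above (option and field names as described in this thread):

```zig
// Default: content-length is set to content.len.
try request.respond(content, .{});

// Omit the content-length header entirely.
try request.respond(content, .{ .transfer_encoding = .none });

// Chunked framing, even though the full body is already buffered.
try request.respond(content, .{ .transfer_encoding = .chunked });
```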
The API automatically handles these requests as expected. After receiveHead(), the server has a chance to notice the expectation and do something about it. If it does not, then the Server implementation will handle it by sending the continuation header when the read stream is created. Both respond() and respondStreaming() send the continuation header as part of discarding the request body, only if the read stream has not already been created.
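A sketch of a handler noticing the expectation itself (the `expect` field is shown in the Head struct below; the status name is assumed from std.http.Status):

```zig
// After receiveHead(), the handler can inspect and reject an expectation
// before the Server implementation handles it automatically.
if (request.head.expect) |expectation| {
    if (!std.ascii.eqlIgnoreCase(expectation, "100-continue")) {
        return request.respond("", .{ .status = .expectation_failed });
    }
}
```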
avoid a little bit of boilerplate
no content-length header, no transfer-encoding header
The HTTP specification does not provide a way to escape \r\n in headers, so it's the API user's responsibility to ensure the header names and values do not contain \r\n. Also, header names must not contain ':'. It's an assertion, not an error, because the calling code very likely is using hard-coded values or server-provided values that do not need to be checked, and the error would be unreachable anyway. Untrusted user input must not be put directly into HTTP headers.
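Since it is an assertion, callers handling untrusted input need their own check first; a sketch (helper names are ours):

```zig
const std = @import("std");

fn isValidHeaderName(bytes: []const u8) bool {
    // Header names must not contain CR, LF, or ':'.
    return std.mem.indexOfAny(u8, bytes, "\r\n:") == null;
}

fn isValidHeaderValue(bytes: []const u8) bool {
    // Header values must not contain CR or LF.
    return std.mem.indexOfAny(u8, bytes, "\r\n") == null;
}
```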
These tests were not actually being run. Now they are run along with testing the standard library.
wasi does not support networking
We didn't know it wasn't passing before because it wasn't actually being run.
Ultimate flexibility; just be sure to destroy the correct amount of information when looking at them.
I think that's all I'm going to do in this changeset. As you can see there was still a laundry list of changes I wanted to make, including fully reworking the Client API in a similar manner as Server. But it's time for me to move on to other things for now.
Release notes draft:

### std.http

The APIs of std.http have been reworked in this branch.

#### std.http.Client

The new fetch API:

```zig
var body = std.ArrayList(u8).init(gpa);
const request = try client.fetch(.{
    .location = .{ .url = url },
    .response_storage = .{ .dynamic = &body },
});
if (request.status != .ok) {
    std.log.debug("http bad response code: {s}", .{@tagName(request.status)});
    return error.HttpRequestFailed;
}
return try body.toOwnedSlice();
```

These are the possible values of response_storage:

```zig
pub const ResponseStorage = union(enum) {
    ignore,
    /// Only the existing capacity will be used.
    static: *std.ArrayListUnmanaged(u8),
    dynamic: *std.ArrayList(u8),
};
```

#### std.http.Headers

Previously, both Client and Server used the std.http.Headers data structure; it has been removed.

In the next section we're going to go over the new server API and we'll show how to use this new header system in context.

#### std.http.Server

This is how you would previously use an HTTP server:

```zig
var server = std.http.Server.init(.{});
defer server.deinit();

const address = try std.net.Address.parseIp("127.0.0.1", 8080);
try server.listen(address);

accept: while (true) {
    var res = try server.accept(.{
        .allocator = gpa,
    });
    defer res.deinit();

    while (res.reset() != .closing) {
        res.wait() catch |err| switch (err) {
            error.HttpHeadersInvalid => continue :accept,
            error.EndOfStream => continue,
            else => return err,
        };
        handleRequest(&res);
    }
}
```

The new API de-couples the HTTP server from the underlying TCP server and changes the connection lifecycle:

```zig
const address = try std.net.Address.parseIp("127.0.0.1", 8080);
var tcp_server = try address.listen(.{
    .reuse_address = true,
    .reuse_port = true,
});
defer tcp_server.deinit();

accept: while (true) {
    var conn = try tcp_server.accept();
    defer conn.stream.close();

    var read_buffer: [8000]u8 = undefined;
    var server = std.http.Server.init(conn, &read_buffer);
    while (server.state == .ready) {
        // receiveHead allows extra bytes to be read into read_buffer,
        // and then will reuse those bytes for the body or the next
        // request upon connection reuse
        var request = server.receiveHead() catch |err| {
            std.debug.print("error: {s}\n", .{@errorName(err)});
            continue :accept;
        };
        handleRequest(&request);
    }
}
```

These are the other possible values for server.state:

```zig
pub const State = enum {
    /// The connection is available to be used for the first time, or reused.
    ready,
    /// An error occurred in `receiveHead`.
    receiving_head,
    /// A Request object has been obtained and from there a Response can be
    /// opened.
    received_head,
    /// The client is uploading something to this Server.
    receiving_body,
    /// The connection is eligible for another HTTP request, however the client
    /// and server did not negotiate connection: keep-alive.
    closing,
};
```

Once returned from receiveHead, the parsed request head is available:

```zig
pub const Head = struct {
    method: http.Method,
    target: []const u8,
    version: http.Version,
    expect: ?[]const u8,
    content_type: ?[]const u8,
    content_length: ?u64,
    transfer_encoding: http.TransferEncoding,
    transfer_compression: http.ContentEncoding,
    keep_alive: bool,
    compression: Compression,
    // ... other types and functions
};
```

To access custom headers you can use the header iterator on the request.

#### Responding

There are now two main ways of responding to a request.

##### One-shot response

```zig
return request.respond(content, .{
    .status = .ok,
    .extra_headers = &.{
        .{ .name = "content-type", .value = @tagName(file.mime_type) },
    },
});
```

##### Streaming response

Calling respondStreaming creates a Response, which is used to stream the response body.

These are the options that respondStreaming accepts:

```zig
pub const RespondStreamingOptions = struct {
    /// An externally managed slice of memory used to batch bytes before
    /// sending. `respondStreaming` asserts this is large enough to store
    /// the full HTTP response head.
    ///
    /// Must outlive the returned Response.
    send_buffer: []u8,
    /// If provided, the response will use the content-length header;
    /// otherwise it will use transfer-encoding: chunked.
    content_length: ?u64 = null,
    /// Options that are shared with the `respond` method.
    respond_options: RespondOptions = .{},
};
```

See the stdlib source code for more information.

#### Other improvements

See #18955
This branch started from me working on Groove Basin, then shaving the yak of a static file server, and then having to deal with the API of std.http.Server, which led me to notice inefficiencies and incorrect error sets. This caused me to critically inspect the implementation of std.http, at which point I found many things that I wanted to change.

First, some pretty straightforward changes (see individual commits for more details):

- fix the error set of finish, which incorrectly had NotWriteable and MessageTooLong in it
- add a state enum

Next, I removed the ability to heap-allocate the buffer for headers. The buffer for HTTP headers is now always provided via a static buffer. As a consequence, OutOfMemory is no longer a member of the read() error set, and the API and implementation of Client and Server are simplified. error.HttpHeadersExceededSizeLimit is renamed to error.HttpHeadersOversize.
Finally, the big changes:
### Removal of std.http.Headers

I originally removed these in 402f967. I allowed them to be added back in #15299 because they were smuggled in alongside a bug fix; however, I wasn't kidding when I said that I wanted to take the design of std.http in a different direction than using this data structure.
Instead, some headers are provided via explicit field names populated while parsing the HTTP request/response, and some are provided via new fields that support passing extra, arbitrary headers. This resulted in simplification of logic in many places, as well as elimination of the possibility of failure in many places. There is less deinitialization code happening now. Furthermore, it made it no longer necessary to clone the headers data structure in order to handle redirects.
http_proxy and https_proxy fields are now pointers since it is common for them to be unpopulated.
loadDefaultProxies is changed into initDefaultProxies to communicate that it does not actually load anything from disk or from the network. The function is now leaky; the API user must pass an already instantiated arena allocator. This removes the need to deinitialize proxies.

Before, proxies stored arbitrary sets of headers. Now they only store the authorization value.
Removed the duplicated code between https_proxy and http_proxy. Finally, parsing failures of the environment variables result in errors being emitted rather than silently ignoring the proxy.
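A usage sketch of the leaky initialization described above (assuming the renamed function accepts the arena's allocator; field names are illustrative):

```zig
var arena_state = std.heap.ArenaAllocator.init(gpa);
defer arena_state.deinit(); // frees everything the proxies allocated
var client: std.http.Client = .{ .allocator = gpa };
defer client.deinit();
try client.initDefaultProxies(arena_state.allocator());
```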
### Rework Server Entirely

Mainly, this removes the poorly named wait, send, and finish functions, which all operated on the same "Response" object, which was actually being used as the request.

Now, it looks like this:

- receiveHead() gives you a Request
- a Request can be responded to in one shot with respond(), or used to open a Response with respondStreaming()
- a Response is used for streaming the response body and, optionally, passing response trailers.

In other words, the type system now guides the API user down the correct path.
- receiveHead allows extra bytes to be read into the read buffer, and then will reuse those bytes for the body or the next request upon connection reuse.
- respond(), the one-shot function, will send the entire response in one syscall.
- Streaming response bodies no longer wastefully wrap every call to write with a chunk header and trailer; instead, the HTTP chunk wrapper is only sent when flushing. This means the user can still control when it happens, and it does not add unnecessary chunks.
Empirically, in my example project that uses this API, the usage code is significantly less noisy, it has less error handling while handling errors more correctly, it's more obvious what is happening, and it is syscall-optimal.
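For illustration, a streaming response under this API might look like the following (a sketch: the buffer size and write calls are illustrative, and the respondStreaming/Response shapes follow the release notes above):

```zig
var send_buffer: [4096]u8 = undefined;
var response = request.respondStreaming(.{
    .send_buffer = &send_buffer,
    // Leaving content_length unset selects transfer-encoding: chunked.
});
try response.writer().writeAll("hello, ");
try response.writer().writeAll("world\n");
// The chunk wrapper is only emitted when flushing or ending,
// not once per write call.
try response.end();
```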
Additionally:

- The read buffer is now provided by the API user when initializing http.Server, and it is used for the HTTP head as well as a buffer for reading the body into.
- and "first".
### test/standalone/http.zig was dead code

std.http had only a few unit tests with not very much coverage. The bulk of the testing that was supposedly done was in test/standalone/http.zig. However, I discovered to my disappointment that it is only compiled, not executed. In fact, if you execute it on master branch right now, it hangs.
This was not good enough because the CI was not actually running the tests. Also, these should be std lib unit tests instead of one standalone test.
This branch moves those tests over to become proper std lib unit tests, and adds more test coverage.
TODO